最近在学习陈强教授的《计量经济学及Stata应用》(第二版),因为某些原因需要用python来实现计量过程。在学到准差分法(CO估计法)时,没能在statsmodel中找到对应的函数,自己写了一段感觉不太对,请大神指点迷津。
教材中涉及的内容直接po出来,免得又去翻书:
变换原模型使转换后的扰动项变成球形扰动项。
常使用迭代法进行估计,具体步骤:
.prais consumption temp price income,corc
# 使用CO方法 需要多次迭代 import statsmodels.api as sm import pandas as pd from statsmodels.stats.diagnostic import acorr_breusch_godfrey # 读取原始数据 icecream = pd.read_stata('../2_Data/Data-2e/icecream.dta') X = icecream[['temp','price','income']] y = icecream['consumption'] X = sm.add_constant(X) # 初始OLS回归 model = sm.OLS(y,X) results = model.fit() # 计算误差项的各期滞后值:datafr ame格式 resid = pd.Datafr ame() for i in range(len(icecream)): lag_n = f'lag_{i}' resid[lag_n] = results.resid.shift(i) # Cochrane-Orcutt迭代过程 converged = False iterations = 0 max_iterations = 10 # 设置最大迭代次数 tolerance = 0.001 # 设置收敛容差 hat_rho = [0] results_list = [ ] while not converged and iterations < max_iterations: # 拟合AR(1)模型来估计自相关系数 # 残差数据处理 lag_X = 'lag_{}'.format(iterations+1) lag_y = 'lag_{}'.format(iterations) resid_ols = resid.copy()[[lag_y, lag_X]] resid_ols.dropna(inplace=True) # 进行AR(1)模型拟合 resid_X = resid_ols[lag_X] resid_y = resid_ols[lag_y] # X = sm.add_constant(X) # 不加入常数项效果更好 rho = sm.OLS(resid_y, resid_X).fit() hat_rho.append(rho.params.iloc[0]) # 使用Cochrane-Orcutt变换调整误差项 X_adj = X.copy() - X.copy().shift(1)*hat_rho[-1] y_adj = y.copy() - y.copy().shift(1)*hat_rho[-1] # 重新进行OLS回归,去除t0项 results_new = sm.OLS(y_adj[1:], sm.add_constant(X_adj[1:])).fit() # 检查是否收敛 if abs(hat_rho[-1]-hat_rho[-2]) < tolerance: converged = True else: results_list.append(results_new) results = results_new # 更新模型为新迭代的结果 iterations += 1 print(f"迭代 {iterations}: rho = {hat_rho[-1]:.6f}, Breusch-Godfrey统计量p值{acorr_breusch_godfrey(results,1)[1]:.6f}:") print('【Cochrane-Orcutt】迭代结果------------------------') print(f'迭代次数: {iterations}次,模型呈现{iterations-1}阶自相关。') print('--------------------------------------------【END】') print(results_list[-2].summary())
【错误】:CO估计法都是只滞后一期,上面代码实现的是不停后移滞后项,这肯定不对。
修改后的代码:
import statsmodels.api as sm import pandas as pd # 读取原始数据,进行初始OLS回归 icecream = pd.read_stata('../2_Data/Data-2e/icecream.dta') X = icecream[['temp','price','income']] y = icecream['consumption'] X = sm.add_constant(X) model = sm.OLS(y,X) results = model.fit() # Cochrane-Orcutt迭代过程 converged = False iterations = 0 max_iterations = 10 # 设置最大迭代次数 tolerance = 0.001 # 设置收敛容差 hat_rho = [0] results_list = [ ] # 初始化第一轮的残差滞后数据 resid = pd.Datafr ame() resid['lag_0'] = results.resid resid['lag_1'] = results.resid.shift(1) resid['mu'] = 0 # print(resid.head()) while not converged and iterations < max_iterations: # 拟合AR(1)模型来估计自相关系数 resid_ols = resid.copy() resid_ols.dropna(inplace=True) # 进行AR(1)模型拟合 resid_X = resid_ols['lag_1'] resid_y = resid_ols['lag_0'] rho = sm.OLS(resid_y, resid_X).fit() hat_rho.append(rho.params.iloc[0]) print(f"迭代 {iterations+1}: rho = {hat_rho[-1]:.6f}") # 使用Cochrane-Orcutt变换后再进行回归,得到新残差项 X_adj = X.copy() - X.copy().shift(1)*hat_rho[-1] y_adj = y.copy() - y.copy().shift(1)*hat_rho[-1] results_new = sm.OLS(y_adj[1:], sm.add_constant(X_adj[1:])).fit() # 检查是否收敛 if abs(hat_rho[-1]-hat_rho[-2]) < tolerance: converged = True else: resid['mu'] = results_new.resid resid.loc[1:,'lag_0']= hat_rho[-1] * resid.loc[1:,'lag_1'] + resid['mu'][1:] resid['lag_1'] = resid['lag_0'].shift(1) # print(resid.head(3)) # 更新模型为新迭代的结果 results = results_new results_list.append(results_new) iterations += 1 print('【Cochrane-Orcutt】迭代结果------------------------') print(results_list[-1].summary())
在else里面,对原模型8.11中的ϵt\epsilon_tϵt做了还原,因为如果按上文步骤的第2步,做出来结果不收敛。所以用8.14的残差μt\mu_tμt和ρ^\hat\rhoρ^进行了如下计算:ϵt=ρ^ϵt−1+μt\epsilon_t = \hat\rho\epsilon_{t-1}+\mu_tϵt=ρ^ϵt−1+μt 这里的ϵ1\epsilon_1ϵ1直接取的是初始OLS回归后的残差序列第1期的值。但还是算不出来教材里面的结果。 计算结果:
else
迭代 1: rho = 0.400633 迭代 2: rho = 0.396399 迭代 3: rho = 0.396524 【Cochrane-Orcutt】迭代结果------------------------ OLS Regression Results ============================================================================== Dep. Variable: consumption R-squared: 0.651 Model: OLS Adj. R-squared: 0.609 Method: Least Squares F-statistic: 15.54 Date: Tue, 23 Apr 2024 Prob (F-statistic): 6.52e-06 Time: 15:45:05 Log-Likelihood: 60.905 No. Observations: 29 AIC: -113.8 Df Residuals: 25 BIC: -108.3 Df Model: 3 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ const 0.1547 0.289 0.535 0.597 -0.441 0.750 temp 0.0036 0.001 6.445 0.000 0.002 0.005 price -0.8905 0.811 -1.098 0.282 -2.560 0.779 income 0.0032 0.002 2.094 0.047 5.36e-05 0.006 ============================================================================== Omnibus: 3.070 Durbin-Watson: 1.546 Prob(Omnibus): 0.216 Jarque-Bera (JB): 1.643 Skew: 0.418 Prob(JB): 0.440 Kurtosis: 3.813 Cond. No. 8.60e+03 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. [2] The condition number is large, 8.6e+03. This might indicate that there are strong multicollinearity or other numerical problems.
如果愿意复现上述过程,数据可以在[陈强教授的计量经济学及Stata主页]下载《计量经济学及Stata应用》(第二版)对应的数据集,然后修改上述代码中的对应数据的路径即可。
请注明:姓名-公司-职位
以便审核进群资格,未注明则拒绝
xujingjun 发表于 2024-4-23 12:48
newfei188 发表于 2024-4-24 10:57
watalo 发表于 2024-4-22 12:32 准差分法/Cochrane-Orcutt估计法的python实现问题 1.背景 最近在学习陈强教授的《计量经济学及Stata应用》 ...