There are quite a few discussions of this topic on the Jinhe (金禾) forum; they may be helpful:
Thread: "A question about panel data" (3 pages), by xiangfei, posted 2005-6-30 18:37:10; last reply: "Is its critical value ......"
Thread: "A question about choosing a panel data model", by haitao1977, posted 2005-7-27 13:12:07; last reply: "arlion, thank you too ......"
Thanks, but I can never connect to the Jinhe forum, and I don't know why.
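A question about choosing a panel data model

Hi everyone, this is my first time doing a project in Stata, and I have a few questions. I look forward to your replies!

1. In Stata, how do I decide whether a panel data model should have varying coefficients, varying intercepts, or neither? Many reference books say you can construct F statistics by analysis of covariance. Can Stata do this directly? If not, how do I obtain the residual sums of squares S1 and S2 of the varying-coefficient and varying-intercept models?

2. How do I test for heteroskedasticity and serial correlation in a panel data model, and if they are present, how should I handle them when estimating the model? Could you look at the following passage, which deals with the second problem? I don't quite understand it.

Notes: The model is estimated using random effects panel data techniques, with a serially dependent error term. The modified Durbin-Watson statistic for the income regression underlying the estimate of the income risk variable is 1.6487 and the first-order serial correlation coefficient is 0.2686. The model also includes interactions between the occupation, age, age2, age3. All the variables in the model have been transformed according to the Prais-Winsten transformation to correct for first-order serial correlation.

3. After estimating a panel data regression, how do I output the residuals? That data is mainly what I need.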
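For question 1, the covariance-analysis F tests found in panel data textbooks (Hsiao's among them) compare three residual sums of squares: S1 from a separate OLS regression for each cross-section (varying intercepts and slopes), S2 from the fixed-effects model (varying intercepts, common slopes) and S3 from the pooled model. The sketch below shows one way to collect them, assuming the Grunfeld data used later in this thread are loaded and `tsset company year` has been run. The scalar names are illustrative choices, the F1/F2 labels follow the common textbook convention (some books number them differently), and `levelsof` and `Ftail()` assume a Stata newer than version 8.

```stata
* Sketch: covariance-analysis F tests for pooled vs. varying-intercept
* vs. varying-coefficient models (Grunfeld variables).
* Assumes the data are in memory and -tsset company year- has been run.
* Scalar names (S1, S2, S3, df_den, ...) are illustrative.

* S3: pooled OLS (common intercept and common slopes)
quietly regress mvalue invest kstock
scalar S3 = e(rss)
scalar nT = e(N)
scalar K  = e(df_m)                  // number of slope coefficients

* S2: fixed effects (varying intercepts, common slopes)
quietly xtreg mvalue invest kstock, fe
scalar S2 = e(rss)
scalar N  = e(N_g)                   // number of cross-sections

* S1: varying intercepts and slopes = one OLS regression per company
scalar S1 = 0
quietly levelsof company, local(ids)
foreach i of local ids {
    quietly regress mvalue invest kstock if company == `i'
    scalar S1 = S1 + e(rss)
}

* Common denominator degrees of freedom: NT - N(K+1)
scalar df_den = nT - N*(K+1)

* F2: H0 = intercepts and slopes all equal (pooled model is adequate)
scalar F2 = ((S3 - S1)/((N-1)*(K+1))) / (S1/df_den)
* F1: H0 = slopes equal, intercepts free (varying-intercept model is adequate)
scalar F1 = ((S2 - S1)/((N-1)*K)) / (S1/df_den)

display "F2 = " %8.3f F2 "   p = " %6.4f Ftail((N-1)*(K+1), df_den, F2)
display "F1 = " %8.3f F1 "   p = " %6.4f Ftail((N-1)*K, df_den, F1)
```

The usual sequence is to test F2 first: if it is not rejected, pool the data; if it is rejected, test F1 to choose between varying intercepts only and fully varying coefficients.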
===========================================================================
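I have never used varying-coefficient models, because most of the data I work with have a large N (many cross-sections), and letting the coefficients vary is not very useful there. I will only comment on the parts I understand.

1. To compare the fixed-effects model (varying intercepts) with pooled least squares (the pooled model), use an F test:
xtreg varlist , fe
The last line of the output reports this F value; check a textbook for its exact interpretation.

2. Your second question actually involves two problems: heteroskedasticity and serial correlation.
For the former, see the last few posts in the [Topic discussion] Heteroskedasticity thread.
For the latter, the passage you quoted first transforms the data by generalized differencing and then estimates the model with xtreg, re. For this part, read about the standard ways of handling serial correlation, which any general textbook covers.
If serial correlation matters a lot to you and you do not need a random-effects model, the xtgls command is enough to handle both problems. See also the [Topic discussion] Heteroskedasticity thread.

Below I list worked examples of some commonly used commands; I hope they help.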
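For the serial-correlation part of question 2, here is a hedged sketch of a few options on the same Grunfeld variables. `xtregar` and the `corr()` option of `xtgls` are official commands; `xtserial` (Wooldridge's test, written by Drukker) is user-written and has to be installed from SSC. The `lbi` option of `xtregar` reports a modified Durbin-Watson statistic, the kind of number quoted in the question. Which route fits depends on whether you want random effects, as the answer above notes.

```stata
* Sketch: testing and allowing for AR(1) errors in a panel (Grunfeld variables).
* Assumes the data are loaded and -tsset company year- has been run.

* Wooldridge test for first-order serial correlation (user-written, from SSC)
ssc install xtserial, replace
xtserial mvalue invest kstock

* Random-effects GLS with AR(1) disturbances; lbi adds the modified
* Durbin-Watson and Baltagi-Wu LBI statistics for H0: rho = 0
xtregar mvalue invest kstock, re lbi

* Without random effects: FGLS with heteroskedastic panels and a common AR(1)
xtgls mvalue invest kstock, panels(heteroskedastic) corr(ar1)
```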
===============================================================
==========================
=========Codes=============
==========================
// chp8_panel data
* -to reader-*: when you run this yourself, you may need to change the path to where the data are stored
use "D:\stata8\ado\Examples\XTFiles\grunfeld.dta", clear

*--A1--*
tsset company year   /*declare Panel-variable and Time-variable*/
gen Lag_invest = L.invest
order company year invest Lag_invest   /*reorder how the variables are listed*/

*--A2--*
xtdes   /* To describe the pattern of Panel Data */
xtsum   /* To calculate the descriptive statistics*/
//xttab invest

*--A3--*
xtreg mvalue invest kstock,fe   /*to estimate fixed-effect model*/
xtreg mvalue invest kstock,re   /*to estimate random-effect model*/
xtgls mvalue invest kstock,panels(he)   /*to estimate the model with Heteroskedastic variance with GLS*/

*--A4--*
*-Test fixed effect-*
* method1: calculate the F statistic by yourself
// step1 : estimate Pooled model and store R2
reg mvalue invest kstock
ereturn list   /*to reader: please see the results after this command*/
local R2_r = e(r2)
local K = e(df_m)
// step2 : estimate Fixed-effect model and store R2
xtreg mvalue invest kstock , fe
local R2_u = 1- e(rss)/e(tss)
local nT = e(N)
local n = e(N_g)
// step3 : calculate the F statistics and P-value
local F1 = (`R2_u' - `R2_r')/(`n' - 1)
local F2 = (1 - `R2_u')/(`nT' - `n' - `K')
local F = `F1'/`F2'
local p = 1- F(`n' - 1,`nT' - `n' - `K',`F')
#delimit ;
dis in ye "The F test for all u_i=0 is : " %8.2f `F' _n
    in ye "The P-value is: " %6.4f `p' ;
#delimit cr
* method2 : using the statistic given by stata's xtreg,fe command,
* which will give the same result as method1
xtreg mvalue invest kstock , fe

*--A5--*
// Hausman test
//step1 estimate fixed-effect model and store the results
xtreg mvalue invest kstock , fe
est store fe
//step2 estimate random-effect model and store the results
xtreg mvalue invest kstock , re
est store re
hausman fe

*--A6--*
// Testing for random-effect using B-P test
xtreg mvalue invest kstock , re
xttest0
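Step 3 of method 1 implements the usual F test of H0: all u_i = 0, comparing the R² of the restricted (pooled) model with the R² of the unrestricted (fixed-effects, dummy-variable) model. Written out with n cross-sections, nT observations and K slope coefficients, the statistic that the local macros build is:

```latex
F = \frac{(R^2_u - R^2_r)/(n-1)}{(1 - R^2_u)/(nT - n - K)} \sim F(n-1,\; nT - n - K)
```

For the Grunfeld data n = 10, nT = 200 and K = e(df_m) = 2, so the degrees of freedom are (9, 188), which matches the "F(9, 188) = 113.76" line that xtreg, fe reports.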
==============================================================
=======================================
===============Results===================
=======================================
. // chp8_panel data
.
. * -to reader-*: when you run this yourself, you may need to change the path to where the data are stored
. use "D:\stata8\ado\Examples\XTFiles\grunfeld.dta", clear
.
. *--A1--*
. tsset company year /*declare Panel-variable and Time-variable*/
>
       panel variable:  company, 1 to 10
        time variable:  year, 1935 to 1954

.
. gen Lag_invest = L.invest
(10 missing values generated)

. order company year invest Lag_invest /*reorder how the variables are listed*/
.
. *--A2--*
. xtdes /* To describe the pattern of Panel Data */

 company:  1, 2, ..., 10                                     n =         10
    year:  1935, 1936, ..., 1954                             T =         20
           Delta(year) = 1; (1954-1935)+1 = 20
           (company*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        20      20      20        20        20      20      20

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------------------
       10    100.00  100.00 |  11111111111111111111
 ---------------------------+----------------------
       10    100.00         |  XXXXXXXXXXXXXXXXXXXX

. xtsum /* To calculate the descriptive statistics*/

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
company  overall |       5.5   2.879489          1         10 |     N =     200
         between |              3.02765          1         10 |     n =      10
         within  |                    0        5.5        5.5 |     T =      20
                 |                                            |
year     overall |    1944.5   5.780751       1935       1954 |     N =     200
         between |                    0     1944.5     1944.5 |     n =      10
         within  |             5.780751       1935       1954 |     T =      20
                 |                                            |
invest   overall |  145.9583   216.8753        .93     1486.7 |     N =     200
         between |             198.8242     3.0845     608.02 |     n =      10
         within  |             106.1986  -204.3617   1024.638 |     T =      20
                 |                                            |
Lag_in~t overall |  139.2307   198.0107        .93     1304.4 |     N =     190
         between |             187.4249   2.977368   561.7737 |     n =      10
         within  |             86.17229  -164.8429   881.8571 |     T =      19
                 |                                            |
mvalue   overall |  1081.681    1314.47      58.12     6241.7 |     N =     200
         between |             1334.917     70.921   4333.845 |     n =      10
         within  |             340.5421   -459.964   2989.536 |     T =      20
                 |                                            |
kstock   overall |  276.0172   301.1039         .8     2226.3 |     N =     200
         between |             200.9701     5.9415    648.435 |     n =      10
         within  |             232.6603  -369.6179   1853.882 |     T =      20
                 |                                            |
time     overall |      10.5   5.780751          1         20 |     N =     200
         between |                    0       10.5       10.5 |     n =      10
         within  |             5.780751          1         20 |     T =      20

. //xttab invest
.
. *--A3--*
. xtreg mvalue invest kstock,fe /*to estimate fixed-effect model*/

Fixed-effects (within) regression               Number of obs      =       200
Group variable (i): company                     Number of groups   =        10

R-sq:  within  = 0.4117                         Obs per group: min =        20
       between = 0.8078                                        avg =      20.0
       overall = 0.7388                                        max =        20

                                                F(2,188)           =     65.78
corr(u_i, Xb)  = 0.6955                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |   2.856166   .3075147     9.29   0.000     2.249543    3.462789
      kstock |  -.5078673   .1403662    -3.62   0.000    -.7847625   -.2309721
       _cons |   804.9802   32.43177    24.82   0.000     741.0033    868.9571
-------------+----------------------------------------------------------------
     sigma_u |  905.81517
     sigma_e |  268.73329
         rho |  .91910377   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(9, 188) =   113.76              Prob > F = 0.0000

. xtreg mvalue invest kstock,re /*to estimate random-effect model*/

Random-effects GLS regression                   Number of obs      =       200
Group variable (i): company                     Number of groups   =        10

R-sq:  within  = 0.4115                         Obs per group: min =        20
       between = 0.8043                                        avg =      20.0
       overall = 0.7371                                        max =        20

Random effects u_i ~ Gaussian                   Wald chi2(2)       =    149.94
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |   3.113429   .3076132    10.12   0.000     2.510519     3.71634
      kstock |   -.578422   .1424721    -4.06   0.000    -.8576622   -.2991819
       _cons |   786.9048   182.1715     4.32   0.000     429.8553    1143.954
-------------+----------------------------------------------------------------
     sigma_u |  546.52144
     sigma_e |  268.73329
         rho |  .80529268   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. xtgls mvalue invest kstock,panels(he) /*to estimate the model with Heteroskedastic variance
>  with GLS*/

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   no autocorrelation

Estimated covariances      =        10          Number of obs      =       200
Estimated autocorrelations =         0          Number of groups   =        10
Estimated coefficients     =         3          Time periods       =        20
                                                Wald chi2(2)       =    326.06
Log likelihood             =  -1445.752         Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |   5.666927   .3139891    18.05   0.000      5.05152    6.282335
      kstock |  -.6849888   .1123445    -6.10   0.000      -.90518   -.4647977
       _cons |   299.1641   29.96708     9.98   0.000     240.4297    357.8985
------------------------------------------------------------------------------

.
. *--A4--*
. *-Test fixed effect-*
. * method1: calculate the F statistic by yourself
. // step1 : estimate Pooled model and store R2
. reg mvalue invest kstock

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =  288.50
       Model |   256323828     2   128161914           Prob > F      =  0.0000
    Residual |  87514460.3   197  444235.839           R-squared     =  0.7455
-------------+------------------------------           Adj R-squared =  0.7429
       Total |   343838288   199   1727830.6           Root MSE      =  666.51

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      invest |   5.759807   .2908613    19.80   0.000     5.186206    6.333409
      kstock |  -.6152727   .2094979    -2.94   0.004    -1.028419   -.2021263
       _cons |   410.8156   64.14189     6.40   0.000     284.3227    537.3084
------------------------------------------------------------------------------
. ereturn list /*to reader: please see the results after this command*/

scalars:
                  e(N) =  200
               e(df_m) =  2
               e(df_r) =  197
                  e(F) =  288.4997172285366
                 e(r2) =  .7454778502025821
               e(rmse) =  666.5101944725171
                e(mss) =  256323828.0623155
                e(rss) =  87514460.34915113
               e(r2_a) =  .7428938689863647
                 e(ll) =  -1582.687429866819
               e(ll_0) =  -1719.524171284405

macros:
             e(depvar) : "mvalue"
                e(cmd) : "regress"
            e(predict) : "regres_p"
              e(model) : "ols"

matrices:
                  e(b) :  1 x 3
                  e(V) :  3 x 3

functions:
             e(sample)

. local R2_r = e(r2)
. local K = e(df_m)
.
. // step2 : estimate Fixed-effect model and store R2
. xtreg mvalue invest kstock , fe

(output identical to the first "xtreg mvalue invest kstock,fe" results above, ending with
F test that all u_i=0:     F(9, 188) =   113.76              Prob > F = 0.0000)

. local R2_u = 1- e(rss)/e(tss)
. local nT = e(N)
. local n = e(N_g)
.
. // step3 : calculate the F statistics and P-value
. local F1 = (`R2_u' - `R2_r')/(`n' - 1)
. local F2 = (1 - `R2_u')/(`nT' - `n' - `K')
. local F = `F1'/`F2'
. local p = 1- F(`n' - 1,`nT' - `n' - `K',`F')
. #delimit ;
delimiter now ;
. dis in ye "The F test for all u_i=0 is : " %8.2f `F' _n
> in ye "The P-value is: " %6.4f `p' ;
The F test for all u_i=0 is :   113.76
The P-value is: 0.0000

. #delimit cr
delimiter now cr
.
. * method2 : using the statistic given by stata's xtreg,fe command,
. * which will give the same result as method1
. xtreg mvalue invest kstock , fe

(output identical to the first "xtreg mvalue invest kstock,fe" results above, ending with
F test that all u_i=0:     F(9, 188) =   113.76              Prob > F = 0.0000)

.
. *--A5--*
. // Hausman test
. //step1 estimate fixed-effect model and store the results
. xtreg mvalue invest kstock , fe

(output identical to the first "xtreg mvalue invest kstock,fe" results above)

. est store fe
. //step2 estimate random-effect model and store the results
. xtreg mvalue invest kstock , re

(output identical to the first "xtreg mvalue invest kstock,re" results above)

. est store re
. hausman fe

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
      invest |    2.856166     3.113429       -.2572636               .
      kstock |   -.5078673     -.578422        .0705548               .
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =     2366.62
                Prob>chi2 =      0.0000

.
. *--A6--*
. // Testing for random-effect using B-P test
. xtreg mvalue invest kstock , re

(output identical to the first "xtreg mvalue invest kstock,re" results above)

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects:

        mvalue[company,t] = Xb + u[company] + e[company,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  mvalue |    1727831       1314.47
                       e |   72217.58      268.7333
                       u |   298685.7      546.5214

        Test:   Var(u) = 0
                              chi2(1) =   772.32
                          Prob > chi2 =     0.0000

.
end of do-file
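The two test statistics at the end of the log can be written out as below. In the Hausman table the S.E. column prints "." because the diagonal of V_b - V_B is negative here: for invest, .3075147² ≈ 0.09457 is smaller than .3076132² ≈ 0.09463, and for kstock, .1403662² ≈ 0.01970 is smaller than .1424721² ≈ 0.02030, so the square roots are undefined. The Breusch-Pagan expression is the textbook form for a balanced panel built from pooled OLS residuals ê_it; Stata's xttest0 may differ in implementation details.

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})'\,\left[\widehat V_{FE} - \widehat V_{RE}\right]^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2(K), \qquad K = 2 \text{ here}

LM = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T}\hat e_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{T}\hat e_{it}^2} - 1\right]^2 \sim \chi^2(1) \quad \text{under } H_0:\ \sigma_u^2 = 0
```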
=========================================================
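xttest2 tests for contemporaneous correlation across cross-sections in panel data models. xtreg, fe assumes that the cross-sections are iid, but for many panel data sets this assumption is rather strong.
In addition, xttest1 provides a set of test statistics for panel analysis, including tests for serial correlation, random effects, and so on.
xttest0 tests for random effects using the Breusch-Pagan LM test.
As for heteroskedasticity in panel data, it can be tested by constructing an F statistic: use xtgls to estimate the model once under the homoskedasticity assumption and once under the heteroskedasticity assumption, then build an F statistic from the residual sums of squares, with N-1 degrees of freedom.
What I gave above is a heteroskedasticity test for cross-sectional analysis, not one designed for panel data models.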
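A hedged sketch of how these diagnostics are typically run on the Grunfeld variables. xttest0 is an official command; xttest2 and xttest3 are user-written commands by Christopher Baum distributed through SSC. xttest3 (a modified Wald test for groupwise heteroskedasticity after xtreg, fe) is not mentioned in the post and is added here as one possible heteroskedasticity check. xttest1 is also user-written; `findit xttest1` should locate it if you need it.

```stata
* Sketch: panel diagnostics (Grunfeld variables).
* Assumes the data are loaded and -tsset company year- has been run.
* xttest2 and xttest3 are user-written (SSC); installing needs internet access.
ssc install xttest2, replace
ssc install xttest3, replace

* After fixed effects: cross-sectional (contemporaneous) correlation
* and groupwise heteroskedasticity of the residuals
quietly xtreg mvalue invest kstock, fe
xttest2      // Breusch-Pagan LM test of cross-sectional independence
xttest3      // modified Wald test for groupwise heteroskedasticity

* After random effects: Breusch-Pagan LM test for random effects (official)
quietly xtreg mvalue invest kstock, re
xttest0
```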
==============================================================
Panel heteroskedasticity test:
1) xtgls depvar varlist, igls panels(hetero)
2) est store reghet
3) xtgls depvar varlist
4) est store regnohet
5) local df = e(N_g)-1
6) lrtest reghet, df(`df')
Step 5 differs from other uses of lrtest. The book explains it like this: Normally, lrtest infers the number of constraints when we estimate nested models by looking at the number of parameters estimated. In the case of xtgls, however, the panel-level variances are estimated as nuisance parameters and their count is NOT included in the parameters estimated. So, we will need to tell lrtest how many constraints we have implied.
Arlion, could you explain what the sentence "In the case of xtgls, however, the panel-level variances are estimated as nuisance parameters and their count is NOT included in the parameters estimated." means, especially the term "nuisance parameters"? Thanks!
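Applied to the Grunfeld variables from the earlier example, the six steps might look like the sketch below; the model names reghet and regnohet come from the post, and `estimates store` is the unabbreviated form of `est store`. The `igls` option iterates the FGLS steps to convergence, giving maximum-likelihood estimates for the heteroskedastic model, which the LR test needs (assuming the iterations converge, which is not guaranteed).

```stata
* Sketch: LR test for groupwise heteroskedasticity with xtgls (Grunfeld variables).
* Assumes the data are loaded and -tsset company year- has been run.

* 1-2) unrestricted model: a separate error variance for each panel
xtgls mvalue invest kstock, igls panels(heteroskedastic)
estimates store reghet

* 3-4) restricted model: one common error variance
xtgls mvalue invest kstock
estimates store regnohet

* 5) N-1 restrictions; the panel variances are not counted by lrtest
local df = e(N_g) - 1

* 6) LR test; H0: all panels share the same error variance
lrtest reghet regnohet, df(`df')
```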
============================================================
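I'm not completely clear about this either; here is my rough understanding.
"Nuisance parameters" means "unknown parameters": when we estimate the model with heteroskedasticity and the model without it, there is no explicitly specified parameter difference between the two. Normally, when we use an LR test on constraints, we assume that some parameters in one of the models are zero (or that some parameters are equal to each other), and these parameters can be specified explicitly.
In the example above, all we know is that each cross-section is given its own variance; relative to the homoskedastic specification, this amounts to (N-1) restrictions, so the LR test has (N-1) degrees of freedom.
As far as I know, "nuisance parameters" are often associated with the so-called "Davies problem", in which the presence of unknown parameters makes the test statistic follow a non-standard distribution (Davies, 1977, 1987). In that case, critical values for the statistic can be obtained by bootstrapping.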
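In symbols, with σ_i² the error variance of panel i, the restriction tested and the resulting statistic are as follows. Counting the equality signs gives the N-1 degrees of freedom passed to lrtest, which is 9 for the 10 Grunfeld companies.

```latex
H_0:\ \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_N^2 \qquad (N-1 \text{ equality restrictions})

LR = 2\left(\ln L_{\text{het}} - \ln L_{\text{hom}}\right) \;\overset{a}{\sim}\; \chi^2(N-1)
```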
===============================================================
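I think it works like this: the procedure in that paper first tests for heteroskedasticity in the panel and then estimates by FGLS, while allowing for fixed effects. We can therefore proceed as follows:
1. Remove the fixed effects. There are two ways: add N-1 dummy variables, or subtract the group mean from each cross-section, giving Yit* = Yit - mean(Yit), where the mean is taken over t within each unit i, and likewise Xit*.
2. Estimate the model with xtgls, p(h).
The reason for doing it this way is that the model xtgls handles does not itself allow for fixed effects (Greene, 2000, Chapter 15).
3. If you want to use a Wald test, see the formula on p. 598.
What do you all think?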
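Steps 1 and 2 could be sketched as follows with the Grunfeld variables, showing both routes for removing the fixed effects. The m_* and d_* variable names are made up for this example, and the `i.company` factor-variable syntax assumes Stata 11 or later (older versions would use the `xi:` prefix instead).

```stata
* Sketch: remove fixed effects, then FGLS with heteroskedastic panels.
* Assumes the Grunfeld data are loaded and -tsset company year- has been run.
* m_* and d_* are hypothetical variable names chosen for this example.

* Route (a): within transformation, Yit* = Yit - mean_i(Y), same for each X
foreach v of varlist mvalue invest kstock {
    egen m_`v' = mean(`v'), by(company)   // company (group) mean
    generate d_`v' = `v' - m_`v'          // deviation from the group mean
}
* After demeaning there is no intercept left to estimate
xtgls d_mvalue d_invest d_kstock, panels(heteroskedastic) noconstant

* Route (b): keep the data in levels and add N-1 company dummies instead
xtgls mvalue invest kstock i.company, panels(heteroskedastic)
```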
===========================================
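This question has been bothering me too. Here is my understanding; Arlion, please correct me if I'm wrong.
1. The null hypothesis of the Hausman test is that ui is uncorrelated with the other regressors X (corresponding to the REM); see Greene (4e, p. 577). Rejecting the null indicates that ui is correlated with X, and if we keep using the REM the estimator is biased (more precisely, inconsistent). In that case the FEM is preferable to the REM.
Hsiao's book and Johnston and DiNardo's book both stress repeatedly that the terms "fixed effects" and "random effects" are imprecise. "Fixed effects" does not mean that ui is fixed. The criterion for choosing between the fixed-effects and random-effects models is whether ui is correlated with X.
2. The null hypothesis of the lrtest heteroskedasticity test is homoskedasticity; see Greene (4e, p. 511).
3. xtgls sometimes fails to converge. A few days ago I saw a thread about exactly this non-convergence problem, but I didn't read it carefully.
4. predict yhat (with no option, the default is the fitted value)
predict uhat, res (residuals)
If you don't type ", res", Stata always assumes you want the fitted values yhat. (This is the syntax after regress; after xtreg, predict uses the options e, u and ue for residuals instead.)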
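For point 4 and for question 3 in the original post, here is a sketch of the predict options after xtreg (xb, e, u, ue) and of a manually computed residual after xtgls, whose predict provides the linear prediction. The new variable names and the output file name residuals.csv are made up; `help xtreg postestimation` in your own Stata lists the options your version supports.

```stata
* Sketch: saving fitted values and residuals after panel estimation.
* Assumes the Grunfeld data are loaded and -tsset company year- has been run.
* New variable names (yhat_fe, e_fe, ...) and the file name are hypothetical.

quietly xtreg mvalue invest kstock, fe
predict yhat_fe, xb      // linear prediction xb (constant included, u_i excluded)
predict e_fe, e          // idiosyncratic residual e_it
predict u_fe, u          // estimated fixed effect u_i
predict ue_fe, ue        // combined residual u_i + e_it

* xtgls: take the linear prediction and form the residual by hand
quietly xtgls mvalue invest kstock, panels(heteroskedastic)
predict yhat_gls, xb
generate res_gls = mvalue - yhat_gls

* Write the residuals to a comma-separated file for use outside Stata
outsheet company year e_fe ue_fe res_gls using residuals.csv, comma replace
```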
=================================================
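"Correcting" is too strong a word; I'm learning this material too.
Still, your remark that [Hsiao's book and Johnston and DiNardo's book both stress repeatedly that the terms "fixed effects" and "random effects" are imprecise] strikes me as very important. I think the question ultimately comes back to our sampling. If our sample comes from a very large population, it is unreasonable to assume that every cross-section has its own intercept (a fixed effect ui). Conversely, if our sample comes from a small population, such an assumption has some justification.
Unfortunately, we often cannot tell intuitively whether the population behind our sample is large or small, and in that case either the fixed-effects or the random-effects model may be appropriate.
If we have established that our sample comes from a large population, for example when studying individual consumption behavior (the share of income spent on consumption), the random-effects model should be the most appropriate. If the Hausman test rejects the null, the random-effects model is not appropriate. I think we could respecify the model using instrumental variables so that we can keep working within the random-effects framework. The key question, though, is how to choose the instruments. I haven't looked into this in depth, but I think it is a promising direction.
Also, using xtgls to deal with heteroskedasticity seems appropriate, but is this heteroskedasticity specification the same thing as what we call random effects? That needs further discussion.
These are just my rough views; what do you all think?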
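The above is reposted from the Jinhe forum. There was too much material to include all of it, so this is only a partial view; please bear with me.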