2007-08-15

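A question for everyone: when I run a two-stage least squares regression, the goodness-of-fit measure R2 comes out negative. Where does the problem lie? Thanks!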

Dependent Variable: CORR
Method: Two-Stage Least Squares
Date: 08/15/07 Time: 10:06
Sample: 1 36
Included observations: 36
Instrument list: LOG(DIST) ECON

Variable     Coefficient   Std. Error   t-Statistic   Prob.

C               0.497462     0.419442      1.186009   0.2438
LWT             0.035191     0.100292      0.350885   0.7278

R-squared             -0.003822    Mean dependent var    0.351513
Adjusted R-squared    -0.033346    S.D. dependent var    0.318975
S.E. of regression     0.324250    Sum squared resid     3.574693
F-statistic            0.123120    Durbin-Watson stat    1.265808
Prob(F-statistic)      0.727840
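
The negative R2 in this output can be checked by hand from the other reported figures. The sketch below assumes R2 is computed as 1 - SSR/TSS from the structural residuals, with TSS recovered as (n - 1) times the squared S.D. of the dependent variable, and that k = 2 coefficients (C and LWT) enter the adjusted R2. The variable names are illustrative only.

```python
# Figures copied from the regression output above.
n = 36            # included observations
k = 2             # estimated coefficients: C and LWT
sd_y = 0.318975   # S.D. dependent var
ssr = 3.574693    # Sum squared resid

# Total sum of squares implied by the sample standard deviation of CORR.
tss = (n - 1) * sd_y ** 2

# R2 is negative whenever the residual sum of squares exceeds TSS.
r2 = 1 - ssr / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

print(f"TSS = {tss:.6f}, R2 = {r2:.6f}, adjusted R2 = {adj_r2:.6f}")
```

Under these assumptions the computed values agree with the reported R-squared and Adjusted R-squared up to rounding, which suggests the negative sign simply reflects SSR being slightly larger than TSS.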

All replies
2007-8-17 13:35:00

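This is fairly normal.

I suggest looking at the chapter on instrumental variables and two-stage least squares estimation in Wooldridge's Introductory Econometrics; I think it is Chapter 15.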

2007-8-22 09:07:00
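I ran into a similar phenomenon when working on a time-series problem; on checking, it turned out that the data were at fault. Later a classmate of mine hit the same issue even though his data were fine: he ran a stationarity test under the assumption of a trend and an intercept, and the goodness of fit came out negative. I think it is very likely caused by model misspecification.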
2010-6-13 20:36:49
Is there a more specific answer? I run into this problem all the time.
2010-6-13 21:35:07
Yes, it is in Wooldridge's book (the introductory one, Chinese edition):
p. 77: regression through the origin
p. 457: IV
2012-1-17 10:36:19
Short Answer:

In essence, the FAQ in question explains why observing a negative R-squared
(R2) after estimating the parameters of a model via Two-stage least squares
(2SLS) using -ivreg- does not necessarily indicate model misspecification.
John questions this result.  In the Long Answer below, I provide simulation
evidence that the FAQ is indeed correct.


Long Answer:

John begins with a nice summary of the issue and a summary of the FAQ.

> I'm interested in comments and advice on how to interpret the
> parameters of a two stage least squares (2SLS) model with a negative
> R2. The problem is discussed on the website of Stata at
> http://www.stata.com/support/faqs/stat/2sls.html

> To summarize for myself, 2SLS uses instrumental variables to model
> the effects of righthand side endogenous variables. These
> instruments are the values of the endogenous variables as predicted
> by the exogenous variables in the model. When these instruments are
> replaced by the endogenous variables themselves, the predicted
> values can in some cases be way off, so much so that the residual SS
> is greater than the total SS. This would mean that the model SS is
> negative and hence that the R2 of the model is negative. This can
> happen even though the model contains strong and significant
> effects.

> The FAQ referred to above states that a negative R2 need not be a
> problem and that parameters can be safely interpreted if they are
> significant with reasonably small standard errors: "What does it
> mean when RSS is greater than TSS? Does this mean our parameter
> estimates are no good? Not really.  You can easily develop
> simulations where the parameter estimates from two-stage are quite
> good while the MSS is negative. Remember why we estimate two-stage
> models. We are interested in the parameters of the structural
> equation: the elasticity of demand, the marginal propensity to
> consume, etc. If our two-stage model produces estimates of these
> parameters with acceptable standard errors, we should be happy
> regardless of MSS or R2.  If we were strictly interested in
> projections of the dependent variable, then we should probably
> consider the reduced form of the model."

John then raises his doubts.  In particular he writes,

> My take would be that the model fits the data very poorly and that
> the estimates should be regarded with extreme suspicion. This is
> generally the advice for maximum likelihood models: only interpret
> parameters of a model that fits the data well. A negative R2 would
> mean that the model was misspecified and should not be
> interpreted.  Comments? And could anything be inferred about the
> nature of the misspecification?


There are a number of ways of illustrating that the FAQ is correct.  Perhaps
the most accessible is via simulation.  I interpret the claim of the FAQ to
be that there are models for which the distribution of the 2SLS estimates of
the parameters will be well approximated by its theoretical distribution but
for which the R2 computed from some samples will be negative.

I simulate data from the model

(1)        y = 1 - .1*x + e1 + e2
(2)        x = w + c1 + .5*e1
(3)        z = 1.5*c1 + e3

where e1, e2, e3, w, and c1 are all independent standard normal random
variables.  The c1 term in equations (2) and (3) provides the correlation
between x and z.  The e1 term in equations (1) and (2) is the source of the
correlation between x and the error term (e1 + e2) for y.  The coefficient
of -.1 is the parameter that we are trying to estimate.  We are going to
estimate this parameter via 2SLS using -ivreg- with y as the dependent
variable, x as the endogenous variable, and z as the instrument for x.  In
other words, for each simulated sample we construct y, x, and z using
independent draws of the standard normal variables e1, e2, e3, w, and c1 and
equations (1)-(3).  Then we use

. ivreg y (x = z)

to estimate the coefficient -.1.  For each simulated sample we record the
following statistics.

        b1          the estimate of the coefficient (-.1)
        p           the p-value for the null hypothesis that the coefficient on x is -.1
        reject      1 if p < .05 and 0 otherwise
        r2          the computed R2 (missing if mss < 0)
        mss         the value of the model sum of squares
        rho_x1e     the correlation between x and e = e1 + e2
        rho_x1z1    the correlation between x and z
        fsf         the first-stage F statistic
        p_fsf       the p-value from the first-stage F statistic


Below my signature is the Stata code for drawing 2,000 simulations of this
model, estimating the coefficient -.1, computing the statistics of interest
and finally summarizing the results.  Each simulated sample contains 1,000
observations, so the results should not be attributed to a small sample
size.
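
As a rough stand-in for the simulation described here, the following Python sketch (standard library only) mimics the design on a smaller scale. It rests on several assumptions that are not taken from the post: x is generated as w + c1 + .5*e1 with no z term and e3 is standard normal, which is the reading under which the population correlations (about .24 between x and e, about .55 between x and z) line up with the summary statistics reported in this post; the 2SLS estimate is computed with the simple one-instrument IV formula cov(z, y)/cov(z, x) rather than with -ivreg-; only 200 replications are drawn instead of 2,000; and the name `simulate_once` is hypothetical.

```python
import random
import statistics


def simulate_once(n, beta, rng):
    """Draw one sample of size n; return (b1, mss, r2) for the IV fit of y on x."""
    xs, ys, zs = [], [], []
    for _ in range(n):
        e1, e2, e3, w, c1 = (rng.gauss(0.0, 1.0) for _ in range(5))
        x = w + c1 + 0.5 * e1          # assumed form of equation (2): no z term
        z = 1.5 * c1 + e3              # equation (3), e3 assumed standard normal
        y = 1.0 + beta * x + e1 + e2   # equation (1)
        xs.append(x)
        ys.append(y)
        zs.append(z)

    xbar, ybar, zbar = (statistics.fmean(v) for v in (xs, ys, zs))
    # With one endogenous regressor and one instrument, 2SLS reduces to
    # the IV estimator cov(z, y) / cov(z, x).
    s_zy = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
    s_zx = sum((z - zbar) * (x - xbar) for z, x in zip(zs, xs))
    b1 = s_zy / s_zx
    b0 = ybar - b1 * xbar

    # Residuals use the actual x, not its first-stage prediction.
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - ybar) ** 2 for y in ys)
    return b1, tss - rss, 1.0 - rss / tss


rng = random.Random(12345)
results = [simulate_once(1000, -0.1, rng) for _ in range(200)]
mean_b1 = statistics.fmean(b1 for b1, _, _ in results)
frac_negative = sum(mss < 0 for _, mss, _ in results) / len(results)
print(f"mean b1 = {mean_b1:.4f}, share of samples with MSS < 0 = {frac_negative:.3f}")
```

Unlike the r2 statistic in the list above, which is recorded as missing when mss < 0, this sketch returns the negative value of 1 - RSS/TSS directly so that the sign can be inspected.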

Here is what I obtained when I used -summarize- to look at the results.


. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          b1 |      2000   -.1025507     .054484   -.361239   .0578525
           p |      2000    .4951122    .2863575   .0000638   .9994608
      reject |      2000         .05    .2179995          0          1
          r2 |        48    .0053981    .0050849   .0001775   .0205909
         mss |      2000   -82.03539    49.51527  -317.1851   37.13932
-------------+--------------------------------------------------------
     rho_x1e |      2000    .2344962    .0302909   .1359878    .325926
    rho_x1z1 |      2000    .5544141    .0222491    .483774   .6284751
         fsf |      2000    445.7681    51.80968   304.9355   651.5349
       p_fsf |      2000    1.63e-34    2.24e-33          0   7.55e-32


The results for rho_x1e, rho_x1z1, fsf, and p_fsf indicate that the
correlations between the endogenous variable and the error term and between
the endogenous variable and its instrument are reasonable and that there is
no weak-instrument problem.  The results for b1, p, and reject indicate that
the mean estimate of the coefficient on x is very close to its true value of
-.1 and that there is no size distortion of the test that the coefficient on
x = -.1.  In short, the distribution of the estimates, b1, is very well
approximated by its theoretical asymptotic distribution.  Together, these
results imply that the 2SLS estimator is performing according to the theory
in these simulations.

Now note that there are only 48 observations on r2.  This is because there
are 1,952 observations in which mss < 0.

. count if mss < 0
1952

Thus, the results illustrate that there is at least one model for which the
distribution of the 2SLS estimates of the parameters is very well
approximated by its asymptotic distribution, but for which the R2 will be
negative in most of the individual samples.  To obtain more models that
produce the same qualitative results, simply change the coefficient -.1 by a
small amount.  As one would expect, increasing the coefficient -.1 reduces
the fraction of the simulated samples that produce a negative R2.
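
One way to see why the coefficient matters is a large-sample approximation. This is a sketch, not part of the original simulation: it writes the structural equation as y = alpha + beta*x + u with u = e1 + e2, and assumes the estimates are close enough to the true values that the residuals behave like u.

```latex
% R^2 is computed from residuals that use the actual x, so per observation:
\begin{align*}
\frac{\mathrm{TSS}}{n} &\to \operatorname{Var}(y)
   = \beta^{2}\operatorname{Var}(x) + 2\beta\operatorname{Cov}(x,u) + \operatorname{Var}(u), \\
\frac{\mathrm{RSS}}{n} &\to \operatorname{Var}(u), \\
\frac{\mathrm{MSS}}{n} = \frac{\mathrm{TSS}-\mathrm{RSS}}{n}
   &\to \beta^{2}\operatorname{Var}(x) + 2\beta\operatorname{Cov}(x,u).
\end{align*}
% The shared e_1 term gives Cov(x,u) = 0.5, so the limit is negative
% exactly when -1/Var(x) < \beta < 0.
```

Under this approximation, a small negative coefficient such as -.1 combined with a positive correlation between x and the error puts the model sum of squares below zero on average, while moving the coefficient up toward zero and beyond shrinks the share of samples with a negative R2.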

I hope that this helps.

        --David
        ddrukker@stata.com