Logistic Regression with Heteroskedasticity?

2012-05-07

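Hi everyone, I'd appreciate some help!
1. While fitting a logistic regression, I found that 3 of my 9 independent variables are collinear (correlation coefficients above 0.7), and nothing I try removes it. I don't want to use stepwise regression because I want to keep all 9 variables. Factor analysis isn't an option either: I have only 371 observations and just 9 independent variables, and extracting common factors yields 3 factors that explain only a little over 70% of the variance. Ridge regression and partial least squares regression are too awkward to do in SPSS. What should I do? I can't just ignore it.
2. When fitting a logistic regression, do I need to test for heteroskedasticity? My data are cross-sectional, so I don't plan to test for autocorrelation, but I don't know how to test for heteroskedasticity. EViews offers no White test option for this model. Does that mean logistic regression doesn't need a heteroskedasticity test, or is it something else?
3. I had planned to plot the independent variables against the dependent variable to see how they relate, but the plots look awful and show no visible relationship at all. Some of my independent variables are qualitative (0/1), some are continuous, and some are something like grouped variables (one block of observations takes value a, another b, another c, another d, ...). An excerpt of my data is below. Is there something wrong with data like this, or what is going on?
Any advice is sincerely appreciated. Thank you very much!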


0	136	555.3	0	0	0.2235	0.4933	0	0	0
0	131	784.71	0	1	0.2235	0.4933	0	0	0
0	135	909.3	0	1	0.2235	0.4933	0	0	0
0	146	584.3	0	0	0.2235	0.4933	0	0	0


0	131	1012.7	0	0	0.0797	0.5073	0	4	0
0	124	858.55	0	0	0.0797	0.5073	0	4	0
0	115	528.75	0	0	0.0797	0.5073	0	4	0

All replies

mhzhou

2012-11-5 22:11:31

I've run into the same problem as the original poster. When doing ridge regression, can the dependent variable be a dummy variable?

尤尤西子

2014-3-22 16:21:15

Nobody ever seems to answer these questions... I'm stuck on the same thing.

ReneeBK

2014-3-24 08:59:11

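mhzhou posted on 2012-11-5 22:11
I've run into the same problem as the original poster. When doing ridge regression, can the dependent variable be a dummy variable?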

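The answer is no for ordinary ridge regression, which is a linear model for a continuous response. With a binary (dummy) dependent variable, the usual route is a logistic regression with a ridge (L2) penalty.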
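To make the ridge idea concrete for a 0/1 outcome, here is a minimal sketch in plain Python of a logistic regression with an L2 penalty, fitted by gradient descent. Everything in it is an assumption for illustration: the function name fit_ridge_logit, the step size, the iteration count, and the toy data (two nearly identical predictors, mimicking the collinearity in the original question). It is not an SPSS, Stata, or SHAZAM routine.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_ridge_logit(X, y, lam, step=0.1, n_iter=2000):
    """Minimise mean log-loss + (lam / 2) * sum(b_j^2) by gradient descent.

    The intercept b0 is left unpenalised, as is conventional.
    """
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err / n
            for j in range(k):
                g[j] += err * xi[j] / n
        b0 -= step * g0
        # The penalty adds lam * b_j to the gradient of each slope.
        b = [bj - step * (gj + lam * bj) for bj, gj in zip(b, g)]
    return b0, b


# Made-up toy data: x2 is nearly a copy of x1 (strong collinearity), y is 0/1.
X = [[-1.5, -1.4], [-1.0, -1.1], [-0.5, -0.4], [0.0, 0.1],
     [0.5, 0.4], [1.0, 1.1], [1.5, 1.4], [-0.2, -0.3]]
y = [0, 0, 1, 0, 1, 1, 1, 0]

for lam in (0.01, 1.0):
    b0, b = fit_ridge_logit(X, y, lam)
    print("lambda =", lam, "intercept =", round(b0, 3),
          "slopes =", [round(bj, 3) for bj in b])
```

A larger penalty shrinks the slopes toward zero, which is what stabilises the estimates when predictors are strongly correlated. In practice the penalty would be chosen by cross-validation, and the predictors standardised first.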

ReneeBK

2014-3-24 09:03:13

Heteroskedasticity is a very different problem in models like -probit- and -logit-. Think of it this way: your dependent variable is a probability. A probability embodies uncertainty, and that uncertainty comes from all the variables we have not included in our model. In one sense this makes heteroskedasticity very easy to deal with: we just define our dependent variable of interest to be the probability given the control variables in our model. The results of your model then give an accurate description of what you have found in your data. However, we often want to give parameters a counterfactual interpretation (e.g. "if the men suddenly became women, then the probability changes by x percentage points"). Such a counterfactual interpretation is only correct if we can assume that there is no heteroskedasticity. Several solutions have been proposed and I trust none of them: they are just too sensitive. If you really want to do something about it, then you'll really need to do some reading. Since these models are so sensitive, you really need to know what you are doing. A good entry point for that literature is (Williams 2009). But my position is that the problem is basically unsolvable, so not worth worrying about.
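A small numerical sketch may help show why unequal residual variance matters here. It assumes the standard latent-variable reading of the logit model, y* = beta * x + sigma * e with e standard logistic and y = 1 when y* > 0, so that P(y = 1 | x) = F(beta * x / sigma). The function names, group labels, and numbers below are made up for illustration.

```python
import math


def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))


def prob_y1(x, beta, sigma):
    """P(y = 1 | x) when y* = beta * x + sigma * e, e ~ standard logistic."""
    return logistic_cdf(beta * x / sigma)


def implied_logit_slope(beta, sigma, x0=0.0, x1=1.0):
    """Slope a logit model would recover: change in log-odds per unit of x."""
    def log_odds(x):
        p = prob_y1(x, beta, sigma)
        return math.log(p / (1.0 - p))
    return (log_odds(x1) - log_odds(x0)) / (x1 - x0)


beta = 1.0  # the same latent effect in both hypothetical groups
for group, sigma in (("group A", 1.0), ("group B", 2.0)):
    print(group, "sigma =", sigma,
          "implied logit slope =", round(implied_logit_slope(beta, sigma), 6))
```

Both groups share the same latent effect, yet the group with twice the residual scale shows half the logit slope, because only the ratio beta / sigma is identified. This is the reason comparing coefficients across groups, or reading them counterfactually, depends on the no-heteroskedasticity assumption.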

Hope that helps,

Williams, R. 2009. Using heterogeneous choice models to compare logit and probit coefficients across groups. Sociological Methods & Research 37: 531-559.

ReneeBK

2014-3-24 09:08:43

You could approach this problem using probit models, and once you've figured out whether there's an issue and how it should be handled, you could fit the equivalent logistic model for ease of interpretation if you didn't want to stick with probit. They are essentially the same model in many ways, but probit offers some options that relate to your question.

I believe you could fit your model with something like xtgee or oglm to get a first model. Then you can fit a heteroskedastic probit (oglm or a similar command). Once you have both models, since the probit model is nested within the het prob model, you can then do an LR test of nested models to see if there is an improvement in fit when using the heteroskedastic model.
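Once both models have been fitted, the likelihood-ratio comparison described above is simple arithmetic. The sketch below, in plain Python, assumes you already have the two maximised log-likelihoods from your software. The values used here are made-up placeholders, and lr_test and chi2_sf are hypothetical helper names, not Stata commands.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) with z = x / 2,
    starting from Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z).
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(ll_restricted, ll_full, df):
    """LR statistic 2 * (ll_full - ll_restricted) and its chi-square p-value."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)


# Placeholder log-likelihoods (NOT real results): plain probit versus a
# heteroskedastic probit with one variable in the variance equation.
ll_probit, ll_hetprobit = -210.4, -205.1
stat, p_value = lr_test(ll_probit, ll_hetprobit, df=1)
print("LR statistic:", round(stat, 3), "p-value:", p_value)
```

The degrees of freedom equal the number of extra parameters in the heteroskedastic model, that is, the number of variables in its variance equation.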

I've read a surprising amount of "ignore it" regarding heteroscedasticity and binary outcomes. That seems like a bad idea, particularly with a lot of corrections available. Various robust options are available in Stata commands that address some related issues and are explained well in the Stata documentation.

http://www3.nd.edu/~rwilliam/oglm/oglm_Stata.pdf - pretty in depth discussion and explains things using reference to a specific Stata command.
Allison, Paul. 1999. Comparing Logit and Probit Coefficients Across Groups. Sociological Methods and Research 28(2): 186-208.
Yatchew, Adonis and Zvi Griliches. 1985. Specification Error in Probit Models. The Review of Economics and Statistics 67(1): 134-139.

Hope this helps.

ReneeBK

2014-3-24 09:14:57

In Stata, you could estimate the model with both -hetprob- and -probit- and then do a likelihood ratio test (-lrtest-). This is a test for heteroskedasticity in probit regression, which is very close to logistic regression, except you don't get the nice odds ratios.
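As a rough check on how close the two link functions are, the following plain-Python sketch compares the standard normal CDF with a rescaled logistic CDF over a grid. The rescaling constant 1.702 is a commonly quoted value and is an assumption here, as are the grid and the variable names.

```python
import math


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))


SCALE = 1.702  # commonly quoted logit-to-probit rescaling constant (assumed)
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(normal_cdf(t) - logistic_cdf(SCALE * t)) for t in grid)
print("largest difference in predicted probability:", round(max_gap, 4))
```

The gap stays below one percentage point across the grid, which is why probit and logit fits usually tell the same story once the coefficients are rescaled.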

ReneeBK

2014-3-24 09:18:20

SHAZAM procedure for testing for heteroskedasticity in logit and probit models

SET NOECHO
PROC TESTHET
* Logit and Probit Models - Test for heteroskedasticity
* Reference: R. Davidson and J.G. MacKinnon, "Convenient Specification
* Tests for Logit and Probit Models", Journal of Econometrics,
* Vol 25, 1984, pp. 241-262.
SET NODOECHO NOOUTPUT
GEN1 TYPE_="[MODEL]"
* Check that the model type is valid
FORMAT(' ERROR: Model must be either PROBIT or LOGIT')
IF ((TYPE_.NE." LOGIT").AND.(TYPE_.NE." PROBIT"))
  PRINT / FORMAT
IF ((TYPE_.NE." LOGIT").AND.(TYPE_.NE." PROBIT"))
  STOP

* Model estimation
[MODEL] [DEPVAR] [X] / INDEX=XBETA_ PREDICT=CDF_

IF (TYPE_.EQ." LOGIT")
GENR PDF_=EXP(-XBETA_)/((1+EXP(-XBETA_))**2)
IF (TYPE_.EQ." PROBIT")
DISTRIB XBETA_ / TYPE=NORMAL PDF=PDF_

COPY [Z] Z_
MATRIX Z_=Z_
GEN1 DF_=$COLS
* Equation (26), p. 247.
GENR ONE_=1
COPY [X] ONE_ X_
DO #=1,DF_
MATRIX ZZ_=Z_(0,#)
GENR ZZ_=-XBETA_*ZZ_
MATRIX Z_(0,#)=ZZ_
ENDO
MATRIX X_ = X_ | Z_
* Equations (16) and (17) , p. 245.
GENR YAUX_=[DEPVAR]*SQRT((1-CDF_)/CDF_) + ([DEPVAR]-1)*SQRT(CDF_/(1-CDF_))
MATRIX R_=(PDF_/SQRT(CDF_*(1-CDF_)))*X_
* Artificial regression - Equation (18), p. 246.
OLS YAUX_ R_ / NOCONSTANT
* LM test statistic - explained sum of squares
GEN1 LM2=$ZSSR
* p-value
DISTRIB LM2 / TYPE=CHI DF=DF_
GEN1 pvalue_=1-$CDF
* Print results
PRINT MODEL / NONAME
FORMAT(' Test statistic for heteroskedasticity  LM2 ='/F15.5)
PRINT LM2 / NONAME FORMAT
FORMAT(' chi-square degrees of freedom'/5X,F5.0)
PRINT DF_ / NONAME FORMAT
FORMAT(' p-value'/5X,F10.5)
PRINT pvalue_ / NONAME FORMAT
DELETE / ALL_
SET DOECHO OUTPUT
PROCEND
SET ECHO
For details, see
http://shazam.econ.ubc.ca/intro/logit3.htm
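For readers without SHAZAM, the same LM test can be sketched in plain Python for the logit case with a single variance variable z. This is an illustration under stated assumptions, not a port of the procedure: the helper names (solve, fit_logit, lm_het_test) and the simulated data are made up, and the p-value uses the chi-square(1) tail because only one z is tested. The logistic density is written as F * (1 - F), which equals exp(-xb) / (1 + exp(-xb))**2.

```python
import math
import random


def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logit via Newton-Raphson, starting from zero."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        p = [logistic_cdf(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        H = [[sum(pi * (1 - pi) * xi[a] * xi[c] for xi, pi in zip(X, p))
              for c in range(k)] for a in range(k)]
        g = [sum((yi - pi) * xi[a] for xi, yi, pi in zip(X, y, p))
             for a in range(k)]
        beta = [b + s for b, s in zip(beta, solve(H, g))]
    return beta


def lm_het_test(X, y, z):
    """LM statistic = explained sum of squares of the artificial regression."""
    beta = fit_logit(X, y)
    R, r = [], []
    for xi, yi, zi in zip(X, y, z):
        xb = sum(b * v for b, v in zip(beta, xi))
        F = logistic_cdf(xb)
        f = F * (1.0 - F)             # logistic density at xb
        s = math.sqrt(F * (1.0 - F))  # standard deviation of y given x
        r.append((yi - F) / s)        # standardised residual (YAUX_)
        R.append([f / s * v for v in xi + [-xb * zi]])
    k = len(R[0])
    A = [[sum(ri[a] * ri[c] for ri in R) for c in range(k)] for a in range(k)]
    g = [sum(ri[a] * t for ri, t in zip(R, r)) for a in range(k)]
    lm = max(sum(c * gi for c, gi in zip(solve(A, g), g)), 0.0)
    return lm, math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) upper tail


# Simulated data (an assumption): the residual scale depends on a 0/1 variable z.
rng = random.Random(0)
X, y, z = [], [], []
for _ in range(400):
    xv, zv = rng.uniform(-2.0, 2.0), float(rng.randint(0, 1))
    prob = logistic_cdf((0.5 + xv) / math.exp(0.8 * zv))
    X.append([1.0, xv])
    z.append(zv)
    y.append(1.0 if rng.random() < prob else 0.0)

lm, pval = lm_het_test(X, y, z)
print("LM statistic:", round(lm, 4), "p-value:", round(pval, 4))
```

A small p-value suggests that the residual scale varies with z. With several z variables, the degrees of freedom equal the number of z columns and the p-value needs the corresponding chi-square tail.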
