2008-02-03
I am building a multiple linear regression model. How do I carry out these three tests (autocorrelation, heteroscedasticity, and multicollinearity) on the data before estimating the parameters?

All replies
2008-2-7 21:17:00

There is no single clear-cut test; you need to draw some plots and run additional analyses to check for these.


2008-2-13 09:33:00

Example


* STATISTICS is placed before DEP so that COLLIN applies to this equation.
REGRESSION VAR=GRADE GPA STARTLEV TREATMNT
  /REGWGT=WEIGHT
  /STATISTICS=COLLIN
  /DEP=GRADE
  /METHOD=ENTER.

COLLIN includes the variance inflation factors (VIF) displayed in the Coefficients table, and the eigenvalues of the scaled and uncentered cross-products matrix, condition indexes, and variance-decomposition proportions displayed in the Collinearity Diagnostics table.
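Building on that example, a minimal sketch that also asks for the Durbin-Watson statistic and saves the residuals for the autocorrelation checks discussed below (variable names are reused from the example above; res_1 is only an illustrative name, and the subcommands should be checked against your SPSS version):

* COLLIN gives the VIF and the Collinearity Diagnostics table, DURBIN prints
* the Durbin-Watson statistic, and RESID(res_1) saves the residuals for plots.
REGRESSION VAR=GRADE GPA STARTLEV TREATMNT
  /REGWGT=WEIGHT
  /STATISTICS=DEFAULTS COLLIN
  /DEP=GRADE
  /METHOD=ENTER
  /RESIDUALS=DURBIN
  /SAVE=RESID(res_1).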



2008-2-13 10:06:00

NOTES ON TEMPORAL AUTOCORRELATION AND REGRESSION

Multivariate Statistics, Dept of Geography, Hunter College, Fall 2003, Prof. Allan Frei

Any sort of autocorrelation in the residuals can result in unrealistically narrow confidence limits and unrealistic significance estimates in a linear regression. This is because the regression equations assume that all cases are independent, which in the case of autocorrelation is not true. Thus, the number of degrees of freedom is overestimated, causing unrealistically small estimates of the standard errors of the regression coefficients. In addition, since autocorrelation can introduce slow changes (i.e. low frequency variability) in the time series, it can affect the estimate of the slope.

HOW TO DETERMINE IF TEMPORAL AUTOCORRELATION IS PRESENT IN YOUR RESIDUALS

1. EDA on Y, all the X variables, and the residuals should be performed whenever you do regression. Eyeballing the time series can give you an idea of whether autocorrelation might be a problem, and calculating the ACF will also tell you if there might be a problem. Note that you should look for negative as well as positive autocorrelation. The remainder of these notes gives instructions for dealing with lag-1 positive autocorrelation. If lag-1 negative autocorrelation is suspected, then the test statistic to be used is "4-D": in other words, if you are testing for significant negative autocorrelation, substitute "4-D" for "D" in the remainder of these notes. Note that this procedure works for lag-1 autocorrelation only, not for higher lags.

2. To determine if the problem is significant, look at the Durbin-Watson (D) statistic. The null hypothesis H0 is “no autocorrelation.” Look up the two critical values for D on the table in the appendix of Hamilton, D-upper (Du) and D-lower (Dl). Critical values depend on the number of cases in the time series (n) and the number of X variables in your regression (K-1). Your decision as to whether autocorrelation in the residuals is significant is made as follows:

            If D < Dl, reject H0, assume that autocorrelation is a problem

            If Dl < D < Du, the Durbin-Watson test is inconclusive

            If D > Du, accept H0, assume that autocorrelation is not a problem

If the test is inconclusive, the best thing to do would be to get more data points in your time series. However, that is usually not possible, and a reasonable procedure is to follow the instructions below as if autocorrelation is present. If the results show very little change from the original, then autocorrelation is probably not a problem. If the results are different from the original, then autocorrelation was probably affecting your results, and you should stick with the corrected results.
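As a concrete sketch of this check, assuming the cases are in time order and the residuals have been saved as a variable res_1 (as in the regression example above); the ACF command may require the SPSS Forecasting/Trends option:

* Inspect the autocorrelation function of the residuals up to lag 16.
ACF VARIABLES=res_1
  /MXAUTO=16.
* Then compare the Durbin-Watson value printed with the regression output
* against the critical values Dl and Du for your n and number of predictors.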

WHAT TO DO IF AUTOCORRELATION IS PRESENT

1. The best solution is to find an additional predictor (X) variable that has the autocorrelation, and that is related to your Y variable. In other words, your model is actually incomplete because you left out an important explanatory variable. If you include such a variable, the residuals would no longer have autocorrelation. However, often you do not have an additional variable that can do this.

2. If you want your model to predict the Y values significantly (according to the F statistic), or interpolate your Y values at points in between the data points, OLS is OK even with temporal autocorrelation in the residuals. As usual, you have to make sure that you know how big the errors are, and during which time periods they are positive and negative.

3. However, if you want to use the model to understand the relationship between the predictors and the predictand, that is, to use the model for statistical “inference” to estimate which coefficients are actually significant, then you cannot trust the results when the residuals are autocorrelated, or in other words, when the residuals are not independent. Fortunately, for time series with these characteristics, there is something that can be done.

The first possibility is to select only points that are far enough apart (in time) so that they are uncorrelated; or average values over a time period that is longer than the autocorrelation time scale. For example, using meteorological variables: suppose you are regressing some variable against temperature. However, you find that temperature, and the residuals of your regression, are autocorrelated over time scales up to one day. You can use values from every other day, or 2-day averages, of all variables involved and then check to see if autocorrelation remains. Often, this is not an option, since there is perhaps insufficient data.
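As an SPSS sketch of this kind of thinning (assuming the cases are in time order; $CASENUM is the built-in case number, and the spacing of 2 should be matched to your autocorrelation time scale):

* Keep every other case so successive cases are farther apart in time.
* (SELECT IF drops cases permanently, so work on a copy of the data file.)
SELECT IF (MOD($CASENUM, 2) = 0).
EXECUTE.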

The second possibility is to remove autocorrelation from the analysis. This option is probably the most likely choice, and is discussed in more detail below.

The third possibility is to use an alternative to ordinary least squares, such as the method of maximum likelihood. OLS is equivalent to the method of maximum likelihood when all the assumptions are met. However, when the assumptions are not met, such as when the residuals are autocorrelated, OLS can give unrealistic confidence limits. Maximum likelihood methods are beyond the scope of this course.

HOW TO REMOVE AUTOCORRELATION FROM THE RESIDUALS, AND REDO THE REGRESSION

One can perform a relatively simple transformation of the variables, but it is a bit tricky. Here, we follow the procedure outlined by Neter et al., who use the Cochrane-Orcutt procedure.

1. First, you must estimate the first-order regression coefficient, let's call it r, between the residuals and the lag-1 residuals. To do this, we perform a regression through the origin (with no constant in the regression model). In SPSS, this takes two steps. First, calculate the first-order lag of the residuals using "Transform / Compute" and the LAG function. Then, perform a regression of the residuals (Y) vs. the lagged residuals (X), but make sure to go to the options box and uncheck the box that says "include constant in equation." Run the regression, and r is the unstandardized coefficient (the standardized coefficient is the lag-1 correlation of the residuals). (A syntax sketch of steps 1 through 4 is given after step 5 below.)

An alternative way to calculate r: r = Σ(e_{t-1} · e_t) / Σ(e_{t-1}²), where Σ denotes the sum from t = 2 through n.
2. The second major step involves a transformation of the Y and X variables. Essentially, we are using r (calculated in the previous step) to remove the first-order correlation in the Y and X variables. Let the variables denoted with the number 1 be the transformed variables:

            Y1_t = Y_t - r * Y_{t-1}
            X1_t = X_t - r * X_{t-1}

for Y and all X variables. To do this, use "Transform / Compute" in SPSS. For example, if r = .75, then go to "Transform / Compute", make a new variable x1, and type in the equation "x1 = x - .75 * lag(x)"; for y the equation would be "y1 = y - .75 * lag(y)". Do the same for all the X variables, and the Y variable.

3. Then, you re-do the regression using the transformed variables. (Don't forget, you are now doing the standard method of regression using a constant in the equation, so you must make sure that box is checked in the SPSS regression options.) You will get new coefficients. Let's call the intercept of the transformed regression b01, and the first x coefficient b11. To convert back to the original variables, use these simple inverse transformations:

            original intercept: b0 = b01 / (1-r)
            original first x: b1 = b11
            original standard error of intercept: se[b0] = se[b01] / (1-r)
            original standard error of first x: se[b1] = se[b11]

In other words, a simple division by (1-r) gives you your intercept, and the standard error of the intercept, in the original variables, and your x coefficients do not have to be transformed at all. This means that you will have to calculate the constant (or intercept), and the standard error of the constant, yourself, and determine if it is significantly different from zero. However, even if it is not significantly different from zero, the value you get is still the best estimate.

4. Last, but not least, you must see if you successfully removed the first-order correlation in the transformed regression by using EDA on the transformed residuals and looking at the Durbin-Watson statistic. (Don't forget that n is one less than originally, because when you lag variables you lose one case.) If there is still significant autocorrelation, you might have to iterate this procedure more than once! If there is no significant autocorrelation, then you are done. If you have to iterate more than a couple of times, this procedure is probably not working.

5. So, in the end, your data file should probably have a bunch of new variables in addition to x and y. These might include: the original predicted and residual values; the lag-1 residuals; and x1 and y1.
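A syntax sketch of steps 1 through 4, assuming the residuals from the original regression were saved as res_1, the model has a single predictor x, and r came out as 0.75; the names lagres_1, x1, y1 and res_2 are only illustrations:

* Step 1: regress the residuals on their lag-1 values through the origin;
* the unstandardized coefficient of lagres_1 is r.
COMPUTE lagres_1 = LAG(res_1).
EXECUTE.
REGRESSION VAR=res_1 lagres_1
  /ORIGIN
  /DEP=res_1
  /METHOD=ENTER lagres_1.

* Step 2: transform Y and every X with the estimated r (here r = 0.75).
COMPUTE y1 = y - 0.75 * LAG(y).
COMPUTE x1 = x - 0.75 * LAG(x).
EXECUTE.

* Steps 3 and 4: re-fit with a constant (the default), ask for Durbin-Watson
* again, and save the new residuals for EDA on the transformed regression.
REGRESSION VAR=y1 x1
  /STATISTICS=DEFAULTS
  /DEP=y1
  /METHOD=ENTER x1
  /RESIDUALS=DURBIN
  /SAVE=RESID(res_2).
* Back-transform by hand: b0 = b01 / (1 - 0.75) and se[b0] = se[b01] / (1 - 0.75);
* the slope coefficients and their standard errors are used as they are.
* Remember that n is one case smaller because of the lag.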



2009-4-29 11:19:00

I can't follow this.


2009-5-2 09:39:00
Check autocorrelation with the regression model's Durbin-Watson (DW) value, heteroscedasticity with the Levene test of homogeneity of variance, and multicollinearity with the VIF.
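To make the Levene part concrete (a sketch only: Levene's test as implemented in SPSS compares variances across the levels of a categorical grouping variable, here called group, which is an assumption; the VIF and Durbin-Watson come from the REGRESSION examples earlier in the thread):

* Levene test of homogeneity of variance of y across the levels of group.
ONEWAY y BY group
  /STATISTICS=HOMOGENEITY.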
