Question: when should an interaction term be included in a regression analysis?

xjtuluo

2008-07-16

Thanks in advance to everyone for your guidance! Much appreciated.

All replies

wildauto

2008-7-16 13:46:00

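Are you asking about cross elasticity?

Generally, cross-elasticity analysis is needed when doing market analysis of goods that are substitutes for one another.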

xjtuluo

2008-7-16 13:53:00

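Thanks to the poster in #2! But that is not quite what I meant. My own sense is that if two independent variables are themselves correlated, and both are to be entered into the regression model, then their interaction term has to be included. Is that right? Experts, please advise!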

sheepmiemie

2008-7-18 02:13:00

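The poster above is assuming too much;

in practice, whether to include a cross-product (interaction) term depends more on its substantive meaning and on how well the model fits once it is included.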

hanszhu

2008-7-18 08:32:00

Interactions In Multiple Regression Models

Continuous Predictors

[This example involves a cross-sectional study of HDL cholesterol (HCHOL, the so-called good cholesterol) and body mass index (BMI), a measure of obesity. Since both BMI and HDL cholesterol will be related to total cholesterol (CHOL), it would make good sense to adjust for total cholesterol.]

In the multiple regression models we have been considering so far, the effects of the predictors have been additive. When HDL cholesterol is regressed on total cholesterol and BMI, the fitted model is

Dependent Variable: HCHOL

                 Parameter   Standard   T for H0:
Variable    DF    Estimate      Error   Parameter=0   Prob > |T|
INTERCEPT    1      64.853      8.377         7.742        0.000
BMI          1      -1.441      0.321        -4.488        0.000
CHOL         1       0.068      0.027         2.498        0.014

The fitted model says that the expected difference in HCHOL is 0.068 per unit difference in CHOL when BMI is held fixed. This is true whatever the value of BMI. The difference in HCHOL is -1.441 per unit difference in BMI when CHOL is held fixed. This is true whatever the value of CHOL. The effects of CHOL and BMI are additive because the expected difference in HDL cholesterol corresponding to differences in both CHOL and BMI is obtained by adding the differences expected from CHOL and BMI determined without regard to the other's value.

The model that was fitted to the data (HCHOL = b₀ + b₁ CHOL + b₂ BMI) forces the effects to be additive, that is, the effect of CHOL is the same for all values of BMI and vice versa because the model won't let it be anything else. While this condition might seem restrictive, experience shows that it is a satisfactory description of many data sets. (I'd guess it depends on your area of application.)

Even if additivity is appropriate for many situations, there are times when it does not apply. Sometimes, the purpose of a study is to formally test whether additivity holds. Perhaps the way HDL cholesterol varies with BMI depends on total cholesterol. One way to investigate this is by including an interaction term in the model. Let BMICHOL=BMI*CHOL, the product of BMI and CHOL. The model incorporating the interaction is

Dependent Variable: HCHOL

                 Parameter   Standard   T for H0:
Variable    DF    Estimate      Error   Parameter=0   Prob > |T|
INTERCEPT    1     -24.990     38.234        -0.654        0.515
BMI          1       2.459      1.651         1.489        0.139
CHOL         1       0.498      0.181         2.753        0.007
BMI*CHOL     1      -0.019      0.008        -2.406        0.018

The general form of the model is Y = b₀ + b₁ X + b₂ Z + b₃ XZ

It can be rewritten two ways to show how the change in response with one variable depends on the other.

(1) Y = b₀ + b₁ X + (b₂ + b₃ X) Z
(2) Y = b₀ + b₂ Z + (b₁ + b₃ Z) X

Expression (1) shows the difference in Y per unit difference in Z when X is held fixed is (b₂+b₃X). This varies with the value of X. Expression (2) shows the difference in Y per unit difference in X when Z is held fixed is (b₁+b₃Z). This varies with the value of Z. The coefficient b₃ measures the amount by which the change in response with one predictor is affected by the other predictor. If b₃ is not statistically significant, then the data have not demonstrated that the change in response with one predictor depends on the value of the other predictor. In the HCHOL, CHOL, BMI example, the model

HCHOL = -24.990 + 0.498 CHOL + 2.459 BMI - 0.019 CHOL * BMI

can be rewritten

HCHOL = -24.990 + 0.498 CHOL + (2.459 - 0.019 CHOL) BMI or
HCHOL = -24.990 + 2.459 BMI + (0.498 - 0.019 BMI) CHOL

Comment: Great care must be exercised when interpreting the coefficients of individual variables in the presence of interactions. The coefficient of BMI is 2.459. In the absence of an interaction, this would be interpreted as saying that among those with a given total cholesterol level, those with greater BMIs are expected to have greater HDL levels! However, once the interaction is taken into account, the coefficient for BMI is, in fact, (2.459-0.019 CHOL), which is negative provided total cholesterol is greater than 129, which is true of all but 3 subjects.
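As a rough illustration of the comment above, the conditional slopes can be computed directly from the rounded coefficients quoted in the fitted interaction model. This is only a sketch: the function names are made up for the example, and because the published coefficients are rounded, the crossover value is approximate.

```python
# Rounded coefficients copied from the fitted interaction model quoted above.
B_INTERCEPT, B_CHOL, B_BMI, B_INTERACTION = -24.990, 0.498, 2.459, -0.019


def bmi_slope(chol):
    """Difference in HCHOL per unit difference in BMI at a fixed CHOL."""
    return B_BMI + B_INTERACTION * chol


def chol_slope(bmi):
    """Difference in HCHOL per unit difference in CHOL at a fixed BMI."""
    return B_CHOL + B_INTERACTION * bmi


# Total cholesterol at which the BMI slope changes sign (about 129).
chol_crossover = -B_BMI / B_INTERACTION

print("BMI slope is zero at CHOL =", round(chol_crossover, 1))
for chol in (100, 150, 215):
    print("CHOL =", chol, "-> BMI slope =", round(bmi_slope(chol), 3))
```

The BMI slope is positive below the crossover and negative above it, which is why the bare coefficient 2.459 should not be read on its own.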

Comment: The inclusion of interactions when the study was not specifically designed to assess them can make it difficult to estimate the other effects in the model. If

a study was not specifically designed to assess interactions,
there is no a priori reason to expect an interaction,
interactions are being assessed "for insurance" because modern statistical software makes it easy, and
no interaction is found,

it is best to refit the model without the interaction so other effects might be better assessed.

Indicator Predictor Variables

Interactions have a special interpretation when one of the predictors is a categorical variable with two categories. Consider an example in which the response Y is predicted from a continuous predictor X and indicator of sex (M0F1, =0 for males and 1 for females). The model

Y = b₀ + b₁ X + b₂ M0F1

specifies two simple linear regression equations. For men, M0F1=0 and

Y = b₀ + b₁ X

while, for women, M0F1=1 and

Y = (b₀ + b₂) + b₁ X

The change in Y per unit change in X--b₁--is the same for men and women. The model forces the regression lines to be parallel. The difference between men and women is the same for all values of X and is equal to b₂, the difference in Y-intercepts.

Including a sex-by-X interaction term in the model allows the regression lines for men and women to have different slopes.

Y = b₀ + b₁ X + b₂ M0F1 + b₃ X * M0F1

For men, the model reduces to Y = b₀ + b₁ X
while for women, it is Y = (b₀ + b₂) + (b₁ + b₃) X

Thus, b₃ is the difference in slopes. The slopes for men and women will have been shown to differ if and only if b₃ is statistically significant.

The individual regression equations for men and women obtained from the multiple regression equation with a sex-by-X interaction are identical to the equations that are obtained by fitting a simple linear regression of Y on X for men and women separately. The advantage of the multiple regression approach is that it simplifies the task of testing whether the regression coefficients for X differ between men and women.
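The equivalence described in the paragraph above can be checked numerically. The following is a minimal sketch on simulated data (the data, the seed, and the helper `ols` are all assumptions made for illustration, not part of the original example); it fits the interaction model once and the two per-group simple regressions separately, using NumPy's least-squares solver.

```python
import numpy as np

# Simulated data: a continuous predictor x and a 0/1 indicator (M0F1).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(50, 10, n)
m0f1 = rng.integers(0, 2, n)  # 0 = male, 1 = female
y = 3.0 + 0.5 * x + 2.0 * m0f1 + 0.3 * x * m0f1 + rng.normal(0, 1, n)


def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# One multiple regression with a sex-by-X interaction.
b0, b1, b2, b3 = ols([x, m0f1, x * m0f1], y)

# Two separate simple regressions, one per group.
men = ols([x[m0f1 == 0]], y[m0f1 == 0])
women = ols([x[m0f1 == 1]], y[m0f1 == 1])

print("men:   separate", men, "vs. from interaction model", (b0, b1))
print("women: separate", women, "vs. from interaction model", (b0 + b2, b1 + b3))
```

The point estimates agree, but only the single interaction model gives a direct test of b₃, the difference in slopes.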

Comment: A common mistake is to compare two groups by fitting separate regression models and declaring them different if the regression coefficient is statistically significant in one group and not the other. It may be that the two regression coefficients are similar, with P values close to and on either side of 0.05. In order to show that men and women respond differently to a change in the continuous predictor, the multiple regression approach must be used and the difference in regression coefficients as measured by the sex-by-X interaction must be tested formally.

Centering

Centering refers to the practice of subtracting a constant from predictors before fitting a regression model. Often the constant is a mean, but it can be any value.

There are two reasons to center. One is technical. The numerical routines that fit the model are often more accurate when variables are centered. Some computer programs automatically center variables and transform the model back to the original variables, all without the user's knowledge.

The second reason is practical. The coefficients from a centered model are often easier to interpret. Consider the model that predicts HDL cholesterol from BMI and total cholesterol and a centered version fitted by subtracting 22.5 from each BMI and 215 from each total cholesterol.

Original: HCHOL = -24.990 + 0.498 CHOL + 2.459 BMI - 0.019 CHOL * BMI
Centered: HCHOL = 47.555 + 0.080 (CHOL-215) - 1.537 (BMI-22.5) - 0.019 (CHOL-215) (BMI-22.5)

In the original model

-24.990 is the expected HDL cholesterol level for someone with total cholesterol and BMI of 0,
0.498 is the difference in HDL cholesterol corresponding to a unit difference in total cholesterol for someone with a BMI of 0, and
2.459 is the difference in HDL cholesterol corresponding to a unit difference in BMI for someone with a total cholesterol of 0.

Not exactly the most useful values. In the centered model, however,

47.555 is the expected HDL cholesterol level for someone with a total cholesterol of 215 and a BMI of 22.5,
0.080 is the difference in HDL cholesterol corresponding to a unit difference in total cholesterol for someone with a BMI of 22.5, and
-1.537 is the difference in HDL cholesterol corresponding to a unit difference in BMI for someone with a total cholesterol of 215.

When there is an interaction in the model,

the coefficients for the individual uncentered variables are the differences in response corresponding to a unit change in the predictor when the other predictors are 0, while
the coefficients for the individual centered variables are the differences in response corresponding to a unit change in the predictor when the other predictors are at their centered values.
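The relationship between the two parameterizations can be sketched in code. The example below uses simulated data (the variable values, the seed, and the helper `fit` are assumptions for illustration; it does not reproduce the study data) and the same centering constants, 215 and 22.5. It shows that centering leaves the interaction coefficient unchanged and turns each main-effect coefficient into the slope evaluated at the centering value of the other predictor.

```python
import numpy as np

# Simulated stand-ins for CHOL, BMI and HCHOL; not the study data.
rng = np.random.default_rng(1)
n = 300
chol = rng.normal(215, 30, n)
bmi = rng.normal(22.5, 3, n)
hchol = (50 + 0.08 * (chol - 215) - 1.5 * (bmi - 22.5)
         - 0.02 * (chol - 215) * (bmi - 22.5) + rng.normal(0, 5, n))


def fit(x, z, y):
    """Coefficients (b0, b1, b2, b3) of y = b0 + b1 x + b2 z + b3 x z."""
    X = np.column_stack([np.ones(len(y)), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


raw = fit(chol, bmi, hchol)
cen = fit(chol - 215, bmi - 22.5, hchol)

# Centered coefficients implied by the uncentered fit.
implied = np.array([
    raw[0] + raw[1] * 215 + raw[2] * 22.5 + raw[3] * 215 * 22.5,  # intercept
    raw[1] + raw[3] * 22.5,   # CHOL slope at BMI = 22.5
    raw[2] + raw[3] * 215,    # BMI slope at CHOL = 215
    raw[3],                   # interaction is unchanged
])
print("centered fit:     ", cen)
print("implied from raw: ", implied)
```

Applying the same formulas to the rounded coefficients printed in the post gives values close to, but not exactly equal to, the centered model shown there, presumably because the published coefficients are rounded to three decimals.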

Last modified: 07/15/2008 19:55:09.

hanszhu

2008-7-18 08:49:00

Crossproduct interaction terms may be highly correlated with the corresponding simple independent variables in the regression equation, creating problems with assessing the relative importance of main effects and interaction effects. Note: Because of possible multicollinearity, it may well be desirable to use centered variables (where one has subtracted the mean from each datum) -- a transformation which often reduces multicollinearity.
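A small simulation can illustrate the note above about centering. This is a sketch under stated assumptions: the predictors are simulated as independent normal variables with means far from zero, and `corr` is a helper defined only for this example. It compares the correlation between a predictor and the cross-product term before and after mean-centering.

```python
import numpy as np

# Simulated, independent predictors whose means are far from zero.
rng = np.random.default_rng(2)
n = 500
x = rng.normal(215, 30, n)
z = rng.normal(22.5, 3, n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Uncentered: the product x*z moves closely with x itself.
raw_corr = corr(x, x * z)

# Centered: subtract each variable's mean before forming the product.
xc, zc = x - x.mean(), z - z.mean()
cen_corr = corr(xc, xc * zc)

print("corr(x, x*z)     uncentered:", round(raw_corr, 3))
print("corr(xc, xc*zc)  centered:  ", round(cen_corr, 3))
```

Centering changes how the main-effect coefficients are parameterized, not the fitted values or the interaction coefficient, so it mainly helps with interpretation and numerical conditioning.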

To create an interaction term between a categorical variable and a continuous variable, first the categorical variable is dummy-coded, creating (k - 1) new variables, one for each level of the categorical variable except the omitted reference category. The continuous variable is multiplied by each of the (k - 1) dummy variables. The terms entered into the regression include the continuous variable, the (k - 1) dummy variables, and the (k - 1) cross-product interaction terms. Also, a regression is run without the interaction terms. The R-squared difference measures the effect of the interaction. The beta weights for the interaction terms in the regression which includes the interaction terms measure the relative predictive power of the effects of the continuous variable given specific levels of the categorical variable. It is also possible to include both interaction terms and power terms in the model by multiplying the dummies by the square of the continuous variable, but this can lead to an excessive number of terms in the model. One approach to dealing with this is to use stepwise regression's F test as a criterion to stop adding terms to the model.

When an ordinal variable has been entered as a set of dummy variables, the interaction of another variable with the ordinal variable will involve multiple interaction terms. In this case the F-test of the significance of the interaction of the two variables is the significance of the change of R-square of the equation with the interaction terms and the equation without the set of terms associated with the ordinal variable.
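The procedure described in the two paragraphs above can be sketched as follows. Everything here is an illustrative assumption: the data are simulated, the categorical variable has k = 3 levels with level 0 as the reference, and `r_squared` is a helper written for the example. The F statistic is the usual one for a change in R-squared between nested models.

```python
import numpy as np

# Simulated data: continuous x and a categorical variable with k = 3 levels.
rng = np.random.default_rng(3)
n = 300
x = rng.normal(0, 1, n)
group = rng.integers(0, 3, n)          # level 0 is the reference category
slopes = np.array([1.0, 1.5, 0.5])     # a different slope in each level
y = 2.0 + slopes[group] * x + 0.5 * group + rng.normal(0, 1, n)

# (k - 1) dummy variables and (k - 1) cross-product interaction terms.
dummies = np.column_stack([(group == g).astype(float) for g in (1, 2)])
interactions = dummies * x[:, None]


def r_squared(predictors, y):
    """Return (R-squared, number of fitted parameters including intercept)."""
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, X.shape[1]


r2_reduced, p_reduced = r_squared(np.column_stack([x, dummies]), y)
r2_full, p_full = r_squared(np.column_stack([x, dummies, interactions]), y)

# F test for the change in R-squared when the interaction terms are added.
q = p_full - p_reduced                 # number of interaction terms tested
f_stat = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - p_full))

print("R-squared without interactions:", round(r2_reduced, 4))
print("R-squared with interactions:   ", round(r2_full, 4))
print("F statistic on", q, "and", n - p_full, "df:", round(f_stat, 2))
```

The statistic would be compared with an F distribution on (q, n - p_full) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, q, n - p_full)` gives the P value.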

[This post was edited by the author at 2008-7-18 8:50:25]

xjtuluo

2008-7-18 09:36:00

Thank you so much! The material everyone provided is very detailed.

大象跳舞

2008-7-18 10:15:00

Finally got it. Thanks to xjtuluo for the good question, and to hanszhu for the detailed answer!

apple4513

2008-12-2 11:42:00

Could someone explain this systematically in Chinese... I didn't really follow it.

kk22boy

2009-11-26 10:06:12

Thanks to the poster above.

wfpandy

2009-11-26 13:43:04

hanszhu really is a leader in the field.

talentyxc

2009-11-26 23:23:33

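As long as the independent variables are not perfectly collinear, the unbiasedness and consistency of the estimator are not affected; this has nothing to do with whether an interaction term is included.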

rui2003fang

2009-12-11 23:09:14

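I don't fully understand this article. Is it saying that using a cross-product term does not solve the problem of two explanatory variables influencing each other?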

chyshl

2009-12-12 18:11:09

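If, as the original poster says, the two independent variables are correlated, that would seem to be what is called collinearity, so shouldn't the one that contributes less be dropped? Why add an interaction term?
Interaction terms sound perfectly logical mathematically, but when applied to real problems their interpretation is anything but simple. In my view, avoid interaction terms when you can: where main effects can explain the problem, adding an interaction term only muddies it. For model fitting, simpler is better.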

wobushita

2011-12-12 15:22:45

The explanation is in English!

jiangbogz

2012-3-16 09:04:21

Fairly detailed material on interaction terms.

tcpq

2012-4-9 20:10:33

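hanszhu posted on 2008-7-18 08:32
Interactions In Multiple Regression Models Continuous Predictors [This example involves a cross- ...

A question for you: if neither variable is a dummy variable, can I still use y=a+b*X+c*Z+d*X*Z? Is it done the same way as with a dummy variable?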

yiluo2008

2012-5-18 16:39:30

I don't understand either; same question.

刘小娟0-0

2012-5-27 10:46:41

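hanszhu posted on 2008-7-18 08:49
Crossproduct interaction terms may be highly correlated with the corresponding simple independent va ...

I just read this. It was a bit of a struggle, but very rewarding. There are a few things I don't understand. In some economic models an interaction term needs to be included, which produces collinearity between the interaction term and the original individual explanatory variables. Collinearity is fairly strong in my current model, but I want to keep both the interaction term and the individual explanatory variables. You mentioned that using centered variables (my translation of the term may not be accurate) can effectively address this collinearity. How can that be done in Stata? Any advice would be greatly appreciated!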

sophie_tears

2012-7-17 06:33:43

Same question here.

leonyue1989

2012-7-20 10:22:34

The analysis by the posters below is spot on. Bookmarking this, thanks.

wlqust

2012-10-12 16:37:03

hanszhu posted on 2008-7-18 08:32
Interactions In Multiple Regression Models Continuous Predictors [This example involves a cross- ...

Thanks a lot!

sunkai_bick

2012-10-30 00:17:09

Learning from this!

h3327156

2012-10-30 00:57:27

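The English article hanszhu posted is quite interesting. As someone who likes health economics, I see the terms in it all the time,
such as body mass index, HDL cholesterol [the good cholesterol], and total cholesterol [these terms are also common in health checkups].

Thanks for this post! I learned a lot.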

leejean102

2012-12-23 11:06:55

hanszhu posted on 2008-7-18 08:49
Crossproduct interaction terms may be highly correlated with the corresponding simple independent va ...

Useful.

peyzf

2013-1-1 07:35:11

when the moderator effect exists.

zhoujsh1980

2013-7-17 01:08:27

Bookmarking this.

zhang546468362

2013-9-25 09:21:15

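hanszhu posted on 2008-7-18 08:32
Interactions In Multiple Regression Models Continuous Predictors [This example involves a cross- ...

If the interaction is between two dummy variables, centering shouldn't be needed, right? This article covers the interpretation for a dummy variable and a continuous variable; is there one for the case where both are dummy variables?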

whqisong

2014-10-9 10:08:17

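chyshl posted on 2009-12-12 18:11
If, as the original poster says, the two independent variables are correlated, that would seem to be what is called collinearity, so shouldn't the one that contributes less be dropped? Why add an interaction ...

I don't get it. What is an interaction? Is it just multiplying the variables together?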

Carson~~

2015-10-19 15:15:18

hanszhu posted on 2008-7-18 08:49
Crossproduct interaction terms may be highly correlated with the corresponding simple independent va ...

I learned a lot.
