(Recommended reading) A recurring type of wrong instrumental variable: the group mean

2021-12-10

Original article by 江河JH, 功夫计量经济学 (WeChat official account)
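In journals both in China and abroad, we often see authors who, when dealing with endogeneity, use the group mean $\bar{X}_{i,-j}$ (computed without observation $j$) as an instrument for the variable $X_{i,j}$:

$$\bar{X}_{i,-j} = \frac{1}{N_i - 1} \sum_{k \neq j} X_{i,k}$$

where $i$ denotes the group and $N_i$ denotes the number of observations in group $i$. The reasoning goes as follows: (1) an individual's characteristic is influenced by the mean or aggregate characteristic of the other individuals in its group, so the instrument satisfies the relevance condition; (2) the mean or aggregate characteristic of the other individuals in the group does not directly affect the individual's outcome, so the instrument satisfies the exogeneity condition. Two examples: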

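This paper uses the mean net positive tone of the social responsibility reports of other firms in the same industry and year, TONE_meant, as an instrument in a 2SLS regression to address endogeneity. The regression results are shown in Table 5. This instrument satisfies the relevance and exogeneity requirements. On relevance: firms in the same industry face similar external environments and industry characteristics, so the tone of their social responsibility reports is correlated to some degree; moreover, column (1) of Table 5 shows that the coefficient on the industry mean of report tone, TONE_meant, is positive and significant at the 1% level, so the relevance condition holds. And there is no evidence that the report tone of other firms in the industry affects the firm's own stock price crash risk, so the exogeneity condition holds.
(Excerpted from a paper in 《审计与经济研究》, Journal of Audit & Economics)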

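Firms located in the same city and industry may compete for managerial talent in the local executive labor market. Whether a firm offers its management an incentive pay plan depends on the incentive pay offered by local competitors, while the incentive pay offered by competitors should have no direct effect on the firm's own innovation. Following Fisman and Svensson (2007), we use the region-industry averages of CEO shareholding (CEO Share), incentive pay (Incentive), profit incentives (Profit Inc), and sales incentives (Sales Inc) as instruments for the corresponding variables. (Excerpted from a paper in 《经济研究》, Economic Research Journal)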
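For concreteness, here is a minimal sketch of how the leave-one-out group mean used in both excerpts is typically computed. The function name `leave_one_out_group_means` and the toy ROA figures are hypothetical and serve only as illustration; they do not come from either paper.

```python
from collections import defaultdict


def leave_one_out_group_means(values, groups):
    """For each observation, average x over the OTHER members of its group.

    Observations in singleton groups get None, since they have no peers.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, g in zip(values, groups):
        totals[g] += x
        counts[g] += 1

    result = []
    for x, g in zip(values, groups):
        n = counts[g]
        # Remove the observation's own value from the group total,
        # then divide by the number of remaining peers.
        result.append((totals[g] - x) / (n - 1) if n > 1 else None)
    return result


# Hypothetical toy data: ROA of five firms in two industries.
roa = [0.10, 0.20, 0.30, 0.05, 0.15]
industry = ["A", "A", "A", "B", "B"]
loo = leave_one_out_group_means(roa, industry)
print(loo)
```

Every value is built from the same group total minus the firm's own value, so within a group the instrument differs across firms only through each firm's own $x$.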

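Instruments of this kind are everywhere, and so widely circulated that beginners copy them in droves. The justifications sound reasonable, but they are worthless and fool only outsiders. The question of whether such instruments are valid was settled long ago; see Gormley and Matsa (2014), published in RFS, "Common Errors: How to (and Not to) Control for Unobserved Heterogeneity". I have excerpted a few passages from the original paper below for reference.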

Using independent variable group averages as instrumental variables. Independent variables' group averages are also sometimes used as instrumental variables. Specifically, the researcher instruments for a potentially endogenous regressor $X_{i,j}$ using the regressor's group average, $\bar{X}_{i,-j}$, calculated excluding the observation at hand. The typical justification for such instruments is that the group average of X is correlated with $X_{i,j}$ but is not otherwise related to the dependent variable, $y_{i,j}$. For example, a researcher estimating the impact of ROA on leverage but concerned that financial constraints introduce a simultaneity bias might propose using industry ROA to instrument for firm ROA. Using group averages of the independent variables as instrumental variables, however, leads to inconsistent estimates in the presence of unobserved group-level heterogeneity, as in Equation (1). The instrument violates the exclusion restriction whenever the unobserved heterogeneity, $f_i$, is correlated with the independent variable, $X_{i,j}$, because $\bar{X}_{i,-j}$ is then necessarily also correlated with $f_i$. As noted earlier, such correlations are pervasive in practice. In this example, unobserved industry investment opportunities likely affect both ROA and leverage, making the proposed IV estimator inconsistent. Unlike the other applications discussed in this section, the problem with the IV estimation cannot be solved by adding fixed effects to the estimating equation. Although fixed effects control for the unobserved heterogeneity, $f_i$, in the second stage estimation, the fixed effects reintroduce the endogeneity problem in the first stage estimation. Recall that the instrument, $\bar{X}_{i,-j}$, is just the group mean excluding the observation at hand. After controlling for industry fixed effects, the instrument becomes $-X_{i,j}/(N_i - 1)$, which is perfectly correlated with the endogenous regressor, $X_{i,j}$. Put differently, the instrument exploits strictly industry-level variation, which is not well-identified in the presence of industry fixed effects.
For a group average instrument to be valid, the independent variable, $X_{i,j}$, must be correlated with its group mean and the underlying economic source of this correlation must be unrelated to $f_i$ (the part of the industry variation that affects $y_{i,j}$). Although it is possible that there exist scenarios where these conditions hold, examples are rare. Researchers should not assume these conditions hold absent a strong economic justification.
(The passages above are excerpted from Gormley and Matsa, 2014)
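The size of the resulting inconsistency can be worked out in a simple special case. The derivation below is a sketch under an assumed data-generating process: the linear first stage, the loading $\gamma \neq 0$, and the independence assumptions are illustrative choices, not taken from the paper. Here $i$ indexes groups, $j$ indexes observations within a group, and $N_i$ is the group size.

$$
y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}, \qquad
X_{i,j} = \gamma f_i + u_{i,j}, \qquad
\bar{X}_{i,-j} = \frac{1}{N_i - 1} \sum_{k \neq j} X_{i,k} = \gamma f_i + \bar{u}_{i,-j}
$$

Assume $f_i$, $u_{i,j}$, and $\varepsilon_{i,j}$ are mutually independent, $u_{i,j}$ is independent across observations, and $\operatorname{Var}(f_i) = \sigma_f^2 > 0$. Then

$$
\operatorname{plim} \hat{\beta}_{IV}
= \frac{\operatorname{Cov}(\bar{X}_{i,-j},\, y_{i,j})}{\operatorname{Cov}(\bar{X}_{i,-j},\, X_{i,j})}
= \beta + \frac{\operatorname{Cov}(\bar{X}_{i,-j},\, f_i + \varepsilon_{i,j})}{\operatorname{Cov}(\bar{X}_{i,-j},\, X_{i,j})}
= \beta + \frac{\gamma \sigma_f^2}{\gamma^2 \sigma_f^2}
= \beta + \frac{1}{\gamma}
$$

In this setup the instrument is relevant only because of $f_i$, which is exactly the component that violates the exclusion restriction, so the bias does not shrink as the sample grows.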

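Some readers get a headache at the sight of English, so here is a rough paraphrase of these passages. Why is a group-mean instrument not recommended? Because a group mean used as an instrument usually fails the exogeneity requirement, which makes the IV estimator inconsistent. Suppose we want to study the effect of firm ROA on leverage. We then inevitably have to deal with endogeneity arising from omitted variables and reverse causality. For firm $j$ in industry $i$, if we use the industry ROA mean $\bar{X}_{i,-j}$ (computed without firm $j$) as an instrument for firm ROA (the variable $X_{i,j}$), we run into a fatal problem. The ROA of firms within an industry is typically affected by the industry fixed effect $f_i$, so the regressor $X_{i,j}$ is correlated with $f_i$, and the industry ROA mean $\bar{X}_{i,-j}$ is then necessarily correlated with $f_i$ as well. If industry fixed effects are not controlled for, $f_i$ sits in the error term, so the industry ROA mean is correlated with the error term and does not satisfy the exogeneity requirement for an instrument.

Adding industry fixed effects to the model does not solve the problem either. The industry ROA mean $\bar{X}_{i,-j}$ is almost collinear with the industry fixed effect; the only difference between them is that $\bar{X}_{i,-j}$ is computed without $X_{i,j}$ (firm $j$'s ROA). Once industry fixed effects are controlled for, the only new information $\bar{X}_{i,-j}$ provides is $-X_{i,j}/(N_i - 1)$, that is, firm $j$'s own ROA, which coincides exactly with the endogenous variable. In that case $\bar{X}_{i,-j}$ has no reason to exist; it is, to put it bluntly, completely useless.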
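Both points can be checked numerically. The following is a minimal simulation sketch, assuming a hypothetical data-generating process in which $X_{i,j} = \gamma f_i + u_{i,j}$ and $y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}$ with independent standard normal $f$, $u$, and $\varepsilon$. The function names, parameter values, and the process itself are illustrative assumptions, not part of Gormley and Matsa (2014). Under these assumptions the simple IV estimator should settle near $\beta + 1/\gamma$ rather than $\beta$, and after within-group demeaning the instrument should be an exact multiple of the demeaned regressor.

```python
import random


def cov(a, b):
    """Sample covariance (population normalization) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def simulate(n_groups=2000, group_size=10, beta=1.0, gamma=2.0, seed=0):
    """Return (IV estimate without fixed effects, within-group corr(z, x))."""
    rng = random.Random(seed)
    x, y, z = [], [], []      # raw regressor, outcome, leave-one-out instrument
    x_w, z_w = [], []         # within-group demeaned regressor and instrument
    for _ in range(n_groups):
        f = rng.gauss(0, 1)   # unobserved group-level heterogeneity (assumed)
        xs = [gamma * f + rng.gauss(0, 1) for _ in range(group_size)]
        ys = [beta * xj + f + rng.gauss(0, 1) for xj in xs]
        total = sum(xs)
        zs = [(total - xj) / (group_size - 1) for xj in xs]
        x_bar = total / group_size
        z_bar = sum(zs) / group_size
        x += xs
        y += ys
        z += zs
        # Demeaning within the group is what group fixed effects do.
        x_w += [xj - x_bar for xj in xs]
        z_w += [zj - z_bar for zj in zs]

    # Just-identified IV estimator with a single instrument and no controls.
    beta_iv = cov(z, y) / cov(z, x)
    corr_within = cov(z_w, x_w) / (cov(z_w, z_w) * cov(x_w, x_w)) ** 0.5
    return beta_iv, corr_within


beta_iv, corr_within = simulate()
print("IV estimate without fixed effects:", beta_iv)
print("corr(instrument, regressor) after demeaning:", corr_within)
```

The within-group correlation of minus one is an algebraic identity rather than a feature of this particular simulation: subtracting the group mean from $\bar{X}_{i,-j}$ leaves $-(X_{i,j} - \bar{X}_i)/(N_i - 1)$, so the first stage with fixed effects merely regresses the endogenous variable on a rescaled copy of itself.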

References
[1] Gormley T A, Matsa D A. Common Errors: How to (and Not to) Control for Unobserved Heterogeneity[J]. The Review of Financial Studies, 2014, 27(2): 617–661.
[2] 邱嘉平. 因果推断实用计量方法[M]. 上海: 上海财经大学出版社, 2020.


All replies

赵安豆

2024-10-25 16:15:32

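Using group means as instruments is indeed very common in empirical economics, especially when dealing with endogeneity. The logic is to use the characteristics of other individuals in the same group (industry, region, and so on) to predict the target individual's variable, on the assumption that this "horizontal" influence provides a relatively exogenous source of variation while remaining correlated with the endogenous regressor.

However, using the group mean directly as an instrument is not always appropriate or valid. The following issues can make it a "wrong" instrument:

1. **Endogeneity and collinearity**: If individuals in the same group are strongly linked (for example, their behaviors influence one another), the group mean may be highly correlated with other regressors in the model, which introduces collinearity. More seriously, if that correlation arises from a causal relationship, the group mean does not satisfy the exogeneity condition.

2. **Omitted variable bias**: If individuals in a group are affected by unobserved common factors (industry policy, the business cycle, and so on) that also affect the dependent variable, then using the group mean as an instrument may introduce a new omitted-variable problem, because the group mean picks up those same common factors.

3. **Heterogeneous treatment effects**: Even if the group mean satisfies relevance and exogeneity, when individuals are heterogeneous this averaged treatment may not accurately reflect the true causal effect for a particular individual. For example, how strongly a given company is affected by the social responsibility report tone of other companies in its industry may depend on that company's own characteristics.

4. **Over-reliance on between-group variation**: If the target variable differs greatly across groups but varies little within them (that is, most of the variation comes from differences between groups rather than within them), the effect estimated with a group-mean instrument may be overstated, or the true causal relationship may not be identified at all.

In sum, although using the group mean as an instrument is intuitive and has some theoretical appeal, in practice the issues above need careful consideration, and other methods (difference-in-differences, propensity score matching, and so on) may be needed to check the robustness of the results. When designing an empirical strategy, understanding the data structure and the causal mechanism is essential.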

This text was generated by the CAIE academic large language model.
