zm6040 posted on 2014-4-26 11:29
Let me close out my own question. I recently came across a paper that explains this: Chen S., Sun Z., Tang S., Wu D., 2011, "Government I ...
Footnote 11 of Chen S., Sun Z., Tang S., Wu D., 2011, "Government Intervention and Investment Efficiency: Evidence From China", Journal of Corporate Finance (17), pp. 259–271:
“As suggested by Belsley et al. (1980), observations with Cook's D larger than 4/(n − k − 1) (where n and k is the sample size and number of regressors, respectively) or the absolute value of studentized residuals larger than 2 can cause undue influences on the regression results. In fitting Model (1) on the full sample or various sub-samples, about 4–5% of the observations are identified as such influential observations. We therefore winsorize the continuous variables at the 2.5 top and bottom percentiles of their respective distributions. Alternatively, we drop influential observations identified by the above criteria, and the conclusions remain the same.”
Belsley, D., Kuh, E., Welsch, R., 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New Jersey.
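To make the two cutoffs in the footnote concrete, here is a minimal sketch in plain Python for the single-regressor case (k = 1). The function name `influence_flags` is made up for illustration, and the footnote does not say whether the studentized residuals are internal or external; this sketch assumes the internally studentized version. It is not the authors' code.

```python
import math

def influence_flags(x, y):
    """Flag influential observations in the simple regression y = a + b*x.

    An observation is flagged if Cook's D > 4/(n - k - 1) or if the
    absolute (internally) studentized residual exceeds 2.
    Hypothetical helper for illustration; assumes k = 1 regressor.
    """
    n = len(x)
    k = 1          # number of regressors
    p = k + 1      # estimated parameters: intercept + slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)   # residual variance
    cutoff = 4 / (n - k - 1)                   # Cook's D cutoff from the footnote
    flags = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx    # leverage of this observation
        r = e / math.sqrt(s2 * (1 - h))        # internally studentized residual
        cooks_d = r * r * h / (p * (1 - h))    # Cook's distance
        flags.append(cooks_d > cutoff or abs(r) > 2)
    return flags
```

With more than one regressor the leverage has to come from the hat matrix, so in practice one would use a statistics package rather than this hand-rolled formula.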
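Footnote 11 of "Government Intervention and Investment Efficiency: Evidence From China", restated with my notes in parentheses:
As Belsley et al. (1980) suggest, observations whose Cook's D exceeds 4/(n − k − 1) (where n is the sample size and k is the number of regressors), or whose studentized residuals exceed 2 in absolute value, can exert undue (excessive or inappropriate) influence on the regression results. When Model (1) is fitted on the full sample or on the various sub-samples, about 4%–5% of the observations are identified as influential in this (undue) sense. We therefore winsorize the continuous variables at the top and bottom 2.5% of their respective distributions (that is, observations below the 2.5th percentile and observations above the 97.5th percentile). As an alternative, we (can also) drop the influential observations identified by the above criteria, and the conclusions (obtained) remain the same.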
The recommendation comes from Belsley, Kuh, and Welsch (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity.
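For anyone who wants to see what winsorizing at the 2.5th and 97.5th percentiles does mechanically, here is a minimal sketch in plain Python. The helper names `percentile` and `winsorize` are hypothetical, and the linear-interpolation percentile rule is an assumption; the paper does not say which percentile definition its software used, and different packages can give slightly different cutoffs.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def winsorize(values, lower=2.5, upper=97.5):
    """Pull values below the lower percentile up to it, and values above
    the upper percentile down to it. No observation is dropped."""
    ordered = sorted(values)
    lo_cut = percentile(ordered, lower)
    hi_cut = percentile(ordered, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]
```

The point to notice is that the sample size stays the same: extreme values are replaced by the cutoff values, whereas the alternative mentioned in the footnote removes the flagged observations altogether.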