Andrew Gelman & Adam Zelizer, 2014; Evidence on the deleterious impact of sustained use of polynomial regression on causal inference
"It is common in regression discontinuity (RD) analysis to control for third- or fifth-degree polynomials of the assignment variable. Such models can overfit, leading to causal inferences that are substantively implausible and that arbitrarily attribute variation to the high-degree polynomial or the discontinuity. This paper examines two recent studies that make use of regression discontinuity to discuss evident practical problems with these estimates and how they interact with pathologies of the current system of scientific publication. First, we discuss a recent study that estimates the effect on air pollution and life expectancy of a coal-heating policy in China (Chen, Y., Ebenstein, A., Greenstone, M., and Li, H.; 2013). The reported effects, based on a third-degree polynomial, are statistically significant but substantively dubious, and are sensitive to model choice. This study is indicative of a category of policy analyses where strong claims are based on weak data and methodologies which permit the researcher wide latitude in presenting estimated treatment effects. We then replicate a procedure from Green et al., in which regression discontinuity is used to recover estimated treatment effects relative to an experimental benchmark, to illustrate one practical problem with the RD estimates in the coal-heating paper: high-degree polynomials yield noisy estimates of treatment effects that do not accurately convey uncertainty. We recommend that (a) researchers consider the problems which may result from controlling for higher-order polynomials; and (b) that journals recognize that quantitative analyses of policy issues are often inconclusive and relax the implicit rule under which statistical significance is a condition for publication."
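The abstract's claim that high-degree polynomials yield noisy RD estimates can be illustrated with a small simulation. The sketch below is not the paper's procedure or data: the data-generating process (a linear trend, unit-variance noise, and a true jump of zero at the cutoff), the sample sizes, and the function names `rd_estimate` and `simulate_spread` are all assumptions made for illustration. It only looks at how much the estimated jump varies across simulated datasets as the polynomial degree grows; it does not check whether reported standard errors convey that variation.

```python
import numpy as np

def rd_estimate(x, y, degree):
    """OLS estimate of the jump at x = 0, controlling for a global polynomial in x."""
    treated = (x >= 0).astype(float)
    # Design columns: intercept, treatment indicator, x, x^2, ..., x^degree
    powers = [x ** k for k in range(1, degree + 1)]
    design = np.column_stack([np.ones_like(x), treated] + powers)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the treatment indicator

def simulate_spread(degree, n_sims=500, n=200, seed=0):
    """Standard deviation of the estimated jump across simulated datasets."""
    rng = np.random.default_rng(seed)  # same seed => same datasets for every degree
    estimates = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.uniform(-1.0, 1.0, size=n)
        # Assumed truth: smooth linear trend, no discontinuity at the cutoff
        y = 0.5 * x + rng.normal(0.0, 1.0, size=n)
        estimates[i] = rd_estimate(x, y, degree)
    return estimates.std()

spreads = {d: simulate_spread(d) for d in (1, 3, 5)}
for d, s in spreads.items():
    print(f"degree {d}: sd of estimated jump = {s:.3f}")
```

Because the true jump is zero in this setup, any nonzero estimate is pure noise. Under these assumptions the spread of the estimates should widen as the degree goes from 1 to 3 to 5, since the higher-order terms absorb much of the variation that would otherwise identify the discontinuity.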
Reportedly, this paper is part gossip and part satire:
...Speculations are presented as fact. For example, the China air pollution study was featured in a New York Times article (Wong, 2013) that referred unquestioningly to “the 5.5-year drop in life expectancy in the north,” as well as in a New Yorker article by a Pulitzer prizewinning reporter (Johnson, 2013) who simply wrote that a study “noted that pollution from coal reduces average life expectancy in northern China by five and a half years,” with no indication that the “five and a half years” number was just a point estimate, even setting aside questions about the validity of that estimate.