Title:
Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes
---
Authors:
Susan Athey, Raj Chetty, and Guido Imbens
---
Latest submission year:
2020
---
Categories:
Top-level category: Statistics
Subcategory: Methodology
Category description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods
--
Top-level category: Economics
Subcategory: Econometrics
Category description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
---
Abstract:
There has been an increase in interest in experimental evaluations to estimate causal effects, partly because their internal validity tends to be high. At the same time, as part of the big data revolution, large, detailed, and representative, administrative data sets have become more widely available. However, the credibility of estimates of causal effects based on such data sets alone can be low. In this paper, we develop statistical methods for systematically combining experimental and observational data to obtain credible estimates of the causal effect of a binary treatment on a primary outcome that we only observe in the observational sample. Both the observational and experimental samples contain data about a treatment, observable individual characteristics, and a secondary (often short term) outcome. To estimate the effect of a treatment on the primary outcome while addressing the potential confounding in the observational sample, we propose a method that makes use of estimates of the relationship between the treatment and the secondary outcome from the experimental sample. If assignment to the treatment in the observational sample were unconfounded, we would expect the treatment effects on the secondary outcome in the two samples to be similar. We interpret differences in the estimated causal effects on the secondary outcome between the two samples as evidence of unobserved confounders in the observational sample, and develop control function methods for using those differences to adjust the estimates of the treatment effects on the primary outcome. We illustrate these ideas by combining data on class size and third grade test scores from the Project STAR experiment with observational data on class size and both third and eighth grade test scores from the New York school system.
---
PDF link:
https://arxiv.org/pdf/2006.09676