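Abstract (translated):
We provide straightforward theoretical results that justify bringing machine learning into the standard linear instrumental variable setting. The core idea is to use machine learning together with sample splitting to predict the treatment variable from the instrument and any exogenous covariates, and then to use the predicted treatment and the covariates as technical instruments to recover the coefficients in the second stage. This lets researchers extract non-linear co-variation between the treatment and the instrument, which may substantially improve estimation precision and robustness by raising instrument strength. Importantly, we constrain the machine-learned predictions to be linear in the exogenous covariates, which avoids spurious identification arising from non-linear relationships between the treatment and the covariates. We show that this approach yields consistent and asymptotically normal estimates under weak conditions, and that it can be adapted to be semiparametrically efficient (Chamberlain, 1992). Our method retains the standard intuitions and interpretations of linear instrumental variable methods, including under weak identification, and offers a simple, user-friendly upgrade to the applied economics toolbox. We illustrate the method with an example from law and criminal justice, examining the causal effect of appellate court reversals on district court sentencing decisions.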
---
English title:
Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models
---
Authors:
Jiafeng Chen, Daniel L. Chen, and Greg Lewis
---
Year of latest submission:
2021
---
Categories:
Top-level category: Economics
Subcategory: Econometrics
Description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
Top-level category: Computer Science
Subcategory: Machine Learning
Description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Top-level category: Statistics
Subcategory: Methodology
Description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods
--
---
Abstract (original English):
We offer straightforward theoretical results that justify incorporating machine learning in the standard linear instrumental variable setting. The key idea is to use machine learning, combined with sample-splitting, to predict the treatment variable from the instrument and any exogenous covariates, and then use this predicted treatment and the covariates as technical instruments to recover the coefficients in the second stage. This allows the researcher to extract non-linear co-variation between the treatment and instrument that may dramatically improve estimation precision and robustness by boosting instrument strength. Importantly, we constrain the machine-learned predictions to be linear in the exogenous covariates, thus avoiding spurious identification arising from non-linear relationships between the treatment and the covariates. We show that this approach delivers consistent and asymptotically normal estimates under weak conditions and that it may be adapted to be semiparametrically efficient (Chamberlain, 1992). Our method preserves standard intuitions and interpretations of linear instrumental variable methods, including under weak identification, and provides a simple, user-friendly upgrade to the applied economics toolbox. We illustrate our method with an example in law and criminal justice, examining the causal effect of appellate court reversals on district court sentencing decisions.
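The two-step idea in the abstract can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not the authors' implementation: the data are simulated, a cubic polynomial in the instrument stands in for a machine-learning learner, and all function names (`first_stage_features`, `cross_fit_instrument`, `iv_estimate`, `simulate`) are hypothetical. The first stage is fitted out-of-fold (sample splitting) and is linear in the covariate by construction; the second stage uses the predicted treatment and the covariate as instruments.

```python
import numpy as np

def first_stage_features(z, x):
    # Assumed stand-in for an ML learner: flexible (cubic) in the
    # instrument z, but strictly linear in the exogenous covariate x.
    return np.column_stack([np.ones_like(z), z, z**2, z**3, x])

def cross_fit_instrument(d, z, x, n_folds=5, seed=0):
    # Sample splitting: each observation's prediction comes from a
    # model fitted on the other folds only.
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(d)) % n_folds
    d_hat = np.empty_like(d)
    for k in range(n_folds):
        train, test = fold != k, fold == k
        coef, *_ = np.linalg.lstsq(
            first_stage_features(z[train], x[train]), d[train], rcond=None
        )
        d_hat[test] = first_stage_features(z[test], x[test]) @ coef
    return d_hat

def iv_estimate(y, d, x, d_hat):
    # Just-identified IV: instruments are (d_hat, x, 1) for the
    # regressors (d, x, 1); solves (W'R) beta = W'y.
    ones = np.ones_like(d)
    regressors = np.column_stack([d, x, ones])
    instruments = np.column_stack([d_hat, x, ones])
    return np.linalg.solve(instruments.T @ regressors, instruments.T @ y)

def simulate(n=5000, seed=1):
    # Hypothetical data: the treatment depends on z only through z**2,
    # so a first stage that is linear in z would be nearly uninformative.
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = rng.normal(size=n)
    u = rng.normal(size=n)  # unobserved confounder
    d = z**2 + 0.5 * x + u + 0.5 * rng.normal(size=n)
    y = 2.0 * d + 1.0 * x + u + 0.5 * rng.normal(size=n)  # true effect: 2.0
    return y, d, z, x

y, d, z, x = simulate()
d_hat = cross_fit_instrument(d, z, x)
beta = iv_estimate(y, d, x, d_hat)
print("estimated treatment coefficient:", beta[0])
```

In this simulated design the true treatment coefficient is 2.0, so the printed estimate can be compared against it. Swapping the polynomial basis for another learner would keep the same structure, provided the prediction stays linear in the covariates.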
---