Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments

2022-03-06

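Translated abstract:
We propose strategies for estimating and making inference on key features of heterogeneous effects in randomized experiments. These key features include the best linear predictor of the effects based on machine learning proxies, average effects sorted by impact group, and the average characteristics of the most and least affected units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used together with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. Estimation and inference rely on repeated data splitting to avoid overfitting and to achieve validity. For inference, we take the medians of the p-values and of the confidence intervals produced by many different data splits, then adjust their nominal levels to guarantee uniform validity. This variational inference method quantifies the uncertainty from both parameter estimation and data splitting, and is uniformly valid for a large class of data-generating processes. We illustrate the approach with a randomized field experiment that evaluated a combination of nudges to stimulate demand for immunization in India.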
---
English title:
Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments
---
Authors:
Victor Chernozhukov, Mert Demirer, Esther Duflo, and Iván Fernández-Val
---
Year of latest submission:
2020
---
Categories:

Primary category: Statistics
Subcategory: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary category: Economics
Subcategory: Econometrics
Description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
Primary category: Mathematics
Subcategory: Statistics Theory
Description: Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies
--
Primary category: Statistics
Subcategory: Statistics Theory
Description: stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.
--

---
English abstract:
  We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects on machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into the estimates of the key features. Our approach is generic, it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals, resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method, which quantifies the uncertainty coming from both parameter estimation and data splitting, is shown to be uniformly valid for a large class of data generating processes. We illustrate the use of the approach with a randomized field experiment that evaluated a combination of nudges to stimulate demand for immunization in India.
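
The inference step described in the abstract (repeated data splitting, then taking the median of the per-split p-values and adjusting the nominal level) can be illustrated with a minimal Python sketch. Everything here is an illustrative assumption rather than the authors' implementation: the function names `simulate`, `split_pvalue`, and `median_adjusted_pvalue` are hypothetical, ordinary least squares stands in for the machine learning proxy, the main-sample regression is a simplified version of a best-linear-predictor regression without extra controls, and the level adjustment is assumed to be a doubling of the median p-value, which the abstract does not spell out.

```python
import numpy as np
from math import erfc, sqrt


def simulate(n, seed):
    """Toy randomized experiment with a treatment effect that varies with x[:, 0]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    d = rng.integers(0, 2, size=n)  # treatment assigned with probability 0.5
    y = x[:, 0] + d * (1.0 + 2.0 * x[:, 0]) + rng.normal(size=n)
    return y, d, x


def split_pvalue(y, d, x, rng, p_treat=0.5):
    """One random split: build an effect proxy on the auxiliary half,
    then test its slope on the main half."""
    n = len(y)
    perm = rng.permutation(n)
    aux, main = perm[: n // 2], perm[n // 2:]

    def fit(idx):
        # Least-squares fit of y on (1, x); a stand-in for any ML learner.
        design = np.column_stack([np.ones(len(idx)), x[idx]])
        coef, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        return coef

    # Proxy for the effect: treated-arm fit minus control-arm fit.
    b1 = fit(aux[d[aux] == 1])
    b0 = fit(aux[d[aux] == 0])
    s = np.column_stack([np.ones(len(main)), x[main]]) @ (b1 - b0)

    # Simplified regression on the main half:
    # y ~ 1 + (d - p) + (d - p) * (s - mean(s)); the last slope measures
    # how well the proxy predicts effect heterogeneity.
    w = d[main] - p_treat
    z = np.column_stack([np.ones(len(main)), w, w * (s - s.mean())])
    beta, *_ = np.linalg.lstsq(z, y[main], rcond=None)
    resid = y[main] - z @ beta

    # Heteroskedasticity-robust (HC0) covariance of the coefficients.
    bread = np.linalg.inv(z.T @ z)
    meat = z.T @ (z * resid[:, None] ** 2)
    cov = bread @ meat @ bread
    t_stat = beta[2] / sqrt(cov[2, 2])
    return erfc(abs(t_stat) / sqrt(2))  # two-sided normal p-value


def median_adjusted_pvalue(y, d, x, n_splits=50, seed=0):
    """Median of per-split p-values, doubled (assumed adjustment) and capped at 1."""
    rng = np.random.default_rng(seed)
    pvals = [split_pvalue(y, d, x, rng) for _ in range(n_splits)]
    return min(1.0, 2.0 * float(np.median(pvals)))
```

A call such as `median_adjusted_pvalue(*simulate(1000, seed=1), n_splits=20)` returns one p-value that aggregates over the splits. Using the median rather than a single split reduces the dependence of the conclusion on one arbitrary partition of the data, at the cost of the more conservative adjusted level.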
---
PDF link:
https://arxiv.org/pdf/1712.04802
