Title:
$L_2$Boosting for Economic Applications
---
Authors:
Ye Luo and Martin Spindler
---
Latest submission year:
2017
---
Categories:
Category: Statistics
Subcategory: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Category: Economics
Subcategory: Econometrics
Description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
Category: Statistics
Subcategory: Methodology
Description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods
--
---
Abstract:
In recent years, more and more high-dimensional data sets, in which the number of parameters $p$ is large relative to the number of observations $n$ or even exceeds it, have become available to applied researchers. Boosting algorithms represent one of the major advances in machine learning and statistics in recent years and are well suited to the analysis of such data sets. While the Lasso has been applied very successfully to high-dimensional data sets in economics, boosting has been underutilized in this field, although it has proven very powerful in fields such as biostatistics and pattern recognition. We attribute this to the lack of theoretical results for boosting. The goal of this paper is to fill this gap and to show that boosting is a competitive method for inference on a treatment effect or for instrumental variable (IV) estimation in a high-dimensional setting. First, we present the $L_2$Boosting with componentwise least squares algorithm and variants of it that are tailored to regression problems, the workhorse of most econometric analyses. Then we show how $L_2$Boosting can be used for the estimation of treatment effects and for IV estimation. We highlight the methods and illustrate them with simulations and empirical examples. For further results and technical details, we refer to Luo and Spindler (2016, 2017) and to the online supplement of the paper.
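The abstract names $L_2$Boosting with componentwise least squares but does not spell it out. As a rough illustration only, the pure-Python sketch below implements the textbook version of that algorithm. At each step it fits the current residual on the single covariate that most reduces the residual sum of squares, using least squares, and then takes a shrunken step of size `nu`. It then uses the boosted fits in a partialling-out (residual-on-residual) regression to estimate a treatment effect, which is one common way to plug a high-dimensional regression method into treatment-effect estimation and is assumed here. The function names (`l2_boost`, `residualize`, `partialling_out_effect`), the step size `nu = 0.1`, the fixed number of steps and the simulated design are all hypothetical choices for this sketch. This is not the authors' code, and the variants and stopping rules studied in the paper may differ.

```python
import random


def l2_boost(X, y, steps=200, nu=0.1):
    """Componentwise L2Boosting: each step refits the current residual on the
    single column of X that most reduces the residual sum of squares, then
    moves that column's coefficient a fraction nu of the way. No intercept is
    fitted, so X and y should be centered beforehand."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    sq_norms = [sum(c * c for c in col) for col in cols]
    beta = [0.0] * p
    resid = list(y)
    for _ in range(steps):
        best_j, best_gain, best_coef = None, -1.0, 0.0
        for j in range(p):
            if sq_norms[j] == 0.0:
                continue  # skip all-zero columns (e.g. constants after centering)
            cross = sum(c * r for c, r in zip(cols[j], resid))
            coef = cross / sq_norms[j]   # univariate least-squares coefficient
            gain = coef * cross          # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_coef = j, gain, coef
        if best_j is None:
            break
        beta[best_j] += nu * best_coef  # shrunken update of one coefficient
        resid = [r - nu * best_coef * c for r, c in zip(resid, cols[best_j])]
    return beta


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def center_columns(X):
    p = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def residualize(X, v, steps, nu):
    """Residual of v after subtracting its boosted linear fit on X."""
    beta = l2_boost(X, v, steps=steps, nu=nu)
    return [vi - sum(b * x for b, x in zip(beta, row)) for vi, row in zip(v, X)]


def partialling_out_effect(X, d, y, steps=200, nu=0.1):
    """Effect of treatment d on outcome y: regress the boosted residual of y
    on the boosted residual of d (Frisch-Waugh-Lovell style)."""
    Xc = center_columns(X)
    ry = residualize(Xc, center(y), steps, nu)
    rd = residualize(Xc, center(d), steps, nu)
    return sum(a * b for a, b in zip(rd, ry)) / sum(b * b for b in rd)


# Hypothetical simulated design: 40 controls, only the first two matter,
# and the true treatment effect is 0.5.
random.seed(0)
n, p = 200, 40
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
d = [0.8 * row[0] + random.gauss(0.0, 1.0) for row in X]
y = [0.5 * di + row[0] + 0.5 * row[1] + random.gauss(0.0, 1.0)
     for di, row in zip(d, X)]
est = partialling_out_effect(X, d, y)
print(f"estimated treatment effect: {est:.3f}")
```

In this sketch, the number of boosting steps acts as the regularization parameter. In practice it would be chosen by a data-driven rule (for example cross-validation or an information criterion) rather than fixed at 200. The fixed value is used here only to keep the illustration short.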
---
PDF link:
https://arxiv.org/pdf/1702.03244