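Abstract (translation):
This paper presents an out-of-sample prediction comparison between major machine learning models and a structural econometric model. Over the past decade, machine learning has become a powerful tool in many prediction applications, but it is still not widely adopted in empirical economic research. To evaluate its benefits, I use the most common machine learning algorithms (CART, C4.5, LASSO, random forest, and adaboost) to build prediction models for the cash transfer experiment conducted by the Progresa program in Mexico, and I compare the predictions with those of a previous structural econometric study. Two prediction tasks are performed: an out-of-sample forecast and a long-term within-sample simulation. For the out-of-sample forecast, both the mean absolute error and the root mean square error of the predicted school attendance rates are smaller for every machine learning model than for the structural model. Random forest and adaboost are the most accurate for the individual outcomes of all subgroups. For the long-term within-sample simulation, the structural model outperforms all of the machine learning models. The poor within-sample fit of the machine learning models stems from the inaccuracy of their income and pregnancy prediction models. The results indicate that machine learning models outperform the structural model when there is plenty of data to learn from; when data are limited, however, the structural model gives more sensible predictions. These findings show promise for applying machine learning to economic policy analysis in the era of big data.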
---
English title:
Evaluating Conditional Cash Transfer Policies with Machine Learning Methods
---
Author:
Tzai-Shuen Chen
---
Latest submission year:
2018
---
Classification:
Primary category: Economics
Secondary category: Econometrics
Category description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
Primary category: Statistics
Secondary category: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding.
--
---
Original English abstract:
This paper presents an out-of-sample prediction comparison between major machine learning models and the structural econometric model. Over the past decade, machine learning has established itself as a powerful tool in many prediction applications, but this approach is still not widely adopted in empirical economic studies. To evaluate the benefits of this approach, I use the most common machine learning algorithms, CART, C4.5, LASSO, random forest, and adaboost, to construct prediction models for a cash transfer experiment conducted by the Progresa program in Mexico, and I compare the prediction results with those of a previous structural econometric study. Two prediction tasks are performed in this paper: the out-of-sample forecast and the long-term within-sample simulation. For the out-of-sample forecast, both the mean absolute error and the root mean square error of the school attendance rates found by all machine learning models are smaller than those found by the structural model. Random forest and adaboost have the highest accuracy for the individual outcomes of all subgroups. For the long-term within-sample simulation, the structural model has better performance than do all of the machine learning models. The poor within-sample fitness of the machine learning model results from the inaccuracy of the income and pregnancy prediction models. The result shows that the machine learning model performs better than does the structural model when there are many data to learn; however, when the data are limited, the structural model offers a more sensible prediction. The findings of this paper show promise for adopting machine learning in economic policy analyses in the era of big data.
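
The abstract compares models by mean absolute error (MAE) and root mean square error (RMSE). As a rough illustration of how those two metrics are computed, here is a minimal Python sketch. The attendance rates and the two sets of predictions are made-up numbers chosen only for illustration, and the names `model_a` and `model_b` are hypothetical; none of this comes from the paper's data or code.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference; penalizes large misses more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical school attendance rates for four subgroups (illustrative only).
actual = [0.90, 0.80, 0.70, 0.60]
# Hypothetical predictions from two competing models (illustrative only).
model_a = [0.88, 0.82, 0.69, 0.63]
model_b = [0.85, 0.86, 0.75, 0.52]

mae_a = mean_absolute_error(actual, model_a)
mae_b = mean_absolute_error(actual, model_b)
rmse_a = root_mean_square_error(actual, model_a)
rmse_b = root_mean_square_error(actual, model_b)

print(f"model_a: MAE={mae_a:.4f}, RMSE={rmse_a:.4f}")
print(f"model_b: MAE={mae_b:.4f}, RMSE={rmse_b:.4f}")
```

A model with both a lower MAE and a lower RMSE, as `model_a` has here, is the situation the abstract reports for the machine learning models in the out-of-sample forecast.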
---
PDF link:
https://arxiv.org/pdf/1803.06401