Title:
Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso
---
Authors:
Ning Xu, Jian Hong, Timothy C.G. Fisher
---
Latest submission year:
2016
---
Abstract:
Model selection is difficult to analyse yet theoretically and empirically important, especially for high-dimensional data analysis. Recently the least absolute shrinkage and selection operator (Lasso) has been applied in the statistical and econometric literature. Consistency of Lasso has been established under various conditions, some of which are difficult to verify in practice. In this paper, we study model selection from the perspective of generalization ability, under the framework of structural risk minimization (SRM) and Vapnik-Chervonenkis (VC) theory. The approach emphasizes the balance between the in-sample and out-of-sample fit, which can be achieved by using cross-validation to select a penalty on model complexity. We show that an exact relationship exists between the generalization ability of a model and model selection consistency. By implementing SRM and the VC inequality, we show that Lasso is L2-consistent for model selection under assumptions similar to those imposed on OLS. Furthermore, we derive a probabilistic bound for the distance between the penalized extremum estimator and the extremum estimator without penalty, which is dominated by overfitting. We also propose a new measurement of overfitting, GR2, based on generalization ability, that converges to zero if model selection is consistent. Using simulations, we demonstrate that the proposed CV-Lasso algorithm performs well in terms of model selection and overfitting control.
---
Classification:
Primary category: Statistics
Subcategory: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary category: Quantitative Finance
Subcategory: Economics
Category description: q-fin.EC is an alias for econ.GN. Economics, including micro and macro economics, international economics, theory of the firm, labor economics, and other economic topics outside finance
--
Primary category: Statistics
Subcategory: Computation
Category description: Algorithms, Simulation, Visualization
--
---