2022-03-26
---
Title:
V-fold cross-validation improved: V-fold penalization
---
Author:
Sylvain Arlot (LM-Orsay, INRIA Futurs)
---
Latest submission year:
2008
---
Classification:

Primary: Mathematics
Secondary: Statistics Theory
Description: Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies
--
Primary: Statistics
Secondary: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary: Statistics
Secondary: Statistics Theory
Description: stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.

---
Abstract:
We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "overpenalizes", all the more so when V is small. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, overpenalizing appears to be necessary, so the optimal V is not always the largest one, despite the variability issue. This is confirmed by simulated data. To improve the prediction performance of VFCV, we define a new model selection procedure, called "V-fold penalization" (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so it has the same computational cost as VFCV while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality whose leading constant tends to 1 as the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even under highly heteroscedastic noise. Moreover, penVF makes it easy to overpenalize, independently of the parameter V. A simulation study shows that this yields a significant improvement over VFCV in non-asymptotic situations.
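
The abstract describes VFCV and penVF only at a high level. As a concrete illustration, here is a minimal NumPy sketch, written for this post, contrasting the classical VFCV criterion with a V-fold penalty of the resampling type described above. The toy heteroscedastic data, the regressogram models, and the normalizing constant C = V - 1 (with C > V - 1 corresponding to overpenalization) are illustrative assumptions, not the paper's exact definitions or experimental setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy heteroscedastic regression data on [0, 1] (assumed for illustration).
n = 500
X = rng.uniform(0.0, 1.0, n)
noise_sd = 0.1 + 0.4 * X                   # noise level grows with x
Y = np.sin(4 * np.pi * X) + noise_sd * rng.normal(size=n)

def regressogram(X_train, Y_train, X_eval, D):
    # Least-squares "regressogram": piecewise-constant fit on a regular
    # partition of [0, 1] into D bins, evaluated at the points X_eval.
    train_bins = np.clip((X_train * D).astype(int), 0, D - 1)
    means = np.full(D, Y_train.mean())     # global mean as fallback for empty bins
    for d in range(D):
        hit = train_bins == d
        if hit.any():
            means[d] = Y_train[hit].mean()
    eval_bins = np.clip((X_eval * D).astype(int), 0, D - 1)
    return means[eval_bins]

def mse(y, yhat):
    return np.mean((y - yhat) ** 2)

V = 5
C = V - 1            # assumed penalty constant; C > V - 1 would overpenalize
fold = np.arange(n) % V
rng.shuffle(fold)    # random balanced V-fold partition of the indices

crit_vfcv, crit_penvf = {}, {}
for D in range(1, 51):                     # candidate models: D-bin partitions
    emp_risk = mse(Y, regressogram(X, Y, X, D))        # empirical risk of the full fit
    val_errs, pen_terms = [], []
    for j in range(V):
        tr, va = fold != j, fold == j
        pred = regressogram(X[tr], Y[tr], X, D)        # fit without fold j
        val_errs.append(mse(Y[va], pred[va]))          # held-out error on fold j
        # Resampling penalty term: full-sample risk minus training risk
        # of the same leave-one-fold-out fit.
        pen_terms.append(mse(Y, pred) - mse(Y[tr], pred[tr]))
    crit_vfcv[D] = np.mean(val_errs)                   # classical VFCV criterion
    crit_penvf[D] = emp_risk + C * np.mean(pen_terms)  # V-fold penalization

print("VFCV selects D =", min(crit_vfcv, key=crit_vfcv.get))
print("penVF selects D =", min(crit_penvf, key=crit_penvf.get))

Both criteria reuse the same V leave-one-fold-out fits per model, which is consistent with the abstract's claim that penVF costs the same as VFCV; the extra flexibility comes from the free constant C, which can be raised to overpenalize independently of V.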
---
PDF link:
https://arxiv.org/pdf/0802.0566