2022-03-26
---
Title:
V-fold cross-validation improved: V-fold penalization
---
Author:
Sylvain Arlot (LM-Orsay, INRIA Futurs)
---
Latest submission year:
2008
---
Classification:

Primary: Mathematics
Secondary: Statistics Theory
Description: Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies
--
Primary: Statistics
Secondary: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary: Statistics
Secondary: Statistics Theory
Description: stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.

---
Abstract:
We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "overpenalizes", all the more so when V is small. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, overpenalizing appears to be necessary, so the optimal V is not always the largest one, despite the variability issue. This is confirmed by simulated data. To improve the prediction performance of VFCV, we define a new model selection procedure, called "V-fold penalization" (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so it has the same computational cost as VFCV while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality whose leading constant tends to 1 as the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even under highly heteroscedastic noise. Moreover, penVF makes it easy to overpenalize, independently of the parameter V. A simulation study shows that this yields a significant improvement over VFCV in non-asymptotic situations.
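
The abstract describes VFCV and penVF only at a high level. As a concrete illustration, here is a minimal NumPy sketch, written for this post, contrasting the classical VFCV criterion with a V-fold penalty of the resampling type described above. The toy heteroscedastic data, the regressogram models, and the normalizing constant C = V - 1 (with C > V - 1 corresponding to overpenalization) are illustrative assumptions, not the paper's exact definitions or experimental setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy heteroscedastic regression data on [0, 1] (assumed for illustration).
n = 500
X = rng.uniform(0.0, 1.0, n)
noise_sd = 0.1 + 0.4 * X                   # noise level grows with x
Y = np.sin(4 * np.pi * X) + noise_sd * rng.normal(size=n)

def regressogram(X_train, Y_train, X_eval, D):
    # Least-squares "regressogram": piecewise-constant fit on a regular
    # partition of [0, 1] into D bins, evaluated at the points X_eval.
    train_bins = np.clip((X_train * D).astype(int), 0, D - 1)
    means = np.full(D, Y_train.mean())     # global mean as fallback for empty bins
    for d in range(D):
        hit = train_bins == d
        if hit.any():
            means[d] = Y_train[hit].mean()
    eval_bins = np.clip((X_eval * D).astype(int), 0, D - 1)
    return means[eval_bins]

def mse(y, yhat):
    return np.mean((y - yhat) ** 2)

V = 5
C = V - 1            # assumed penalty constant; C > V - 1 would overpenalize
fold = np.arange(n) % V
rng.shuffle(fold)    # random balanced V-fold partition of the indices

crit_vfcv, crit_penvf = {}, {}
for D in range(1, 51):                     # candidate models: D-bin partitions
    emp_risk = mse(Y, regressogram(X, Y, X, D))        # empirical risk of the full fit
    val_errs, pen_terms = [], []
    for j in range(V):
        tr, va = fold != j, fold == j
        pred = regressogram(X[tr], Y[tr], X, D)        # fit without fold j
        val_errs.append(mse(Y[va], pred[va]))          # held-out error on fold j
        # Resampling penalty term: full-sample risk minus training risk
        # of the same leave-one-fold-out fit.
        pen_terms.append(mse(Y, pred) - mse(Y[tr], pred[tr]))
    crit_vfcv[D] = np.mean(val_errs)                   # classical VFCV criterion
    crit_penvf[D] = emp_risk + C * np.mean(pen_terms)  # V-fold penalization

print("VFCV selects D =", min(crit_vfcv, key=crit_vfcv.get))
print("penVF selects D =", min(crit_penvf, key=crit_penvf.get))

Both criteria reuse the same V leave-one-fold-out fits per model, which is consistent with the abstract's claim that penVF costs the same as VFCV; the extra flexibility comes from the free constant C, which can be raised to overpenalize independently of V.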
---
PDF link:
https://arxiv.org/pdf/0802.0566