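Abstract (translated):
Penalization procedures often depend on multiplying factors whose optimal values are either unknown or hard to estimate from the data. In the least-squares regression framework, we propose a completely data-driven algorithm for calibrating this parameter, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birge and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, which yields a data-driven estimate of an optimal penalty that can be used in practice; on the negative side, their approach relies heavily on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: to state a more general heuristic for designing a data-driven penalty (the slope heuristics), and to prove that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results are proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.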
---
English title:
Data-driven calibration of penalties for least-squares regression
---
Authors:
Sylvain Arlot (LM-Orsay, INRIA Futurs), Pascal Massart (LM-Orsay, INRIA Futurs)
---
Year of latest submission:
2008
---
Classification:
Top-level category: Mathematics
Subcategory: Statistics Theory
Description: Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies
--
Top-level category: Statistics
Subcategory: Methodology
Description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods
--
Top-level category: Statistics
Subcategory: Statistics Theory
Description: stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.
--
---
English abstract:
Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from the data. We propose a completely data-driven calibration algorithm for this parameter in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birge and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristics for designing a data-driven penalty (the slope heuristics) and proving that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results will be proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.
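
The calibration idea described in the abstract can be illustrated with a minimal sketch for regressogram bin-width selection. This is not the authors' implementation. It assumes a penalty shape proportional to D/n (D bins, n observations), a "largest dimension jump" rule for locating the minimal penalty constant, and the usual reading of the slope heuristics that the optimal penalty is about twice the minimal one. The function names, the grid of constants, and the simulated data are all hypothetical.

```python
import numpy as np

def regressogram_risk(x, y, d):
    """Empirical least-squares risk of a regressogram with d equal-width bins on [0, 1]."""
    bins = np.minimum((x * d).astype(int), d - 1)    # bin index of each observation
    counts = np.bincount(bins, minlength=d)
    sums = np.bincount(bins, weights=y, minlength=d)
    means = sums / np.maximum(counts, 1)             # empty bins are never evaluated
    return np.mean((y - means[bins]) ** 2)

def slope_heuristic(x, y, d_max, k_grid):
    """Pick a number of bins with a dimension-jump version of the slope heuristics.

    Assumption: penalty shape pen(D) = D / n, and k_grid is increasing.
    """
    n = len(y)
    dims = np.arange(1, d_max + 1)
    risks = np.array([regressogram_risk(x, y, d) for d in dims])
    shape = dims / n
    # Dimension selected by the penalized criterion for each constant K.
    crit = risks[None, :] + k_grid[:, None] * shape[None, :]
    selected = dims[np.argmin(crit, axis=1)]
    # Estimate the minimal constant as the K right after the largest drop in dimension.
    jumps = selected[:-1] - selected[1:]
    k_min = k_grid[np.argmax(jumps) + 1]
    # Final choice: penalty equal to twice the estimated minimal penalty.
    d_hat = dims[np.argmin(risks + 2.0 * k_min * shape)]
    return d_hat, k_min, selected

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + 0.5 * rng.standard_normal(n)
d_hat, k_min, selected = slope_heuristic(x, y, d_max=100, k_grid=np.linspace(0.0, 2.0, 401))
print(d_hat, k_min)
```

The sketch uses homoscedastic noise only to keep the D/n shape plausible; for heteroscedastic data, which the paper also covers, a different penalty shape would likely be needed, and the sketch makes no attempt at that.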
---
PDF link:
https://arxiv.org/pdf/0802.0837