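Abstract (translated):
This paper studies grouped variable selection in high-dimensional regression using $\ell_1-\ell_q$ regularization ($1\leq q\leq\infty$), which can be viewed as a natural generalization of $\ell_1-\ell_2$ regularization (the group Lasso). The key condition is that the dimensionality $p_n$ can grow much faster than the sample size $n$, i.e. $p_n\gg n$ (here $p_n$ is the number of groups), while the number of relevant groups is small. The main conclusion is that many good properties of $\ell_1$ regularization (the Lasso) carry over naturally to the $\ell_1-\ell_q$ case ($1\leq q\leq\infty$), even when the number of variables within each group also grows with the sample size. Under fixed design, the whole family of estimators is shown to be both estimation consistent and variable selection consistent, under different conditions. A persistency result for random design is given under a much weaker condition. These results provide a unified treatment of the whole family of estimators, from $q=1$ (Lasso) to $q=\infty$ (iCAP), with $q=2$ (group Lasso) as a special case. When no group structure is available, all the analysis reduces to the existing results for the Lasso estimator ($q=1$).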
---
English title:
On the $\ell_1-\ell_q$ Regularized Regression
---
Authors:
Han Liu, Jian Zhang
---
Year of latest submission:
2008
---
Classification:
Primary category: Statistics
Subcategory: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary category: Mathematics
Subcategory: Statistics Theory
Category description: Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies
--
Primary category: Statistics
Subcategory: Statistics Theory
Category description: stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.
--
---
English abstract:
In this paper we consider the problem of grouped variable selection in high-dimensional regression using $\ell_1-\ell_q$ regularization ($1\leq q \leq \infty$), which can be viewed as a natural generalization of the $\ell_1-\ell_2$ regularization (the group Lasso). The key condition is that the dimensionality $p_n$ can increase much faster than the sample size $n$, i.e. $p_n \gg n$ (in our case $p_n$ is the number of groups), but the number of relevant groups is small. The main conclusion is that many good properties from $\ell_1-$regularization (Lasso) naturally carry on to the $\ell_1-\ell_q$ cases ($1 \leq q \leq \infty$), even if the number of variables within each group also increases with the sample size. With fixed design, we show that the whole family of estimators are both estimation consistent and variable selection consistent under different conditions. We also show the persistency result with random design under a much weaker condition. These results provide a unified treatment for the whole family of estimators ranging from $q=1$ (Lasso) to $q=\infty$ (iCAP), with $q=2$ (group Lasso) as a special case. When there is no group structure available, all the analysis reduces to the current results of the Lasso estimator ($q=1$).
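To make the penalty family concrete, here is a minimal sketch, assuming the usual definition of the $\ell_1-\ell_q$ penalty as the sum over groups of the $\ell_q$ norm of each group's coefficients. The function name `l1_lq_penalty`, the coefficient vector, and the group index lists are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch evaluates the penalty only, not the estimator itself.

```python
import numpy as np


def l1_lq_penalty(beta, groups, q):
    """Sum over groups of the l_q norm of that group's coefficients.

    beta   : 1-D array of regression coefficients (illustrative values)
    groups : list of index lists, one per group (assumed non-overlapping)
    q      : norm order with 1 <= q <= inf (np.inf gives the max-abs norm)
    """
    beta = np.asarray(beta, dtype=float)
    return float(sum(np.linalg.norm(beta[idx], ord=q) for idx in groups))


# Hypothetical example: five coefficients split into three groups.
beta = np.array([3.0, -4.0, 0.0, 0.0, 1.0])
groups = [[0, 1], [2, 3], [4]]

lasso = l1_lq_penalty(beta, groups, 1)        # q = 1: plain l_1 norm of beta
group_lasso = l1_lq_penalty(beta, groups, 2)  # q = 2: group Lasso penalty
icap = l1_lq_penalty(beta, groups, np.inf)    # q = inf: iCAP-style penalty

# With no group structure (every variable is its own group), each group
# norm is just |beta_j|, so every q gives the same l_1 penalty.
singletons = [[j] for j in range(len(beta))]
no_groups = l1_lq_penalty(beta, singletons, 2)

print(lasso, group_lasso, icap, no_groups)
```

The last case mirrors the closing remark of the abstract: with singleton groups the penalty no longer depends on $q$ and coincides with the Lasso penalty.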
---
PDF link:
https://arxiv.org/pdf/802.1517