Title:
Separating populations with wide data: A spectral analysis
---
Authors:
Avrim Blum, Amin Coja-Oghlan, Alan Frieze, Shuheng Zhou
---
Latest submission year:
2009
---
Classification:
Top-level category: Statistics
Subcategory: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Top-level category: Statistics
Subcategory: Applications
Category description: Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical, Physical Sciences, Quality Control, Social Sciences
--
---
Abstract:
In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case where individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size (the product of the number of data points $n$ and the number of features $K$) needed to correctly perform this partitioning, as a function of $1/\gamma$, for $K>n$. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
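The abstract does not spell out the algorithm. As a rough illustration only, the Python sketch below shows a generic spectral heuristic for the two-population case ($k = 2$). It centers the $n \times K$ binary marker matrix, forms the $n \times n$ Gram matrix, and splits individuals by the sign of its leading eigenvector, which it finds by power iteration. Everything here is an assumption made for illustration and is not the authors' method, nor does it reproduce their guarantees. That includes the toy data generator, its parameters (`n`, `K`, `gamma`, the base-frequency range) and the function names `sample_two_populations`, `spectral_partition` and `agreement`. In particular, `gamma` is just the per-marker frequency gap of the toy generator, which need not match the paper's definition of average feature quality.

```python
import math
import random


def sample_two_populations(n, K, gamma, seed=0):
    """Hypothetical toy generator: n individuals, K binary markers.

    Each marker j has a base frequency; population 0 uses base - s_j*gamma/2 and
    population 1 uses base + s_j*gamma/2, where s_j is a random sign. Individuals
    alternate between the two populations, so the true labels are i % 2.
    """
    rng = random.Random(seed)
    base = [rng.uniform(0.3, 0.7) for _ in range(K)]
    signs = [rng.choice((-1, 1)) for _ in range(K)]
    freqs = [
        [b - s * gamma / 2 for b, s in zip(base, signs)],
        [b + s * gamma / 2 for b, s in zip(base, signs)],
    ]
    labels = [i % 2 for i in range(n)]
    data = [[1.0 if rng.random() < freqs[lab][j] else 0.0 for j in range(K)]
            for lab in labels]
    return data, labels


def spectral_partition(data, iters=100, seed=1):
    """Split rows into two groups by the sign of the leading eigenvector of the
    centered n x n Gram matrix (a generic spectral heuristic, not the paper's
    exact algorithm)."""
    n, K = len(data), len(data[0])
    # Center each feature (column) so the shared base frequencies cancel out.
    means = [sum(row[j] for row in data) / n for j in range(K)]
    X = [[row[j] - means[j] for j in range(K)] for row in data]
    # Gram matrix G = X X^T: pairwise similarity between individuals.
    G = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            g = sum(x * y for x, y in zip(X[a], X[b]))
            G[a][b] = G[b][a] = g
    # Power iteration for the leading eigenvector of the PSD matrix G,
    # started from a random vector so it is not orthogonal to that eigenvector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(G[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of each coordinate assigns the individual to one of two groups.
    return [1 if x > 0 else 0 for x in v]


def agreement(pred, labels):
    """Fraction of correctly grouped points, allowing the two groups to swap."""
    hits = sum(p == t for p, t in zip(pred, labels))
    return max(hits, len(labels) - hits) / len(labels)


if __name__ == "__main__":
    data, labels = sample_two_populations(n=30, K=3000, gamma=0.4)
    pred = spectral_partition(data)
    print(agreement(pred, labels))
```

In this toy setting, shrinking `gamma` weakens the separating eigenvalue relative to the noise. Keeping the sign split correct then takes more total data $n \cdot K$, which loosely mirrors the trade-off the abstract describes. The sketch makes no attempt at the paper's bounds or at $k > 2$ populations.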
---
PDF link:
https://arxiv.org/pdf/706.3434