Separating Populations with Wide Data: A Spectral Analysis

2022-03-05

English title:
"Separating populations with wide data: A spectral analysis"
---
Authors:
Avrim Blum, Amin Coja-Oghlan, Alan Frieze, Shuheng Zhou
---
Latest submission year:
2009
---
Classification:

Top-level category: Statistics
Subcategory: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Top-level category: Statistics
Subcategory: Applications
Category description: Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical, Physical Sciences, Quality Control, Social Sciences
--

---
English abstract:
In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case where individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size (the product of the number of data points $n$ and the number of features $K$) needed to correctly perform this partitioning, as a function of $1/\gamma$ for $K>n$. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
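The abstract describes partitioning a small sample with a spectral technique when there are many weak features. As a minimal illustration, assuming only k = 2 populations and binary (0/1) marker features, the sketch below splits a synthetic sample by the sign pattern of the top left singular vector of the column-centered n × K data matrix. It is a generic spectral heuristic, not the authors' exact algorithm or analysis; the generator `sample_two_populations`, the helpers `spectral_partition` and `agreement`, and all parameter values (`delta`, the [0.2, 0.8] base frequencies, `iterations`) are hypothetical choices made for this example.

```python
import math
import random


def sample_two_populations(n_per_pop, num_features, delta, seed=0):
    """Hypothetical generator: n_per_pop points from each of two product
    distributions over {0,1}^num_features.

    Feature j has frequency p_j + s_j*delta in population 0 and
    p_j - s_j*delta in population 1 (s_j = +1 or -1), so each feature
    on its own carries only a weak signal.
    """
    rng = random.Random(seed)
    base = [rng.uniform(0.2, 0.8) for _ in range(num_features)]
    signs = [rng.choice((1.0, -1.0)) for _ in range(num_features)]
    rows, labels = [], []
    for pop, direction in ((0, 1.0), (1, -1.0)):
        freqs = [p + direction * s * delta for p, s in zip(base, signs)]
        for _ in range(n_per_pop):
            # Independent Bernoulli draw per feature (a product distribution).
            rows.append([1.0 if rng.random() < f else 0.0 for f in freqs])
            labels.append(pop)
    return rows, labels


def spectral_partition(rows, iterations=200, seed=1):
    """Split rows into two groups by the sign of the top eigenvector of
    the n x n Gram matrix of the column-centered data."""
    n, k = len(rows), len(rows[0])
    # Center every feature (column) at its sample mean.
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    # G = Xc Xc^T is n x n, cheaper than the K x K covariance when K > n.
    gram = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            dot = sum(x * y for x, y in zip(centered[a], centered[b]))
            gram[a][b] = gram[b][a] = dot
    # Power iteration from a random start; the all-ones vector would be a bad
    # start because column centering puts it in the null space of G.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The top eigenvector of G is the top left singular vector of Xc;
    # its sign pattern gives the two-way split.
    return [0 if x >= 0 else 1 for x in v]


def agreement(pred, truth):
    """Fraction of points labeled consistently, up to swapping the two labels."""
    same = sum(p == t for p, t in zip(pred, truth))
    return max(same, len(truth) - same) / len(truth)


if __name__ == "__main__":
    rows, labels = sample_two_populations(n_per_pop=15, num_features=2000, delta=0.1)
    pred = spectral_partition(rows)
    n, k = len(rows), len(rows[0])
    print(f"n={n}, K={k}, total data size n*K={n * k}")
    print(f"agreement with true populations: {agreement(pred, labels):.2f}")
```

Working with the n × n Gram matrix rather than the K × K feature covariance keeps the cost proportional to n²K, which suits the K > n regime the abstract targets. Shrinking `delta` or `num_features` weakens the total signal, and the split should eventually degrade toward a random labeling.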
---
PDF link:
https://arxiv.org/pdf/706.3434
