2022-03-05
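Abstract (translated):
In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case where individual features have low average quality $\gamma$, and we want to use as few of them as possible to partition the sample correctly. We analyze a spectral technique that approximately optimizes the total data size, that is, the product of the number of data points $n$ and the number of features $K$ needed to perform this partitioning correctly, as a function of $1/\gamma$ for $K>n$. Our goal is motivated by an application in clustering individuals by population of origin using markers, when the divergence between any two populations is small.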
---
English title:
Separating populations with wide data: A spectral analysis
---
Authors:
Avrim Blum, Amin Coja-Oghlan, Alan Frieze, Shuheng Zhou
---
Year of latest submission:
2009
---
Classification:

Primary category: Statistics
Secondary category: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Primary category: Statistics
Secondary category: Applications
Category description: Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical, Physical Sciences, Quality Control, Social Sciences
--

---
English abstract:
  In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case that individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size--the product of number of data points $n$ and the number of features $K$--needed to correctly perform this partitioning as a function of $1/\gamma$ for $K>n$. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
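The abstract does not spell out the spectral technique, so the following is only a toy sketch of the general idea and not the authors' algorithm. It assumes two populations, binary features, and a large per-feature gap (much larger than the small $\gamma$ regime the paper studies), and it splits the sample by the sign of the top eigenvector of the centered Gram matrix. The names `sample_mixture`, `spectral_split`, `accuracy`, and the parameter `shift` are hypothetical and chosen for illustration.

```python
import random


def sample_mixture(n, K, shift, rng):
    """Draw n binary points with K features from two product distributions.

    Assumption: population 0 has P(feature j = 1) = 0.5 + s_j * shift and
    population 1 has 0.5 - s_j * shift, with random signs s_j.
    """
    signs = [rng.choice((-1, 1)) for _ in range(K)]
    probs = [
        [0.5 + s * shift for s in signs],
        [0.5 - s * shift for s in signs],
    ]
    labels = [i % 2 for i in range(n)]  # balanced populations
    data = [
        [1.0 if rng.random() < probs[lab][j] else 0.0 for j in range(K)]
        for lab in labels
    ]
    return data, labels


def spectral_split(data, iters=100, seed=0):
    """Split points by the sign of the top eigenvector of the centered Gram matrix."""
    n, K = len(data), len(data[0])
    # Center each feature (column) at its sample mean.
    means = [sum(row[j] for row in data) / n for j in range(K)]
    centered = [[row[j] - means[j] for j in range(K)] for row in data]
    # n x n Gram matrix; cheap when K > n ("wide" data).
    gram = [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]
    # Power iteration for the leading eigenvector.
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        nxt = [sum(g * x for g, x in zip(row, vec)) for row in gram]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return [1 if x > 0 else 0 for x in vec]


def accuracy(pred, labels):
    """Fraction of correctly grouped points, up to swapping the two group names."""
    agree = sum(p == t for p, t in zip(pred, labels))
    return max(agree, len(labels) - agree) / len(labels)


if __name__ == "__main__":
    data, labels = sample_mixture(n=40, K=200, shift=0.2, rng=random.Random(1))
    print(accuracy(spectral_split(data), labels))
```

Working with the $n \times n$ Gram matrix rather than the $K \times K$ covariance keeps the cost low when $K>n$, which is the regime the abstract refers to.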
---
PDF link:
https://arxiv.org/pdf/706.3434