2022-03-08
Abstract (translated):
One of the objectives in designing feature-selection learning algorithms is to obtain classifiers that depend on only a small number of attributes and come with verifiable guarantees on future performance. Few approaches, if any, successfully address both goals at once. Performance guarantees are crucial for tasks such as microarray data analysis, where very small sample sizes limit empirical evaluation. To the best of our knowledge, no algorithm giving theoretical bounds on future performance has so far been proposed in the context of classifying gene expression data. In this work, we investigate learning a conjunction (or disjunction) of decision stumps in the Occam's Razor, Sample Compression, and PAC-Bayes learning settings in order to identify a small subset of attributes that can be used to perform reliable classification. We apply the proposed approaches to gene identification from DNA microarray data and compare our results with those of well-known successful methods. The results show that, unlike other approaches, our algorithm not only finds hypotheses that use far fewer genes while achieving competitive classification accuracy, but also comes with tight risk guarantees on future performance. The proposed approaches are general and extensible, both for designing new algorithms and for application to other domains.
---
English title:
Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data
---
Authors:
Mohak Shah, Mario Marchand and Jacques Corbeil
---
Year of latest submission:
2010
---
Classification:

Primary category: Computer Science
Secondary category: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on), including robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary category: Statistics
Secondary category: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding.
--

---
English abstract:
  One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. Performance guarantees become crucial for tasks such as microarray data analysis due to very small sample sizes resulting in limited empirical evaluation. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy but also has tight risk guarantees on future performance, unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.
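
To make the hypothesis class concrete: a decision stump thresholds a single attribute (here, the expression level of one gene), and a conjunction predicts positive only when every stump in it fires. Below is a minimal, illustrative Python sketch of a greedy conjunction-of-stumps learner; the class names, the greedy scoring rule, and the stopping condition are simplifications of my own and do not reproduce the authors' actual algorithm or its Occam's Razor / Sample Compression / PAC-Bayes risk bounds.

```python
import numpy as np

class Stump:
    """A decision stump on one attribute: fires when x[feature] is above (or below) a threshold."""
    def __init__(self, feature, threshold, direction):
        self.feature = feature        # index of the gene / attribute
        self.threshold = threshold    # expression cutoff
        self.direction = direction    # +1: fire if value > threshold, -1: fire if value <= threshold

    def fires(self, X):
        vals = X[:, self.feature]
        return (vals > self.threshold) if self.direction > 0 else (vals <= self.threshold)

def predict_conjunction(stumps, X):
    """A conjunction predicts positive only when every stump fires."""
    out = np.ones(len(X), dtype=bool)
    for s in stumps:
        out &= s.fires(X)
    return out.astype(int)

def greedy_conjunction(X, y, max_stumps=3):
    """Greedy sketch (hypothetical scoring rule): repeatedly add the stump that
    excludes the most negative examples while sacrificing the fewest positives."""
    stumps = []
    active = np.ones(len(X), dtype=bool)          # examples still classified positive
    for _ in range(max_stumps):
        best, best_score = None, -np.inf
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for d in (+1, -1):
                    s = Stump(f, t, d)
                    keep = s.fires(X) & active
                    # negatives eliminated minus positives sacrificed
                    score = (np.sum(active & (y == 0) & ~keep)
                             - np.sum(active & (y == 1) & ~keep))
                    if score > best_score:
                        best, best_score = s, score
        stumps.append(best)
        active &= best.fires(X)
        if not np.any(active & (y == 0)):          # all negatives excluded
            break
    return stumps
```

Because each selected stump references a single gene, the number of stumps directly bounds the number of genes the final classifier depends on, which is the feature-selection effect the abstract describes.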
---
PDF link:
https://arxiv.org/pdf/1005.0530