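Abstract (translated):
We propose and compare various sentence selection strategies for active learning on the task of detecting entity mentions. The best strategy uses the sum of the confidences of two statistical classifiers trained on different views of the data. Experimental results show that, compared with random selection, this strategy cuts the amount of labeled training data by more than 50% while reaching the same performance. The effect is even more pronounced when only named mentions are considered: the system reaches the same performance with only 42% of the training data that random selection requires.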
---
English title:
"Active Learning for Mention Detection: A Comparison of Sentence Selection Strategies"
---
Authors:
Nitin Madnani, Hongyan Jing, Nanda Kambhatla and Salim Roukos
---
Year of latest submission:
2009
---
Classification:
Top-level category: Computer Science
Subcategory: Computation and Language
Category description: Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
--
Top-level category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
---
English abstract:
We propose and compare various sentence selection strategies for active learning for the task of detecting mentions of entities. The best strategy employs the sum of confidences of two statistical classifiers trained on different views of the data. Our experimental results show that, compared to the random selection strategy, this strategy reduces the amount of required labeled training data by over 50% while achieving the same performance. The effect is even more significant when only named mentions are considered: the system achieves the same performance by using only 42% of the training data required by the random selection strategy.
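The selection step described in the abstract can be illustrated with a minimal sketch. The abstract states only that the best strategy uses the sum of the confidences of two classifiers trained on different views; everything else here is an assumption made for illustration: that each classifier yields one confidence score per unlabeled sentence, that the sentences with the lowest summed confidence are the ones sent for annotation, and the hypothetical names `select_sentences`, `conf_view_a`, and `conf_view_b`.

```python
from typing import List, Sequence


def select_sentences(
    conf_a: Sequence[float], conf_b: Sequence[float], k: int
) -> List[int]:
    """Return indices of the k sentences with the lowest summed confidence.

    conf_a and conf_b hold one confidence score per unlabeled sentence,
    produced by two classifiers trained on different views of the data.
    Picking the lowest sums is an assumption, not a detail from the abstract.
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("both classifiers must score the same sentences")
    # Combine the two views by summing their per-sentence confidences.
    combined = [a + b for a, b in zip(conf_a, conf_b)]
    # Rank sentence indices from least to most confident.
    ranked = sorted(range(len(combined)), key=lambda i: combined[i])
    return ranked[:k]


# Made-up confidence scores for four unlabeled sentences.
conf_view_a = [0.95, 0.40, 0.80, 0.55]
conf_view_b = [0.90, 0.35, 0.50, 0.85]
picked = select_sentences(conf_view_a, conf_view_b, k=2)
print(picked)  # indices of the sentences to label next
```

In an active learning loop, the selected sentences would be labeled, added to the training set, and both classifiers retrained before the next round of selection.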
---
PDF link:
https://arxiv.org/pdf/0911.1965