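Abstract (translation):
We consider the problem of jointly training structured models from sources whose instances partially overlap. This has important applications, such as user-driven ad-hoc information extraction on the web. Such applications pose new challenges in the number of sources and their arbitrary overlap patterns, which earlier collective training schemes applied to two sources did not face. We present an agreement-based learning framework, with alternatives within it that trade off tractability, robustness to noise, and extent of agreement. We provide a principled scheme for discovering low-noise agreement sets in unlabeled data across sources. Through extensive experiments on 58 real datasets, we establish that our method of additively rewarding agreement over maximal text segments provides the best trade-off and also outperforms alternatives such as collective inference, staged training, and multi-view learning.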
---
English title:
Joint Structured Models for Extraction from Overlapping Sources
---
Authors:
Rahul Gupta, Sunita Sarawagi
---
Year of latest submission:
2010
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
We consider the problem of jointly training structured models for extraction from sources whose instances enjoy partial overlap. This has important applications like user-driven ad-hoc information extraction on the web. Such applications present new challenges in terms of the number of sources and their arbitrary pattern of overlap not seen by earlier collective training schemes applied on two sources. We present an agreement-based learning framework and alternatives within it to trade off tractability, robustness to noise, and extent of agreement. We provide a principled scheme to discover low-noise agreement sets in unlabeled data across the sources. Through extensive experiments over 58 real datasets, we establish that our method of additively rewarding agreement over maximal segments of text provides the best trade-offs, and also scores over alternatives such as collective inference, staged training, and multi-view learning.
---
PDF link:
https://arxiv.org/pdf/1005.0104