A Probabilistic Approach for Learning Folksonomies from Structured Data

2022-03-09

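Abstract (translated):
Learning structured representations has become an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One way to learn complex structures is to integrate many smaller, incomplete, and noisy structure fragments. In this work, we propose an unsupervised probabilistic method that extends affinity propagation to combine small ontological fragments into a collection of integrated, consistent, and larger folksonomies. The task is challenging because the method must aggregate similar structures while avoiding structural inconsistencies and handling noise. We validate the method on a real-world social media dataset collected from the photo-sharing website Flickr, which consists of shallow personal hierarchies specified by many individual users. Our experimental results show that the proposed method constructs deeper and denser structures than an approach that uses only the standard affinity propagation algorithm. In addition, the method achieves better overall integration quality than a state-of-the-art approach based on incremental relational clustering.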
---
English title:
A Probabilistic Approach for Learning Folksonomies from Structured Data
---
Authors:
Anon Plangprasopchok, Kristina Lerman, Lise Getoor
---
Year of latest submission:
2010
---
Categories:

Top-level category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Top-level category: Computer Science
Subcategory: Computers and Society
Category description: Covers impact of computers on society, computer ethics, information technology and public policy, legal aspects of computing, computers and education. Roughly includes material in ACM Subject Classes K.0, K.2, K.3, K.4, K.5, and K.7.
--
Top-level category: Computer Science
Subcategory: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--

---
English abstract:
Learning structured representations has emerged as an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One approach to learning complex structures is to integrate many smaller, incomplete and noisy structure fragments. In this work, we present an unsupervised probabilistic approach that extends affinity propagation to combine the small ontological fragments into a collection of integrated, consistent, and larger folksonomies. This is a challenging task because the method must aggregate similar structures while avoiding structural inconsistencies and handling noise. We validate the approach on a real-world social media dataset, comprised of shallow personal hierarchies specified by many individual users, collected from the photosharing website Flickr. Our empirical results show that our proposed approach is able to construct deeper and denser structures, compared to an approach using only the standard affinity propagation algorithm. Additionally, the approach yields better overall integration quality than a state-of-the-art approach based on incremental relational clustering.
---
PDF link:
https://arxiv.org/pdf/1011.3557
