Title:
Efficiently Discovering Hammock Paths from Induced Similarity Networks
---
Authors:
M. Shahriar Hossain, Michael Narayan and Naren Ramakrishnan
---
Latest submission year:
2010
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary category: Computer Science
Secondary category: Databases
Category description: Covers database management, datamining, and data processing. Roughly includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1.
--
---
Abstract:
Similarity networks are important abstractions in many information management applications such as recommender systems, corpora analysis, and medical informatics. For instance, by inducing similarity networks between movies rated similarly by users, between documents containing common terms, or between clinical trials involving the same themes, we can aim to find the global structure of connectivities underlying the data, and use the network as a basis to make connections between seemingly disparate entities. In the above applications, composing similarities between objects of interest finds uses in serendipitous recommendation, in storytelling, and in clinical diagnosis, respectively. We present an algorithmic framework for traversing similarity paths using the notion of "hammock" paths, which are generalizations of traditional paths. Our framework is exploratory in nature: given starting and ending objects of interest, it explores candidate objects for path following and uses heuristics to admissibly estimate the potential for paths to lead to a desired destination. We present three diverse applications: exploring movie similarities in the Netflix dataset, exploring abstract similarities across the PubMed corpus, and exploring description similarities in a database of clinical trials. Experimental results demonstrate the potential of our approach for unstructured knowledge discovery in similarity networks.
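The abstract describes a search that expands candidate objects while an admissible heuristic estimates how close each one is to the destination. The sketch below illustrates that idea with plain A* search. It is not the authors' algorithm, and several choices are assumptions made only for illustration: objects are feature sets, similarity is Jaccard, two objects are linked when their similarity reaches a threshold `theta`, and the "hammock" requirement is approximated by a hypothetical support rule that a step (u, v) is allowed only when u and v share at least `k` common neighbours. The function names `jaccard_distance`, `build_network` and `hammock_path` are likewise hypothetical. Edge cost is the Jaccard distance. Because that distance is a metric, the direct distance to the goal never overestimates the remaining path cost, so the heuristic is admissible.

```python
import heapq
from itertools import count


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; a metric on finite sets (assumed similarity)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_network(objects, theta):
    """Induce a similarity network: link objects with Jaccard similarity >= theta."""
    names = list(objects)
    nbrs = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if 1.0 - jaccard_distance(objects[u], objects[v]) >= theta:
                nbrs[u].add(v)
                nbrs[v].add(u)
    return nbrs


def hammock_path(objects, start, goal, theta=0.2, k=1):
    """A* from start to goal; returns (path, cost) or (None, inf).

    Hypothetical hammock rule: edge (u, v) is usable only if u and v share
    at least k common neighbours in the induced network.
    """
    nbrs = build_network(objects, theta)

    def h(n):
        # Direct distance to the goal: admissible by the triangle inequality.
        return jaccard_distance(objects[n], objects[goal])

    tie = count()  # tie-breaker so heapq never compares paths
    frontier = [(h(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was found later
        for nxt in nbrs[node]:
            if len(nbrs[node] & nbrs[nxt]) < k:
                continue  # step lacks enough supporting neighbours
            g2 = g + jaccard_distance(objects[node], objects[nxt])
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), next(tie), g2, nxt, path + [nxt]))
    return None, float("inf")


if __name__ == "__main__":
    # Toy "documents" described by their terms (made-up data).
    docs = {
        "A": frozenset("abcd"),
        "B": frozenset("bcde"),
        "C": frozenset("cdef"),
        "D": frozenset("defg"),
        "E": frozenset("efgh"),
    }
    path, cost = hammock_path(docs, "A", "E", theta=0.3, k=1)
    print(path, round(cost, 3))  # ['A', 'C', 'E'] 1.333
    print(hammock_path(docs, "A", "E", theta=0.3, k=2))  # (None, inf)
```

In this toy network, A and E share no terms, yet a path through C exists. Raising `k` to 2 removes every step out of A, so no path satisfies the stricter support rule. This mirrors, in a simplified way, how a wider hammock trades reachability for better-supported connections.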
---
PDF link:
https://arxiv.org/pdf/1002.3195