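Abstract (translated):
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and is therefore uncomputable, but there are ways to make use of it. First, if the objects have a string representation, compression algorithms can be used to approximate the Kolmogorov complexity. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations and practical realizations of the normalized information distance. It presents many examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.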
---
English title:
Normalized Information Distance
---
Authors:
Paul M.B. Vitanyi (CWI and Univ. Amsterdam), Frank J. Balbach (Univ. Waterloo), Rudi L. Cilibrasi (CWI), and Ming Li (Univ. Waterloo)
---
Latest submission year:
2008
---
Classification:
Primary category: Computer Science
Secondary category: Information Retrieval
Category description: Covers indexing, dictionaries, retrieval, content and analysis. Roughly includes material in ACM Subject Classes H.3.0, H.3.1, H.3.2, H.3.3, and H.3.4.
--
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
---
English abstract:
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
---
PDF link:
https://arxiv.org/pdf/0809.2553