摘要翻译:
本文研究有限字母表上离散序列的变阶马尔可夫预测算法。这种算法的类别很大,原则上包括任何无损压缩算法。重点介绍了六种常用的预测算法,包括上下文树加权(CTW)、部分匹配预测(PPM)和概率后缀树(PSTs)。我们讨论了这些算法的性质,并用三个领域的真实生命序列:蛋白质、英语文本和音乐片段来比较它们的性能。用平均对数损失来衡量预测质量,并进行了比较。我们还比较了基于这些预测因子的分类算法在一些大型蛋白质分类任务中的应用。我们的结果表明,“分解”CTW(一种CTW算法的变体)和PPM在序列预测任务中优于所有其他算法。有些令人惊讶的是,一种不同的算法,它是对Lempel-Ziv压缩算法的改进,在蛋白质分类问题上明显优于所有算法。
---
英文标题:
《On Prediction Using Variable Order Markov Models》
---
作者:
R. Begleiter, R. El-Yaniv, G. Yona
---
最新提交年份:
2011
---
分类信息:
一级分类:Computer Science 计算机科学
二级分类:Artificial Intelligence
人工智能
分类描述:Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域,除了视觉、机器人、机器学习、多智能体系统以及计算和语言(自然语言处理),这些领域有独立的学科领域。特别地,包括专家系统,定理证明(尽管这可能与计算机科学中的逻辑重叠),知识表示,规划,和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
---
英文摘要:
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
---
PDF链接:
https://arxiv.org/pdf/1107.0051