具有形式学习时间保证的增量式基于模型的学习者

397

收藏 2022-03-17

摘要翻译：
基于模型的学习算法在求解具有有限状态空间和动作空间的马尔可夫决策过程时可以有效地利用经验。然而，由于反复求解一个内部模型，其计算代价很高，限制了其在大规模问题中的应用。我们提出了一种基于实时动态规划(RTDP)的方法来加快两种基于模型的算法RMAX和MBIE（model-based interval estimation）的速度，使算法在计算上比现有的边界损失小得多。具体来说，我们的两个新的学习算法，RTDP-RMAX和RTDP-IE的计算量比RMAX和MBIE小得多。我们发展了一个通用的理论框架，允许我们证明在PAC（可能大致正确）意义上，两者都是有效的学习者。我们也给出了这些新算法的实验评估，有助于量化计算和经验需求之间的折衷。
---
英文标题：
《Incremental Model-based Learners With Formal Learning-Time Guarantees》
---
作者：
Alexander L. Strehl, Lihong Li, Michael L. Littman
---
最新提交年份：
2012
---
分类信息：

一级分类：Computer Science 计算机科学
二级分类：Machine Learning 机器学习
分类描述：Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
关于机器学习研究的所有方面的论文（有监督的，无监督的，强化学习，强盗问题，等等），包括健壮性，解释性，公平性和方法论。对于机器学习方法的应用，CS.LG也是一个合适的主要类别。
--
一级分类：Computer Science 计算机科学
二级分类：Artificial Intelligence 人工智能
分类描述：Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域，除了视觉、机器人、机器学习、多智能体系统以及计算和语言（自然语言处理），这些领域有独立的学科领域。特别地，包括专家系统，定理证明（尽管这可能与计算机科学中的逻辑重叠），知识表示，规划，和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
一级分类：Statistics 统计学
二级分类：Machine Learning 机器学习
分类描述：Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
覆盖机器学习论文（监督，无监督，半监督学习，图形模型，强化学习，强盗，高维推理等）与统计或理论基础
--

---
英文摘要：
Model-based learning algorithms have been shown to use experience efficiently when learning to solve Markov Decision Processes (MDPs) with finite state and action spaces. However, their high computational cost due to repeatedly solving an internal model inhibits their use in large-scale problems. We propose a method based on real-time dynamic programming (RTDP) to speed up two model-based algorithms, RMAX and MBIE (model-based interval estimation), resulting in computationally much faster algorithms with little loss compared to existing bounds. Specifically, our two new learning algorithms, RTDP-RMAX and RTDP-IE, have considerably smaller computational demands than RMAX and MBIE. We develop a general theoretical framework that allows us to prove that both are efficient learners in a PAC (probably approximately correct) sense. We also present an experimental evaluation of these new algorithms that helps quantify the tradeoff between computational and experience demands.
---
PDF链接：
https://arxiv.org/pdf/1206.6870

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

扫码加我 拉你入群

分享

扫码加好友，拉您进群

扫码加我拉你入群