Anytime Point-Based Approximations for Large POMDPs


2022-03-15

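Abstract (translation):
Partially observable Markov decision processes (POMDPs) have long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving is to perform value backups at specific belief points rather than over the entire belief simplex. The efficiency of this approach, however, depends heavily on which points are selected. This paper presents a set of new techniques for selecting informative belief points that work well in practice. Combining the point selection procedure with point-based value backups yields an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The paper first introduces the algorithm and gives a theoretical analysis justifying the choice of belief selection technique. Its second aim is a thorough empirical comparison of PBVI with other state-of-the-art POMDP algorithms, in particular Perseus, to highlight their similarities and differences. The evaluation uses both standard POMDP domains and realistic robotic tasks.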
---
English title:
Anytime Point-Based Approximations for Large POMDPs
---
Authors:
J. Pineau, G. Gordon, S. Thrun
---
Year of latest submission:
2011
---
Classification:

Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
Original English abstract:
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
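The abstract's central idea, backing up the value function only at selected belief points instead of over the whole belief simplex, can be illustrated with a minimal sketch. This is a generic point-based backup over alpha-vectors written from the textbook formulation, not the authors' implementation. The function names (`point_based_backup`, `pbvi_sweep`) and the array layouts for `T`, `O` and `R` are assumptions made for illustration, and the belief point selection step that the paper contributes is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of a belief and an alpha-vector."""
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b, alphas, T, O, R, gamma) -> Vector:
    """Return the backed-up alpha-vector that is best at belief b.

    Assumed (hypothetical) layouts:
      T[a][s][s2] = P(s2 | s, a)   transition model
      O[a][s2][o] = P(o | s2, a)   observation model
      R[a][s]     = immediate reward for action a in state s
      alphas      = current value function as a list of alpha-vectors
    """
    n_states = len(b)
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        vec = list(R[a])
        for o in range(len(O[a][0])):
            # Project every alpha-vector back through (a, o).
            projected = [
                [
                    gamma * sum(
                        T[a][s][s2] * O[a][s2][o] * alpha[s2]
                        for s2 in range(n_states)
                    )
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this belief.
            g = max(projected, key=lambda c: dot(b, c))
            vec = [x + y for x, y in zip(vec, g)]
        val = dot(b, vec)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec


def pbvi_sweep(beliefs, alphas, T, O, R, gamma) -> List[Vector]:
    """One value-update sweep: one new alpha-vector per belief point."""
    return [point_based_backup(b, alphas, T, O, R, gamma) for b in beliefs]
```

Each sweep produces at most one vector per belief point, so the cost of an update grows with the size of the belief set rather than with the full simplex. This is why the choice of points matters so much for the quality of the resulting approximation.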
---
PDF link:
https://arxiv.org/pdf/1110.0027
