Restricted Value Iteration: Theory and Algorithms

2022-03-21

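Abstract (translated):
Value iteration is a popular algorithm for finding near-optimal policies for POMDPs. Because it must account for the entire belief space, it has to solve a large number of linear programs and is therefore inefficient. This paper studies value iteration restricted to belief subsets. We show that, with properly chosen belief subsets, restricted value iteration yields near-optimal policies, and we give a condition for determining whether a given belief subset saves space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.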
---
English title:
Restricted Value Iteration: Theory and Algorithms
---
Authors:
N. L. Zhang, W. Zhang
---
Latest submission year:
2011
---
Classification:

Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
Original English abstract:
Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-optimal policies and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.
---
PDF link:
https://arxiv.org/pdf/1107.0042
