Efficient Solution Algorithms for Factored MDPs

2022-03-15

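Abstract (translated):
This paper studies planning under uncertainty in large Markov decision processes (MDPs). Factored MDPs represent a complex state space with state variables and the transition model with a dynamic Bayesian network. This representation often makes a structured MDP exponentially smaller to describe, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. The paper presents two structure-based approximate solution algorithms for factored MDPs. Both represent the approximate value function as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. An important contribution is showing how, by exploiting additive and context-specific structure in a factored MDP, the basic operations of both algorithms can be carried out efficiently in closed form. A central element of the algorithms is a novel linear programming (LP) decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent polynomial-sized LP. One algorithm uses approximate linear programming; the second uses approximate dynamic programming. The dynamic programming algorithm is novel in that it uses an approximation based on the max-norm, a technique that more directly minimizes the terms appearing in the error bounds of approximate MDP algorithms. Experimental results on problems with more than 10^40 states demonstrate the scalability of the approach, and a comparison with an existing state-of-the-art method shows exponential gains in computation time on some problems.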
---
English title:
Efficient Solution Algorithms for Factored MDPs
---
Authors:
C. Guestrin, D. Koller, R. Parr, S. Venkataraman
---
Latest submission year:
2011
---
Classification:

Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
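
The abstract likens the LP decomposition to variable elimination in Bayesian networks. As a rough illustration of that underlying idea only, and not of the paper's LP construction, the sketch below maximizes a sum of functions that each depend on just a few binary variables, eliminating one variable at a time instead of enumerating every joint assignment. The names `eliminate_max` and `brute_force_max` and the toy chain of pairwise factors are assumptions made for this example.

```python
from itertools import product


def eliminate_max(factors, order):
    """Maximize a sum of local functions over binary variables.

    factors: list of (scope, table); scope is a tuple of variable names and
             table maps a tuple of 0/1 values (in scope order) to a float.
    order:   elimination order covering every variable in any scope.
    """
    factors = list(factors)
    for var in order:
        # Split factors into those that mention `var` and those that do not.
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        # The replacement factor depends on the other variables they mention.
        new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {var}))
        new_table = {}
        for values in product((0, 1), repeat=len(new_scope)):
            assignment = dict(zip(new_scope, values))
            best = float("-inf")
            for x in (0, 1):
                assignment[var] = x
                total = sum(table[tuple(assignment[v] for v in scope)]
                            for scope, table in touching)
                best = max(best, total)
            new_table[values] = best
        factors.append((new_scope, new_table))
    # Every remaining factor has an empty scope; their sum is the maximum.
    return sum(table[()] for _, table in factors)


def brute_force_max(factors, variables):
    """Reference answer: enumerate all 2**len(variables) joint assignments."""
    best = float("-inf")
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        total = sum(table[tuple(assignment[v] for v in scope)]
                    for scope, table in factors)
        best = max(best, total)
    return best


# Toy problem (hypothetical): six binary variables in a chain, one pairwise
# function per neighbouring pair.
variables = [f"x{i}" for i in range(6)]
factors = []
for i in range(5):
    scope = (variables[i], variables[i + 1])
    table = {(a, b): float((i + 1) * (a ^ b) - a)
             for a in (0, 1) for b in (0, 1)}
    factors.append((scope, table))

# Both routes must agree on the maximum value.
assert eliminate_max(factors, variables) == brute_force_max(factors, variables)
```

On this chain each elimination step touches at most two functions, so the work grows linearly with the number of variables, whereas the brute-force check grows exponentially. How much is saved in general depends on the elimination order and on how the function scopes overlap.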
---
PDF link:
https://arxiv.org/pdf/1106.1822
