Practical Linear Value-Approximation Techniques for First-Order MDPs

2022-03-12

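Abstract (translated):
Recent work on approximate linear programming (ALP) for first-order Markov decision processes (FOMDPs) represents the value function as a linear combination of first-order basis functions and uses linear programming to determine suitable weights. The advantage of this approach is that it does not require simplifying the first-order value function and allows FOMDPs to be solved independently of any particular domain instantiation. In this paper, we address several questions to broaden the applicability of that work: (1) Can the first-order ALP framework be extended to approximate policy iteration to address the performance shortcomings of existing methods? (2) Can basis functions be generated automatically, and their effect on value function quality evaluated? (3) How can intractable problems with universally quantified rewards be decomposed into tractable subproblems? We propose answers to these questions along with several new optimizations, and provide a comparative empirical evaluation on logistics problems from the ICAPS 2004 Probabilistic Planning Competition.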
---
English title:
Practical Linear Value-approximation Techniques for First-order MDPs
---
Authors:
Scott Sanner, Craig Boutilier
---
Year of latest submission:
2012
---
Classification:

Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to solve FOMDPs independent of a specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration to address performance deficiencies of previous approaches? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? We propose answers to these questions along with a number of novel optimizations and provide a comparative empirical evaluation on logistics problems from the ICAPS 2004 Probabilistic Planning Competition.
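
For orientation, the ground (propositional) form of approximate linear programming that this line of work lifts to the first-order setting can be sketched as below. This is the standard textbook formulation for a finite MDP, not the paper's first-order version; the symbols $w_i$, $b_i$, $\alpha$, $\gamma$, $R$, and $P$ are notation assumed here for illustration and do not appear in the abstract.

```latex
% Linear value approximation over k basis functions b_i with weights w_i
V_w(s) = \sum_{i=1}^{k} w_i \, b_i(s)

% ALP: choose weights that minimize the alpha-weighted value
% subject to the Bellman inequality for every state-action pair
\min_{w} \; \sum_{s} \alpha(s) \, V_w(s)
\quad \text{s.t.} \quad
V_w(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \, V_w(s')
\qquad \forall s, a
```

Here $\alpha(s) > 0$ are state-relevance weights and $\gamma \in [0,1)$ is the discount factor. The unknowns are only the $k$ weights, which is why the choice of basis functions (question 2 in the abstract) matters for the quality of the resulting value function.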
---
PDF link:
https://arxiv.org/pdf/1206.6879
