用最优选择模型进行组合计划

434

收藏 2022-03-15

摘要翻译：
本文介绍了一个期权模型组合的框架。期权模型是时间抽象，就像经典规划中的宏运算符一样，直接从开始状态跳到结束状态。以往的工作主要集中在通过期权模型内学习从原始行为构造期权模型；或者使用期权模型来构造价值函数，通过期权间规划。基于Bellman方程的一个主要推广，我们提出了一个统一的观点，即内部和之间的选择模型学习。我们的基本操作是将期权模型递归组合到其他期权模型中。这一关键思想支持在多个抽象层次上进行组合规划。我们使用一个动态规划算法来说明我们的框架，该算法同时为多个子目标构建最优期权模型，并在这些期权模型上搜索以提供向其他子目标的快速进展。
---
英文标题：
《Compositional Planning Using Optimal Option Models》
---
作者：
David Silver (University College London), Kamil Ciosek (University
College London)
---
最新提交年份：
2012
---
分类信息：

一级分类：Computer Science 计算机科学
二级分类：Artificial Intelligence 人工智能
分类描述：Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域，除了视觉、机器人、机器学习、多智能体系统以及计算和语言（自然语言处理），这些领域有独立的学科领域。特别地，包括专家系统，定理证明（尽管这可能与计算机科学中的逻辑重叠），知识表示，规划，和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
一级分类：Computer Science 计算机科学
二级分类：Machine Learning 机器学习
分类描述：Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
关于机器学习研究的所有方面的论文（有监督的，无监督的，强化学习，强盗问题，等等），包括健壮性，解释性，公平性和方法论。对于机器学习方法的应用，CS.LG也是一个合适的主要类别。
--

---
英文摘要：
In this paper we introduce a framework for option model composition. Option models are temporal abstractions that, like macro-operators in classical planning, jump directly from a start state to an end state. Prior work has focused on constructing option models from primitive actions, by intra-option model learning; or on using option models to construct a value function, by inter-option planning. We present a unified view of intra- and inter-option model learning, based on a major generalisation of the Bellman equation. Our fundamental operation is the recursive composition of option models into other option models. This key idea enables compositional planning over many levels of abstraction. We illustrate our framework using a dynamic programming algorithm that simultaneously constructs optimal option models for multiple subgoals, and also searches over those option models to provide rapid progress towards other subgoals.
---
PDF链接：
https://arxiv.org/pdf/1206.6473

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

扫码加我 拉你入群

分享

扫码加好友，拉您进群

扫码加我拉你入群