Posted: 2022-03-12
---
Title:
Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes
---
Authors:
A. Fern, R. Givan, S. Yoon
---
Latest submission year:
2011
---
Classification:

Primary: Computer Science
Secondary: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, it includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
Abstract:
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
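The abstract's two ideas, policy iteration with the learning step in policy space and random-walk bootstrapping for sparse-reward goal domains, can be illustrated on a toy problem. The sketch below is not the authors' system: it uses a hand-rolled 9-state chain MDP, a simple (state, goal) lookup table stands in for the paper's relational policy language and learner, and all names and parameters (`step`, `improve`, `bootstrap`, walk lengths, rollout widths) are hypothetical choices for this illustration.

```python
import random

N_STATES = 9                      # toy chain MDP with states 0..8
ACTIONS = (-1, +1)

def step(state, action, goal):
    """Deterministic chain dynamics; reward 1 only on reaching `goal`."""
    s = max(0, min(N_STATES - 1, state + action))
    return s, (1.0 if s == goal else 0.0)

def rollout_return(state, goal, policy, horizon=30, gamma=0.95):
    """Discounted return of one trajectory that follows `policy`."""
    total, disc = 0.0, 1.0
    for _ in range(horizon):
        state, r = step(state, policy(state, goal), goal)
        total += disc * r
        disc *= gamma
        if state == goal:
            break
    return total

def q_estimate(state, action, goal, policy, width=10, gamma=0.95):
    """Monte Carlo estimate of Q^pi(state, action) from `width` rollouts."""
    est = 0.0
    for _ in range(width):
        s, r = step(state, action, goal)
        est += r + gamma * rollout_return(s, goal, policy)
    return est / width

def improve(policy, goals):
    """One API iteration with the learning step in policy space: fit a new
    policy directly to the rollout-greedy actions of the current one. The
    lookup table stands in for the paper's relational policy learner."""
    table = {(s, g): max(ACTIONS, key=lambda a: q_estimate(s, a, g, policy))
             for g in goals for s in range(N_STATES)}
    def new_policy(s, g, t=table, fallback=policy):
        return t[(s, g)] if (s, g) in t else fallback(s, g)
    return new_policy

def bootstrap(start=0, max_walk=10, per_round=5):
    """Random-walk bootstrapping: each round's training goals are the
    endpoints of n-step random walks from `start`, with n growing so that
    even the initial uninformed policy sees reachable reward."""
    policy = lambda s, g: random.choice(ACTIONS)   # uninformed start
    for n in range(1, max_walk + 1):
        goals = set()
        for _ in range(per_round):
            s = start
            for _ in range(n):                     # draw one n-step walk
                s, _ = step(s, random.choice(ACTIONS), goal=-1)
            goals.add(s)
        goals.discard(start)                       # drop trivial problems
        if goals:
            policy = improve(policy, goals)
    return policy

policy = bootstrap()
# For goals seen during training, learned actions should point toward them:
print([policy(s, 3) for s in range(N_STATES)])     # mostly +1 below 3, -1 above
```

In the paper the learned policy is a decision list of relational action-selection rules generalizing across objects and goals; the table-plus-fallback chain above only mimics that interface so the two loops, rollout-based improvement and walk-length scheduling, stay visible.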
---
PDF link:
https://arxiv.org/pdf/1109.2156