Posted: 2022-03-25
Abstract (translation):
Sequential decision problems can often be approximately solved by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on the expected improvement in decision quality that any particular simulation would produce; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, and argue (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and show that they outperform bandit-based heuristics in one-shot decision problems and in Go.
---
English title:
《Selecting Computations: Theory and Applications》
---
Authors:
Nicholas Hay, Stuart Russell, David Tolpin and Solomon Eyal Shimony
---
Latest submission year:
2012
---
Classification:

Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.

---
English abstract:
  Sequential decision problems are often approximately solvable by simulating possible future action sequences. {\em Metalevel} decision procedures have been developed for selecting {\em which} action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian {\em selection problems}, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go.
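The key distinction the abstract draws is that in a selection problem, unlike a bandit problem, the samples themselves incur no reward or regret: only the quality of the final choice matters, so computation should be allocated to whichever option most improves that choice. The toy sketch below illustrates this framing (it is not the paper's algorithm): Gaussian priors over each option's value, conjugate updates from noisy simulations, and a simple myopic heuristic — sampling the more uncertain of the current top two options — standing in for a full value-of-information computation. All names and parameters here are illustrative assumptions.

```python
import random

def simulate(true_means, noise_sd=1.0, prior_sd=1.0, budget=200, seed=0):
    """Toy Bayesian selection problem: spend `budget` simulations, then
    commit to the option with the highest posterior mean."""
    rng = random.Random(seed)
    k = len(true_means)
    mean = [0.0] * k                      # posterior means, prior N(0, prior_sd^2)
    prec = [1.0 / prior_sd**2] * k        # posterior precisions
    obs_prec = 1.0 / noise_sd**2
    for _ in range(budget):
        # Myopic heuristic (an assumption, not the paper's policy):
        # among the two options with the highest posterior means,
        # sample the one whose posterior is most uncertain.
        order = sorted(range(k), key=lambda i: -mean[i])
        i = max(order[:2], key=lambda j: 1.0 / prec[j])
        x = rng.gauss(true_means[i], noise_sd)  # one noisy simulation
        # Conjugate Gaussian update of option i's posterior.
        mean[i] = (mean[i] * prec[i] + x * obs_prec) / (prec[i] + obs_prec)
        prec[i] += obs_prec
    # Final decision: sampling cost no reward, only this choice counts.
    return max(range(k), key=lambda i: mean[i])

best = simulate([0.2, 0.5, 0.8])
```

A bandit policy such as UCB would instead balance exploration against the reward earned *during* sampling, which is the mismatch the paper argues against for metalevel control of Monte Carlo tree search.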
---
PDF link:
https://arxiv.org/pdf/1207.5879