2022-03-08
Abstract (translated):
In common-interest stochastic games, all players receive the same payoff. Players in such games must learn to coordinate with one another in order to obtain the highest possible value. Many reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that much better (i.e., polynomial) convergence rates can be attained with very simple model-based algorithms. Moreover, unlike many existing algorithms, our model-based algorithms are guaranteed to converge to the optimal value.
---
English title:
《Learning to Coordinate Efficiently: A Model-based Approach》
---
Authors:
R. I. Brafman, M. Tennenholtz
---
Most recent submission year:
2011
---
Classification:

Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
  In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.
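To make the abstract's idea concrete, here is a minimal toy sketch (not the paper's actual algorithm) of model-based coordination in a one-shot common-interest matrix game: each joint action is sampled once to build an exact payoff model, and both players then commit to the same optimal joint action via a shared deterministic tie-breaking rule. The payoff table, action names, and tie-breaking convention are all illustrative assumptions.

```python
import itertools

# Toy common-interest matrix game: both players receive the same payoff.
PAYOFF = {
    ("a", "x"): 1.0,
    ("a", "y"): 0.0,
    ("b", "x"): 0.0,
    ("b", "y"): 3.0,
}

ROW_ACTIONS = ["a", "b"]
COL_ACTIONS = ["x", "y"]

def learn_to_coordinate():
    """Model-based sketch: sample every joint action once to build an
    exact payoff model, then both players commit to the same optimal
    joint action. The shared deterministic tie-breaking rule (here:
    lexicographically smallest optimum) is what makes coordination
    possible without communication."""
    model = {}
    for joint in itertools.product(ROW_ACTIONS, COL_ACTIONS):
        model[joint] = PAYOFF[joint]  # one exploratory play per joint action
    best_value = max(model.values())
    # Both players apply the same tie-break, so they pick the same optimum.
    best_joint = min(j for j, v in model.items() if v == best_value)
    return best_joint, best_value

joint, value = learn_to_coordinate()
```

Because the model is exact after one pass over the joint action space, the number of exploratory steps is polynomial in the number of actions, which is the flavor of guarantee the abstract refers to (the paper handles the harder stochastic-game setting).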
---
PDF link:
https://arxiv.org/pdf/1106.5258
