2022-04-01
---
Title:
Automated Planning in Repeated Adversarial Games
---
Authors:
Enrique Munoz de Cote, Archie C. Chapman, Adam M. Sykulski, Nicholas R. Jennings
---
Year of latest submission:
2012
---
Classification:

Primary category: Computer Science
Secondary category: Computer Science and Game Theory
Description: Covers all theoretical and applied aspects at the intersection of computer science and game theory, including work in mechanism design, learning in games (which may overlap with Learning), foundations of agent modeling in games (which may overlap with Multiagent Systems), coordination, specification and formal methods for non-cooperative computational environments. The area also deals with applications of game theory to areas such as electronic commerce.
--
Primary category: Computer Science
Secondary category: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--

---
Abstract:
  Game theory's prescriptive power typically relies on full rationality and/or self-play interactions. In contrast, this work sets aside these fundamental premises and focuses instead on heterogeneous autonomous interactions between two or more agents. Specifically, we introduce a new and concise representation for repeated adversarial (constant-sum) games that highlights the necessary features enabling an automated planning agent to reason about how to score above the game's Nash equilibrium when facing heterogeneous adversaries. To this end, we present TeamUP, a model-based RL algorithm designed for learning and planning with such an abstraction. In essence, it is somewhat similar to R-max, with a cleverly engineered reward shaping that treats exploration as an adversarial optimization problem. In practice, it attempts to find an ally with which to tacitly collude (in more-than-two-player games) and then collaborates on a joint plan of actions that can consistently score a high utility in adversarial repeated games. We use the inaugural Lemonade Stand Game Tournament to demonstrate the effectiveness of our approach, and find that TeamUP is the best-performing agent, demoting the Tournament's actual winning strategy to second place. In our experimental analysis, we show that our strategy successfully and consistently builds collaborations with many different heterogeneous (and sometimes very sophisticated) adversaries.
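The abstract describes TeamUP as an R-max-like model-based RL learner whose reward shaping treats exploration as an adversarial optimization problem. As a rough illustration of the R-max-style "optimism in the face of uncertainty" that the abstract says TeamUP builds on, here is a minimal sketch for a repeated matrix game. The class name, parameters, and payoff bookkeeping are hypothetical; the paper's actual adversarial reward shaping and ally-finding are not reproduced here.

```python
from collections import defaultdict


class RMaxStyleAgent:
    """Illustrative R-max-style learner for a repeated matrix game.

    Any (own_action, opponent_action) pair observed fewer than `m` times
    is assumed to yield the optimistic payoff `r_max`, which systematically
    drives the agent to explore under-sampled joint actions.
    """

    def __init__(self, actions, r_max=1.0, m=3):
        self.actions = actions
        self.r_max = r_max
        self.m = m
        self.counts = defaultdict(int)       # visits per (a, opp_a)
        self.payoff_sum = defaultdict(float)  # cumulative payoff per (a, opp_a)

    def estimate(self, a, opp_a):
        """Optimistic payoff estimate for a joint action."""
        if self.counts[(a, opp_a)] < self.m:
            return self.r_max  # optimism: unknown pairs look maximally good
        return self.payoff_sum[(a, opp_a)] / self.counts[(a, opp_a)]

    def act(self, predicted_opp_action):
        """Greedy action against a predicted opponent action."""
        return max(self.actions,
                   key=lambda a: self.estimate(a, predicted_opp_action))

    def update(self, a, opp_a, payoff):
        """Record one round's observed joint action and payoff."""
        self.counts[(a, opp_a)] += 1
        self.payoff_sum[(a, opp_a)] += payoff
```

Once a joint action has been sampled `m` times, its estimate switches from the optimistic `r_max` to the empirical mean payoff, so exploration naturally gives way to exploitation.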
---
PDF link:
https://arxiv.org/pdf/1203.3498