On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

nandehutu2022

2022-03-13

English title:
On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality
---
Authors:
Ezra Tampubolon, Haris Ceribasic, Holger Boche
---
Year of latest submission:
2021
---
Categories:

Category: Computer Science
Subcategory: Machine Learning
Description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Category: Computer Science
Subcategory: Computer Science and Game Theory
Description: Covers all theoretical and applied aspects at the intersection of computer science and game theory, including work in mechanism design, learning in games (which may overlap with Learning), foundations of agent modeling in games (which may overlap with Multiagent systems), coordination, specification and formal methods for non-cooperative computational environments. The area also deals with applications of game theory to areas such as electronic commerce.
--
Category: Computer Science
Subcategory: Multiagent Systems
Description: Covers multiagent systems, distributed artificial intelligence, intelligent agents, coordinated interactions, and practical applications. Roughly covers ACM Subject Class I.2.11.
--
Category: Computer Science
Subcategory: Systems and Control
Description: cs.SY is an alias for eess.SY. This section includes theoretical and experimental research covering all facets of automatic control systems. The section is focused on methods of control system analysis and design using tools of modeling, simulation and optimization. Specific areas of research include nonlinear, distributed, adaptive, stochastic and robust control in addition to hybrid and discrete event systems. Application areas include automotive and aerospace control systems, network control, biological systems, multiagent and cooperative control, robotics, reinforcement learning, sensor networks, control of cyber-physical and energy-related systems, and control of computing systems.
--
Category: Economics
Subcategory: Theoretical Economics
Description: Includes theoretical contributions to Contract Theory, Decision Theory, Game Theory, General Equilibrium, Growth, Learning and Evolution, Macroeconomics, Market and Mechanism Design, and Social Choice.
--
Category: Electrical Engineering and Systems Science
Subcategory: Systems and Control
Description: This section includes theoretical and experimental research covering all facets of automatic control systems. The section is focused on methods of control system analysis and design using tools of modeling, simulation and optimization. Specific areas of research include nonlinear, distributed, adaptive, stochastic and robust control in addition to hybrid and discrete event systems. Application areas include automotive and aerospace control systems, network control, biological systems, multiagent and cooperative control, robotics, reinforcement learning, sensor networks, control of cyber-physical and energy-related systems, and control of computing systems.
--

---
Abstract:
In this work, we study the system of interacting non-cooperative two Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of general independent learners. The resulting post-learning policies are almost optimal in the underlying game sense, i.e., they form a Nash equilibrium. Furthermore, we propose in this work a Q-learning algorithm, requiring predictive observation of two subsequent opponent's actions, yielding an optimal strategy given that the latter applies a stationary strategy, and discuss the existence of the Nash equilibrium in the underlying information asymmetrical game.
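The setup in the abstract (two Q-learners, one of which sees the other's action before choosing) can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a stateless repeated 2x2 game, and the payoff tables (matching pennies), the function name `train`, and all hyperparameters are hypothetical choices made only for illustration. The uninformed agent keeps one Q-value per own action, while the informed agent indexes its Q-table by the observed opponent action.

```python
import random

# Hypothetical payoff tables (matching pennies), indexed [leader_action][follower_action].
LEADER_PAYOFF = [[1, -1], [-1, 1]]
FOLLOWER_PAYOFF = [[-1, 1], [1, -1]]


def train(steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Run two epsilon-greedy Q-learners; only the follower observes the other's action."""
    rng = random.Random(seed)
    q_leader = [0.0, 0.0]                  # uninformed agent: one value per own action
    q_follower = [[0.0, 0.0], [0.0, 0.0]]  # informed agent: rows indexed by observed leader action
    for _ in range(steps):
        # The leader acts without observing anything about the follower.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: q_leader[i])
        # The follower sees a, then acts on the corresponding row of its Q-table.
        if rng.random() < epsilon:
            b = rng.randrange(2)
        else:
            b = max(range(2), key=lambda j: q_follower[a][j])
        # Stateless (bandit-style) Q-updates: no next-state term in this toy game.
        q_leader[a] += alpha * (LEADER_PAYOFF[a][b] - q_leader[a])
        q_follower[a][b] += alpha * (FOLLOWER_PAYOFF[a][b] - q_follower[a][b])
    return q_leader, q_follower


if __name__ == "__main__":
    q_leader, q_follower = train()
    print("leader Q-values:", q_leader)
    print("follower Q-values:", q_follower)
```

Calling `train()` returns both Q-tables. In this toy game the informed agent's greedy response to each observed action is the one that wins against it, which is the kind of advantage the information asymmetry provides; the paper's actual convergence and optimality results concern a more general setting than this sketch.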
---
PDF link:
https://arxiv.org/pdf/2010.10901
