2022-03-20
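Abstract translation:
The stochastic multi-armed bandit model captures the tradeoff between exploration and exploitation. We study the effects of competition and cooperation on this tradeoff. Suppose there are $k$ arms and two players, Alice and Bob. In every round, each player pulls an arm, receives the resulting reward, and observes the other player's choice but not their reward. Alice's utility is $\Gamma_A + \lambda \Gamma_B$ (and similarly for Bob), where $\Gamma_A$ is Alice's total reward and $\lambda \in [-1, 1]$ is a cooperation parameter. At $\lambda = -1$ the players compete in a zero-sum game, at $\lambda = 1$ they fully cooperate, and at $\lambda = 0$ they are neutral: each player's utility is their own reward. The model is related to the economics literature on strategic experimentation, where players usually observe each other's rewards. With discount factor $\beta$, the Gittins index reduces the one-player problem to a comparison between a risky arm with prior $\mu$ and a predictable arm with success probability $p$. The value of $p$ at which the player is indifferent between the two arms is the Gittins index $g = g(\mu, \beta) > m$, where $m$ is the mean of the risky arm. We show that competing players explore less than a single player: there is $p^* \in (m, g)$ such that for all $p > p^*$ the players stay at the predictable arm. However, the players are not myopic: they still explore for some $p > m$. Cooperating players, on the other hand, explore more than a single player. We also show that neutral players learn from each other, receiving strictly higher total rewards than they would playing alone, for all $p \in (p^*, g)$, where $p^*$ is the threshold from the competing case. Finally, we show that competing and neutral players eventually settle on the same arm in every Nash equilibrium, while this can fail for cooperating players.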
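The utility definition can be illustrated with a minimal sketch. The function name `utilities` and the sample reward totals are hypothetical; only the formula $\Gamma_A + \lambda \Gamma_B$, its symmetric counterpart for Bob, and the range of $\lambda$ come from the abstract.

```python
from typing import Tuple


def utilities(gamma_a: float, gamma_b: float, lam: float) -> Tuple[float, float]:
    """Return (Alice's utility, Bob's utility) for cooperation parameter lam.

    gamma_a and gamma_b are the players' total rewards (hypothetical inputs).
    """
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1]")
    # Each player adds lam times the other player's total reward to their own.
    return gamma_a + lam * gamma_b, gamma_b + lam * gamma_a


# Competing (zero-sum), neutral, and fully cooperating cases.
for lam in (-1.0, 0.0, 1.0):
    print(lam, utilities(3.0, 2.0, lam))
```

At $\lambda = -1$ the two utilities sum to zero, at $\lambda = 0$ each player sees only their own reward, and at $\lambda = 1$ both players receive the joint total.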
---
English title:
Multiplayer Bandit Learning, from Competition to Cooperation
---
Authors:
Simina Brânzei and Yuval Peres
---
Year of latest submission:
2019
---
Classification:

Top-level category: Computer Science
Subcategory: Computer Science and Game Theory
Category description: Covers all theoretical and applied aspects at the intersection of computer science and game theory, including work in mechanism design, learning in games (which may overlap with Learning), foundations of agent modeling in games (which may overlap with Multiagent systems), coordination, specification and formal methods for non-cooperative computational environments. The area also deals with applications of game theory to areas such as electronic commerce.
--
Top-level category: Computer Science
Subcategory: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Top-level category: Economics
Subcategory: Theoretical Economics
Category description: Includes theoretical contributions to Contract Theory, Decision Theory, Game Theory, General Equilibrium, Growth, Learning and Evolution, Macroeconomics, Market and Mechanism Design, and Social Choice.
--

---
English abstract:
The stochastic multi-armed bandit model captures the tradeoff between exploration and exploitation. We study the effects of competition and cooperation on this tradeoff. Suppose there are $k$ arms and two players, Alice and Bob. In every round, each player pulls an arm, receives the resulting reward, and observes the choice of the other player but not their reward. Alice's utility is $\Gamma_A + \lambda \Gamma_B$ (and similarly for Bob), where $\Gamma_A$ is Alice's total reward and $\lambda \in [-1, 1]$ is a cooperation parameter. At $\lambda = -1$ the players are competing in a zero-sum game, at $\lambda = 1$, they are fully cooperating, and at $\lambda = 0$, they are neutral: each player's utility is their own reward. The model is related to the economics literature on strategic experimentation, where usually players observe each other's rewards.

With discount factor $\beta$, the Gittins index reduces the one-player problem to the comparison between a risky arm, with a prior $\mu$, and a predictable arm, with success probability $p$. The value of $p$ where the player is indifferent between the arms is the Gittins index $g = g(\mu,\beta) > m$, where $m$ is the mean of the risky arm.

We show that competing players explore less than a single player: there is $p^* \in (m, g)$ so that for all $p > p^*$, the players stay at the predictable arm. However, the players are not myopic: they still explore for some $p > m$. On the other hand, cooperating players explore more than a single player. We also show that neutral players learn from each other, receiving strictly higher total rewards than they would playing alone, for all $p \in (p^*, g)$, where $p^*$ is the threshold from the competing case.

Finally, we show that competing and neutral players eventually settle on the same arm in every Nash equilibrium, while this can fail for cooperating players.
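The single-player indifference point described in the abstract can be approximated numerically. The sketch below is not taken from the paper: it assumes a Bernoulli risky arm with a Beta(a, b) prior (so $m = a/(a+b)$), truncates the dynamic program at a finite horizon, and uses hypothetical names (`explore_minus_retire`, `gittins_index`). It finds the $p$ at which pulling the risky arm once and then acting optimally is worth exactly as much as staying on the predictable arm forever.

```python
def explore_minus_retire(a, b, p, beta, horizon=200):
    """Value of pulling the risky arm first minus value of retiring at once.

    Assumes a Bernoulli risky arm with Beta(a, b) prior (an assumption made
    for this sketch). The lookahead is cut off after `horizon` pulls, which
    is an approximation with error on the order of beta**horizon.
    """
    retire = p / (1.0 - beta)  # discounted value of the predictable arm forever
    n = horizon
    # At the cutoff, commit forever to whichever arm looks better.
    values = [max(retire, (a + s) / (a + b + n) / (1.0 - beta))
              for s in range(n + 1)]
    for n in range(horizon - 1, -1, -1):
        new_values = []
        for s in range(n + 1):  # s successes observed in n risky pulls
            mean = (a + s) / (a + b + n)  # posterior mean of the risky arm
            explore = (mean * (1.0 + beta * values[s + 1])
                       + (1.0 - mean) * beta * values[s])
            # At the root (n == 0) the first risky pull is forced.
            new_values.append(explore if n == 0 else max(retire, explore))
        values = new_values
    return values[0] - retire


def gittins_index(a, b, beta, tol=1e-6):
    """Bisect on p for the indifference point between the two arms."""
    lo, hi = a / (a + b), 1.0  # the index lies between the prior mean and 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if explore_minus_retire(a, b, mid, beta) > 0:
            lo = mid  # exploring still pays off, so the index is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


print(gittins_index(1, 1, 0.9))  # uniform prior, discount factor 0.9
```

Bisection works here because the advantage of exploring decreases strictly in $p$: raising $p$ increases the retirement value by $1/(1-\beta)$ per unit, while the exploring value, which retires no earlier than the second round, gains at most $\beta/(1-\beta)$.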
---
PDF link:
https://arxiv.org/pdf/1908.01135