Title:
Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
---
Authors:
Francis Maes, Damien Ernst and Louis Wehenkel
---
Latest submission year:
2012
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary category: Computer Science
Secondary category: Machine Learning
Description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary category: Statistics
Secondary category: Machine Learning
Description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high-dimensional inference, etc.) with a statistical or theoretical grounding.
--
---
Abstract:
The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy. To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii) solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution. We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed Bernoulli bandit problems and various playing budgets, show that the meta-learnt E/E strategies outperform generic strategies from the literature (UCB1, UCB1-Tuned, UCB-V, KL-UCB and epsilon-greedy); they also evaluate the robustness of the learnt E/E strategies by tests carried out on arms whose rewards follow a truncated Gaussian distribution.
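A minimal Python sketch of the three-step recipe described in the abstract, not the authors' implementation: it assumes a uniform prior over two-armed Bernoulli problems, a one-parameter UCB-style hypothesis space (index = empirical mean + C * sqrt(log t / n_k)), and a plain grid search standing in for the paper's optimization algorithms. The constant C, the budget, and all function names are illustrative assumptions.

import math
import random

def play_index_strategy(C, means, budget, rng):
    # Play one two-armed Bernoulli problem with the index strategy
    # index_k = mean_k + C * sqrt(log t / n_k); return the total reward.
    n = [0, 0]      # pull counts per arm
    s = [0.0, 0.0]  # reward sums per arm
    total = 0.0
    for t in range(budget):
        if t < 2:
            k = t   # initialization: pull each arm once
        else:
            k = max((0, 1), key=lambda a: s[a] / n[a]
                    + C * math.sqrt(math.log(t) / n[a]))
        r = 1.0 if rng.random() < means[k] else 0.0  # Bernoulli reward
        n[k] += 1
        s[k] += r
        total += r
    return total

def meta_learn(budget=100, n_problems=500, seed=0):
    rng = random.Random(seed)
    # Step (i): prior knowledge as a distribution over target problems
    # (assumption here: both arm means drawn i.i.d. uniformly on [0, 1]).
    problems = [(rng.random(), rng.random()) for _ in range(n_problems)]
    # Step (ii): hypothesis space = index strategies parameterized by C.
    candidates = [0.05 * i for i in range(1, 41)]
    # Step (iii): maximize average performance over the problem sample;
    # grid search stands in for the paper's optimization algorithms.
    def avg_reward(C):
        eval_rng = random.Random(seed + 1)  # same reward stream per candidate
        return sum(play_index_strategy(C, p, budget, eval_rng)
                   for p in problems) / n_problems
    return max(candidates, key=avg_reward)

if __name__ == "__main__":
    print("meta-learnt exploration constant C =", meta_learn())

Fixing the evaluation seed replays the same reward stream for every candidate, so differences in average reward reflect the strategies rather than sampling noise; in the paper, dedicated optimizers for the numeric and symbolic hypothesis spaces would replace the grid search.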
---
PDF link:
https://arxiv.org/pdf/1207.5208