Title:
Estimation Considerations in Contextual Bandits
---
Authors:
Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
---
Latest submission year:
2018
---
Classification:
Primary category: Statistics
Secondary category: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding.
--
Primary category: Computer Science
Secondary category: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on), including robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary category: Economics
Secondary category: Econometrics
Category description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--
---
Abstract:
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration vs. exploitation framework that does not arise in multi-armed bandits but is crucial in contextual bandits: the way exploration and exploitation are conducted in the present affects the bias and variance of the potential outcome model estimation in subsequent stages of learning. We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature into their estimation, making them less prone to problems of estimation bias. We provide the first regret bound analyses for contextual bandits with balancing in the domain of linear contextual bandits, matching the state-of-the-art regret bounds. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model mis-specification and prejudice in the initial training data. Additionally, we develop contextual bandits with simpler assignment policies by leveraging sparse model estimation methods from the econometrics literature, and demonstrate empirically that in the early stages of learning they can improve the rate of learning and decrease regret.
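To make the balancing idea concrete, here is a minimal sketch, assuming inverse propensity weighting as the balancing method: a propensity-weighted ridge regression for one arm's outcome model, paired with a Thompson-sampling-style arm choice. The function names, the clipping threshold clip, the ridge penalty lam, and the unit-scale Gaussian posterior are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def balanced_ridge(X, y, propensities, lam=1.0, clip=0.05):
    """Inverse-propensity-weighted ridge regression for one arm's
    outcome model. Weighting each observation by 1/propensity
    re-balances contexts that the assignment rule under-sampled for
    this arm; clipping the propensities bounds the weight variance.
    (Sketch only: lam and clip are assumed hyperparameters.)"""
    w = 1.0 / np.clip(propensities, clip, 1.0)             # balancing weights
    A = (X * w[:, None]).T @ X + lam * np.eye(X.shape[1])  # weighted Gram matrix
    b = X.T @ (w * y)                                      # weighted response
    theta = np.linalg.solve(A, b)                          # arm coefficients
    return theta, A

def choose_arm(context, arm_models, rng):
    """Thompson-style choice: draw coefficients from a Gaussian
    centered at each arm's weighted estimate (the unit posterior
    scale is an assumption) and pick the arm with the highest draw."""
    scores = []
    for theta, A in arm_models:
        draw = rng.multivariate_normal(theta, np.linalg.inv(A))
        scores.append(context @ draw)
    return int(np.argmax(scores))

# Illustrative usage on synthetic data (all numbers are assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
p = rng.uniform(0.05, 0.9, size=50)  # assignment propensities, taken as given
model = balanced_ridge(X, y, p)
print(choose_arm(rng.normal(size=5), [model, model], rng))
```

In the paper's setting the propensities come from the bandit's own randomized assignment rule rather than being supplied externally; the sketch simply takes them as given.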
---
PDF link:
https://arxiv.org/pdf/1711.07077