2022-03-06
---
English title:
Offline Multi-Action Policy Learning: Generalization and Optimization
---
Authors:
Zhengyuan Zhou, Susan Athey, Stefan Wager
---
Latest submission year:
2018
---
Classification:

Top-level category: Statistics
Subcategory: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
Top-level category: Computer Science
Subcategory: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Top-level category: Economics
Subcategory: Econometrics
Category description: Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
--

---
English abstract:
  In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as the problem of determining which medication to prescribe to a patient. While there is a growing body of literature devoted to this problem, most existing results are focused on the case where data comes from a randomized experiment, and further, there are only two possible actions, such as giving a drug to a patient or not. In this paper, we study the offline multi-action policy learning problem with observational data and where the policy may need to respect budget constraints or belong to a restricted policy class such as decision trees. We build on the theory of efficient semi-parametric inference in order to propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over the existing learning algorithms. We then consider additional computational challenges that arise in implementing our method for the case where the policy is restricted to take the form of a decision tree. We propose two different approaches, one using a mixed integer program formulation and the other using a tree-search based algorithm.
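The abstract does not spell out the estimator, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes doubly robust (AIPW) scores, a standard construction from semiparametric efficiency theory: for unit i and action a, Γ_i(a) = μ̂_a(X_i) + 1{W_i = a}(Y_i − μ̂_a(X_i)) / ê_a(X_i), where μ̂ (outcome model) and ê (propensity model) are simply passed in here; in practice they would be fitted, often on held-out folds. A policy π is then chosen to maximize the estimated value (1/n) Σ_i Γ_i(π(X_i)). The policy class is a hypothetical stand-in: depth-1 trees (one split, one action per leaf) found by brute-force search, not the mixed integer program or tree-search algorithm proposed in the paper. The function names `aipw_scores` and `best_depth1_tree` and the toy data are illustrative.

```python
# Sketch only: assumes doubly robust (AIPW) scores and a depth-1 tree policy
# class; this is not the implementation from the paper.
from typing import List, Optional, Sequence, Tuple


def aipw_scores(
    y: Sequence[float],
    w: Sequence[int],
    mu_hat: Sequence[Sequence[float]],
    e_hat: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return Gamma[i][a], the AIPW score of action a for unit i.

    mu_hat[i][a]: estimated mean outcome under action a given X_i (assumed given).
    e_hat[i][a]: estimated probability of action a given X_i (assumed given, > 0).
    """
    scores = []
    for i in range(len(y)):
        row = []
        for a in range(len(mu_hat[i])):
            # Outcome-model prediction plus an inverse-propensity-weighted
            # residual, which is nonzero only for the action actually taken.
            correction = (y[i] - mu_hat[i][a]) / e_hat[i][a] if w[i] == a else 0.0
            row.append(mu_hat[i][a] + correction)
        scores.append(row)
    return scores


def _best_leaf(idx: List[int], gamma: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Pick the action with the largest summed score over the units in a leaf."""
    n_actions = len(gamma[0])
    totals = [sum(gamma[i][a] for i in idx) for a in range(n_actions)]
    a_best = max(range(n_actions), key=lambda a: totals[a])
    return a_best, totals[a_best]


def best_depth1_tree(
    x: Sequence[Sequence[float]],
    gamma: Sequence[Sequence[float]],
) -> Tuple[int, float, int, int, float]:
    """Brute-force search over depth-1 trees (x[j] <= t goes left).

    Returns (feature, threshold, left_action, right_action, estimated_value),
    where estimated_value is the mean AIPW score of the chosen policy.
    """
    n = len(gamma)
    best: Optional[Tuple[int, float, int, int, float]] = None
    for j in range(len(x[0])):
        for t in sorted({row[j] for row in x}):
            left = [i for i in range(n) if x[i][j] <= t]
            right = [i for i in range(n) if x[i][j] > t]
            a_left, v_left = _best_leaf(left, gamma)
            a_right, v_right = _best_leaf(right, gamma)
            value = (v_left + v_right) / n
            if best is None or value > best[4]:
                best = (j, t, a_left, a_right, value)
    assert best is not None
    return best


# Toy usage with three actions and made-up nuisance estimates.
x = [[0.0], [1.0], [2.0], [3.0]]
w = [0, 1, 2, 0]
y = [1.0, 0.0, 1.0, 0.0]
mu_hat = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
e_hat = [[1 / 3] * 3 for _ in range(4)]
gamma = aipw_scores(y, w, mu_hat, e_hat)
print(best_depth1_tree(x, gamma))  # (0, 1.0, 0, 2, 1.0)
```

Brute force is feasible here only because a depth-1 tree on one feature has few candidate splits; the number of candidate trees grows rapidly with depth and with the number of actions, which is why deeper trees call for dedicated optimization methods.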
---
PDF link:
https://arxiv.org/pdf/1810.04778