Translated abstract:
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. Using tools from the theory of self-normalized processes, we give a simple, self-contained proof of a tail bound for vector-valued martingales. We use this bound to construct new, tighter confidence sets for the least squares estimate. We apply the confidence sets to several online decision problems, such as the multi-armed bandit and linearly parametrized bandit problems. The confidence sets are potentially applicable to other problems, such as sleeping bandits, generalized linear bandits, and other linear control problems. We improve the regret bound of the Upper Confidence Bound (UCB) algorithm of Auer et al. (2002) and show that, with high probability, its regret is a problem-dependent constant. In the case of linear bandits (Dani et al., 2008), we improve the problem-dependent bound in its dependence on the dimension and the number of time steps. Furthermore, in contrast to previous results, we prove that our bound holds for small sample sizes, while at the same time the worst-case bound is improved by a logarithmic factor and the constants are improved.
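For reference, the confidence set announced in the abstract is usually cited in the form sketched below. This is a sketch from memory of the commonly quoted statement, not a verbatim quotation of the paper's theorem; the noise terms are assumed conditionally R-sub-Gaussian, λ > 0 is a ridge parameter, and ‖θ*‖₂ ≤ S, with R, S, λ being symbols of this sketch rather than of the abstract. Exact constants and conditions are as stated in the paper itself.

```latex
% Regularized Gram matrix and ridge (regularized least squares) estimate after t rounds,
% with actions X_s and rewards Y_s (notation assumed for this sketch):
\[
\bar{V}_t = \lambda I + \sum_{s=1}^{t} X_s X_s^\top, \qquad
\hat{\theta}_t = \bar{V}_t^{-1} \sum_{s=1}^{t} X_s Y_s .
\]
% Self-normalized confidence ellipsoid: with probability at least 1-\delta,
% simultaneously for all t \ge 0,
\[
\bigl\| \hat{\theta}_t - \theta_* \bigr\|_{\bar{V}_t}
\;\le\;
R \sqrt{2 \log\!\left(
  \frac{\det(\bar{V}_t)^{1/2}\,\det(\lambda I)^{-1/2}}{\delta}
\right)}
\;+\; \lambda^{1/2} S .
\]
```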
---
English title:
《Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems》
---
Authors:
Yasin Abbasi-Yadkori, David Pal, Csaba Szepesvari
---
Latest submission year:
2011
---
Classification:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the theory of self-normalized processes to provide a simple and self-contained proof of a tail bound for a vector-valued martingale. We use the bound to construct new, tighter confidence sets for the least squares estimate. We apply the confidence sets to several online decision problems, such as the multi-armed and the linearly parametrized bandit problems. The confidence sets are potentially applicable to other problems such as sleeping bandits, generalized linear bandits, and other linear control problems. We improve the regret bound of the Upper Confidence Bound (UCB) algorithm of Auer et al. (2002) and show that its regret is, with high probability, a problem-dependent constant. In the case of linear bandits (Dani et al., 2008), we improve the problem-dependent bound in the dimension and number of time steps. Furthermore, as opposed to the previous result, we prove that our bound holds for small sample sizes, and at the same time the worst-case bound is improved by a logarithmic factor and the constant is improved.
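The application to linearly parametrized bandits lends itself to a short illustration. The following Python sketch is not the paper's reference implementation: the radius formula in `confidence_radius`, the parameter names `R`, `S`, `lam`, `delta`, and the use of Gaussian noise as a stand-in for conditionally sub-Gaussian noise are all assumptions of this sketch. It plays an optimism-in-the-face-of-uncertainty strategy over a fixed finite action set, recomputing the ridge estimate and the ellipsoid radius each round.

```python
# Minimal sketch of a confidence-ellipsoid linear bandit (assumptions noted above).
import numpy as np


def confidence_radius(V_bar, lam, R, S, delta, d):
    """R * sqrt(2 log(det(V_bar)^{1/2} det(lam I)^{-1/2} / delta)) + sqrt(lam) * S."""
    _, logdet_V = np.linalg.slogdet(V_bar)
    logdet_lam = d * np.log(lam)
    return R * np.sqrt(2 * (0.5 * logdet_V - 0.5 * logdet_lam + np.log(1.0 / delta))) + np.sqrt(lam) * S


def run_linear_bandit(actions, theta_star, T=2000, lam=1.0, R=0.1, S=1.0, delta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    V_bar = lam * np.eye(d)          # regularized Gram matrix
    b = np.zeros(d)                  # running sum of X_s * Y_s
    total_reward = 0.0
    for _ in range(T):
        theta_hat = np.linalg.solve(V_bar, b)            # ridge estimate
        beta = confidence_radius(V_bar, lam, R, S, delta, d)
        V_inv = np.linalg.inv(V_bar)
        # Optimistic index: <x, theta_hat> + beta * ||x||_{V_bar^{-1}} for each candidate action
        ucb = actions @ theta_hat + beta * np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        x = actions[int(np.argmax(ucb))]
        y = float(x @ theta_star + R * rng.standard_normal())  # Gaussian noise (sub-Gaussian)
        V_bar += np.outer(x, x)
        b += y * x
        total_reward += y
    return total_reward


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    acts = rng.standard_normal((20, 3))
    acts /= np.linalg.norm(acts, axis=1, keepdims=True)   # unit-norm actions
    theta = np.array([0.6, -0.3, 0.2])
    print("cumulative reward:", run_linear_bandit(acts, theta))
```

The design choice worth noting is that the ellipsoid radius grows only logarithmically with det(V̄_t), which is what drives the improved dependence on the dimension and the number of time steps mentioned in the abstract.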
---
PDF link:
https://arxiv.org/pdf/1102.2670