Thompson Sampling with Information Relaxation Penalties
Seungki Min (Columbia Business School)
Costis Maglaras (Columbia Business School)
Ciamac C. Moallemi (Columbia Business School)
Abstract
We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian
setting, for which we propose an information relaxation sampling framework.
With this framework, we defin ...