Adversarial Dueling Bandits
Aadirupa Saha¹, Tomer Koren², Yishay Mansour²
Abstract
We introduce the problem of regret minimization in Adversarial Dueling Bandits. As in classic Dueling Bandits, the learner has to repeatedly ...

... regret with respect to the best item in hindsight, according to a certain score function.
Numerous real-world applications are naturally modelled as dueling bandit problems, including movie recommendati ...