Online EXP3 Learning in Adversarial Bandits with Delayed Feedback
Ilai Bistritz¹, Zhengyuan Zhou²˒³, Xi Chen², Nicholas Bambos¹, Jose Blanchet¹
¹ Stanford University
² New York University, Stern School of Business
³ IBM Research
{bistritz,bambos,jose.blanchet}@stanford.edu, {zzhou,xchen3}@stern.nyu.edu
Abstract
Consider a player that in ...