Simultaneously Learning Stochastic and Adversarial
Episodic MDPs with Known Transition
Tiancheng Jin (University of Southern California, tiancheng.jin@usc.edu)
Haipeng Luo (University of Southern California, haipengl@usc.edu)
Abstract
This work studies the problem of learning episodic Markov Decision Processes
with known transition and bandit feedback. We develop the first algorithm with a
“best-o ...