Regret Minimization for Reinforcement Learning by
Evaluating the Optimal Bias Function
Zihan Zhang, Tsinghua University, zihan-zh17@mails.tsinghua.edu.cn
Xiangyang Ji, Tsinghua University, xyji@tsinghua.edu.cn
Abstract
We present an algorithm based on the Optimism in the Face of Uncertainty (OFU)
principle which is able to learn Reinforcement Learning (RL) modeled by Markov
deci ...