Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Daniel Russo
Columbia University
djr2174@gsb.columbia.edu
Abstract
This paper studies a recent proposal to use randomized value functions to drive
exploration in reinforcement learning. These randomized value functions are
generated by injecting random noise into the training data, making the approach
compatible wit ...
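To make "injecting random noise into the training data" concrete, here is a minimal sketch under stated assumptions. It considers a tabular setting where one state-action value is fit by regularized least squares to observed regression targets. It also assumes Gaussian perturbations of the targets and a Gaussian random prior draw. The function `randomized_value_estimate` and its parameters (`noise_std`, `prior_std`) are hypothetical illustrations, not the paper's algorithm or notation.

```python
import random


def randomized_value_estimate(targets, noise_std=1.0, prior_std=1.0, rng=None):
    """Return one randomized estimate of a single (tabular) state-action value.

    Hypothetical illustration, not the paper's algorithm: each training target
    is perturbed with Gaussian noise, a random prior draw acts as a
    regularization anchor, and the ridge-regression solution is returned.
    """
    rng = rng or random.Random()
    # Perturb each observed training target with independent Gaussian noise.
    perturbed = [y + rng.gauss(0.0, noise_std) for y in targets]
    # Draw a random prior value; the penalty term pulls the estimate toward it.
    prior_draw = rng.gauss(0.0, prior_std)
    lam = (noise_std / prior_std) ** 2  # regularization weight (noise-to-prior variance ratio)
    # Closed-form minimizer of sum_i (q - perturbed_i)^2 + lam * (q - prior_draw)^2.
    return (sum(perturbed) + lam * prior_draw) / (len(perturbed) + lam)
```

Repeated calls return different value estimates. With few observations the estimates spread widely, which drives exploration. With many observations they concentrate near the sample mean, because the injected noise averages out. Acting greedily with respect to one such draw per episode is the kind of randomized exploration the abstract refers to.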