Ensemble Bootstrapping for Q-Learning
Oren Peer¹, Chen Tessler¹, Nadav Merlis¹, Ron Meir¹
Abstract
Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal ...

... focuses on learning the value-function. The value represents the expected, discounted, reward-to-go that the agent will obtain. In particular, such methods learn the optimal pol-