Neural Temporal-Difference Learning Converges to Global Optima
Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
Abstract
Temporal-difference learning (TD), coupled with neural networks, is among the
most fundamental building blocks of deep reinforcement learning. However, due
to the nonlinearity in value function approximation, such a coupling leads to
nonconvexity and even divergence in optimization. As a result, th ...