Value Function in Frequency Domain
and the Characteristic Value Iteration Algorithm
Amir-massoud Farahmand
Vector Institute
University of Toronto
Toronto, Canada
farahmand@vectorinstitute.ai
Abstract
This paper considers the problem of estimating the distribution of returns in reinforcement
learning (i.e., the distributional RL problem). It presents a new re ...