Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
Sajad Khodadadian 1 Zaiwei Chen 2 Siva Theja Maguluri 1
Abstract
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-crit ...

An AC algorithm can be thought of as a generalized policy iteration (Puterman, 1995), and consists of two phases, namely actor and critic. The objective of the actor is to improve the