On Proximal Policy Optimization’s Heavy-tailed Gradients
Saurabh Garg 1 Joshua Zhanson 2 Emilio Parisotto 1 Adarsh Prasad 1 J. Zico Kolter 2 Zachary C. Lipton 1
Sivaraman Balakrishnan 3 Ruslan Salakhutdinov 1 Pradeep Ravikumar 1
Abstract

Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an ...

... Mnih et al., 2015), policy gradient methods (Williams, 1992; Sutton et al., 2000; Mnih et al., 2016) have risen as a popu- ...