Phasic Policy Gradient
Karl Cobbe 1 Jacob Hilton 1 Oleg Klimov 1 John Schulman 1
Abstract
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies
traditional on-policy actor-critic methods by sepa- ...

... can be used to better optimize the other.
However, there are also disadvantages to sharing network parameters. First, it is not clear how to
appropriately balance the competing objectives of t ...