Revisiting Peng’s Q(λ) for Modern Reinforcement Learning
Tadashi Kozuno 1 * Yunhao Tang 2 * Mark Rowland 3 Remi Munos 4 Steven Kapturowski 3 Will Dabney 3
Michal Valko 4 David Abel 3
Abstract
Off-policy multi-step reinforcement learning algorithms consist of conservative and non-

1996; Watkins, 1989; Peng & Williams, 1994; 1996; Precup et al., 2000; Harutyunyan et al., 2016; Munos et al., 2016; Rowland et al., 2020), pot ...