EMaQ: Expected-Max Q-Learning Operator
for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour * 1 2 Dale Schuurmans 3 Shixiang Shane Gu 3
Abstract
Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of decision-making policies by ...
1. Introduction
Leveraging past interactions in order to improve a decision-making process is the hallmark goal of off-policy reinforcement ...