Exponential Lower Bounds for Batch Reinforcement Learning:
Batch RL can be Exponentially Harder than Online RL
Andrea Zanette 1
Abstract

Several practical applications of reinforcement learning involve an agent learning from past data without th ...

... we consider two classical batch RL problems: 1) the off-policy evaluation (OPE) problem, where the batch algorithm needs to predict the performance of a target policy