Efficient Performance Bounds for Primal-Dual Reinforcement
Learning from Demonstrations
Angeliki Kamoutsi 1 Goran Banjac 1 John Lygeros 1
Abstract
We consider large-scale Markov decision processes with an unknown cost function and ad- ...

In the standard RL setting a cost signal is given to instruct agents how to complete the desired task. However, oftentimes encoding preferences using demonstrations provided
...