Adaptive Sampling for Best Policy Identification
in Markov Decision Processes
Aymen Al Marjani 1 Alexandre Proutiere 2
Abstract
We investigate the problem of best-policy identification
...
certainty. This paper, like most related work in this field, focuses on systems and control objectives that are modelled as standard discounted Markov Decision Processes (MDPs)
...