Preferential Temporal Difference Learning
Nishanth Anand 1 2 Doina Precup 1 2 3
Abstract
Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required

TD-learning can be viewed as a way to approximate dynamic programming algorithms in Markovian environments (Barnard, 1993). But, if the Markovian assumption does not hold (as is ...