Reinforcement Learning — TD(λ) Introduction(3)
Extending TD(λ) to the Q function with Sarsa(λ)
Sep 14, 2019
In the previous posts, we learnt the idea of TD(λ) with eligibility traces, which can be viewed as a combination of n-step TD methods, and applied it to the random walk example. In this post, let’s extend the idea of λ to a more general use case: instead of learning a state-value function, we will learn a Q function over state-action pairs. In this article, we will:
- Learn the idea of Sarsa(λ)
- Apply it to the mountain car example