7. Temporal Difference Learning
Overview
This chapter introduces Temporal Difference (TD) learning and shows how it builds on the ideas of Monte Carlo (MC) methods and dynamic programming. TD learning is a central topic in the field, and studying it gives us a deep understanding of how reinforcement learning works at the most fundamental level. It also offers a new perspective in which MC methods appear as a special case of TD methods, unifying the two approaches and extending their applicability to non-episodic problems. By the end of this chapter, you will be able to implement the TD(0), SARSA, Q-learning, and TD(λ) algorithms and use them to solve environments with both stochastic and deterministic transition dynamics.
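As a preview of the ideas developed in this chapter, the core TD(0) update can be sketched on a small example. The toy random-walk environment, the parameter values, and all names below are illustrative assumptions, not taken from the chapter itself:

```python
import random

# Illustrative TD(0) prediction sketch (an assumed toy example): estimate
# state values for a fixed random policy on a 5-state random walk, where
# stepping off the right end gives reward 1 and off the left end gives 0.
N_STATES = 5   # non-terminal states 0..4
ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 1.0    # discount factor

def run_episode(V):
    """Run one random-walk episode, applying the TD(0) update
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)) after each step."""
    s = N_STATES // 2                          # start in the middle state
    while True:
        s_next = s + random.choice([-1, 1])    # move left or right at random
        done = s_next < 0 or s_next >= N_STATES
        r = 1.0 if s_next >= N_STATES else 0.0  # reward only for exiting right
        v_next = 0.0 if done else V[s_next]     # terminal states have value 0
        V[s] += ALPHA * (r + GAMMA * v_next - V[s])  # the TD(0) update
        if done:
            return
        s = s_next

random.seed(0)
V = [0.0] * N_STATES
for _ in range(5000):
    run_episode(V)
print([round(v, 2) for v in V])  # estimates should increase from left to right
```

Note that, unlike an MC method, the update here is applied after every single step, using the current estimate `V[s_next]` as a bootstrap target rather than waiting for the episode's final return.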