N-Step TD and TD(λ) Algorithms
In the previous chapter, we looked at Monte Carlo methods, while in the earlier sections of this chapter, we learned about TD(0) methods, which, as we will soon discover, are also known as one-step temporal difference methods. In this section, we'll unify them: they sit at the extremes of a spectrum of algorithms, with TD(0) at one end and MC methods at the other, and often the best-performing methods lie somewhere in the middle of this spectrum.
N-step temporal difference algorithms extend one-step TD methods. More specifically, they generalize both Monte Carlo and TD approaches, making it possible to transition smoothly between the two. As we have already seen, MC methods must wait until the episode finishes before backing the return up into the preceding states. One-step TD methods, on the other hand, use the very next step to bootstrap and immediately start updating the value function of states or state-action pairs. These extremes...
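To make the idea concrete, here is a minimal sketch of n-step TD prediction on a toy random-walk environment. The environment, the hyperparameters, and the helper name `n_step_td` are illustrative assumptions, not from the text: the point is only to show how the update waits n steps, sums n discounted rewards, and then bootstraps from the value estimate n steps ahead (reducing to TD(0) for n=1 and to a Monte Carlo update when n reaches the episode length).

```python
import random

def n_step_td(episodes=200, n=3, alpha=0.1, gamma=1.0, num_states=5, seed=0):
    """n-step TD prediction on a toy random walk (states 0..num_states-1,
    terminal exits on both ends; reward +1 for the right exit, 0 otherwise).
    Environment and hyperparameters are illustrative, not prescribed."""
    rng = random.Random(seed)
    V = [0.0] * num_states            # value estimates for non-terminal states

    for _ in range(episodes):
        state = num_states // 2       # start in the middle
        states, rewards = [state], [0.0]  # S_0 plus a dummy R_0 for 1-based rewards
        T = float('inf')              # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                # take one random step left or right
                next_state = state + rng.choice([-1, 1])
                if next_state < 0:
                    reward, T = 0.0, t + 1        # fell off the left edge
                elif next_state >= num_states:
                    reward, T = 1.0, t + 1        # exited on the right
                else:
                    reward = 0.0
                states.append(next_state)
                rewards.append(reward)
                state = next_state
            tau = t - n + 1           # the time step whose estimate is updated now
            if tau >= 0:
                # n-step return: up to n discounted rewards ...
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                # ... plus a bootstrapped tail if the episode hasn't ended yet
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V
```

With n=1 the return G collapses to the familiar one-step TD target R + γV(S'), while a very large n makes G the full Monte Carlo return; intermediate n values interpolate between the two, which is exactly the spectrum described above.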