In this chapter, we will wrap up the part one of the book and get familiar with one of the RL methods—cross-entropy. Despite the fact that it is much less famous than other tools in the RL practitioner's toolbox, such as deep Q-network (DQN) or Advantage Actor-Critic, this method has its own strengths. The most important are as follows:
Simplicity: The cross-entropy method is really simple, which makes it an intuitive method to follow. For example, its implementation on PyTorch is less than 100 lines of code.
Good convergence: In simple environments that don't require complex, multistep policies to be learned and discovered and have short episodes with frequent rewards, cross-entropy usually works very well. Of course, lots of practical problems don't fall into this category, but sometimes they do. In such cases, cross-entropy (on its own or as a part of a larger system) can be the perfect fit.
In the following sections, we will start from the practical...