Actor-Critic Methods – A2C and A3C
So far, we have covered two types of methods for learning the optimal policy. One is the value-based method, and the other is the policy-based method. In the value-based method, we use the Q function to extract the optimal policy. In the policy-based method, we compute the optimal policy without using the Q function.
In this chapter, we will learn about another interesting method called the actor-critic method for finding the optimal policy. The actor-critic method makes use of both the value-based and policy-based methods. We will begin the chapter by understanding what the actor-critic method is and how it makes use of value-based and policy-based methods. We will acquire a basic understanding of actor-critic methods, and then we will learn about them in detail.
Moving on, we will also learn how actor-critic differs from the policy gradient with baseline method, and we will learn the algorithm of the actor-critic method in...