Chapter 11 – Actor-Critic Methods – A2C and A3C
- The actor-critic method is one of the most popular algorithms in deep RL, and several modern deep RL algorithms build on it. It lies at the intersection of value-based and policy-based methods, combining the advantages of both.
- In the actor-critic method, the actor network computes the policy and the critic network evaluates the policy produced by the actor by estimating the value function (a sketch of the two networks appears after this list).
- In the policy gradient method with baseline, we first generate complete episodes (trajectories) and then update the network parameters, whereas in the actor-critic method we update the network parameters at every step of the episode (see the single-step update sketch after this list).
- In the actor network, we compute the gradient as $\nabla_\theta J(\theta) \approx \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\delta_t$, where $\delta_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)$ is the TD error computed with the critic.
- In advantage actor-critic (A2C), we compute the policy gradient with the advantage function, defined as $A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$; in practice, A2C estimates the advantage from n-step returns and the critic's value estimate (a small sketch follows this list).
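Below is a minimal sketch of the two networks referred to above, assuming PyTorch and a discrete action space; the class names ActorNetwork and CriticNetwork, the hidden layer size, and the arguments state_dim and n_actions are illustrative rather than taken from the chapter.

```python
import torch
import torch.nn as nn

class ActorNetwork(nn.Module):
    """Maps a state to a probability distribution over actions (the policy)."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return torch.softmax(self.net(state), dim=-1)

class CriticNetwork(nn.Module):
    """Maps a state to a scalar estimate of the state value V(s)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)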
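The next sketch shows a single actor-critic update, assuming the ActorNetwork and CriticNetwork classes above, a discount factor gamma, and illustrative state/action sizes and learning rates. It uses the TD error as the advantage estimate that scales the policy gradient, matching the gradient formula in the list.

```python
import torch

gamma = 0.99
actor, critic = ActorNetwork(state_dim=4, n_actions=2), CriticNetwork(state_dim=4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update_step(state, action, reward, next_state, done):
    """One actor-critic update from a single environment transition."""
    state = torch.as_tensor(state, dtype=torch.float32)
    next_state = torch.as_tensor(next_state, dtype=torch.float32)

    # TD error: delta = r + gamma * V(s') - V(s).
    with torch.no_grad():
        target = reward + gamma * critic(next_state) * (1.0 - float(done))
    value = critic(state)
    delta = target - value

    # Critic loss: squared TD error.
    critic_opt.zero_grad()
    delta.pow(2).backward()
    critic_opt.step()

    # Actor loss: -log pi(a|s) * delta, whose gradient is
    # -grad_theta log pi_theta(a|s) * delta (gradient ascent on J).
    log_prob = torch.log(actor(state)[action])
    actor_loss = -log_prob * delta.detach()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In a training loop, update_step would be called once per environment transition, which is the per-step update contrasted with the full-episode update of the policy gradient method with baseline.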
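Finally, a small sketch of how A2C can estimate the advantage from an n-step rollout, assuming lists of rewards and critic value estimates for the visited states plus a bootstrap value for the state after the rollout; the helper n_step_advantages is hypothetical.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage estimates A_t ≈ (n-step return) - V(s_t) for each rollout step."""
    returns = []
    R = bootstrap_value  # V(s_{t+n}) from the critic
    for r in reversed(rewards):
        R = r + gamma * R          # discounted n-step return
        returns.insert(0, R)
    # Advantage: n-step return minus the critic's value estimate V(s_t).
    return [ret - v for ret, v in zip(returns, values)]

# Example: a 3-step rollout with constant reward 1 and zero value estimates.
print(n_step_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], bootstrap_value=0.0))
```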