Asynchronous advantage actor-critic (A3C)
Asynchronous advantage actor-critic, referred to as A3C from here on, is one of the most popular actor-critic algorithms. The main idea behind A3C is that it uses several agents learning in parallel and aggregates their overall experience.
In A3C, we have two types of networks: a global network (global agent) and worker networks (worker agents). We have many worker agents, each with its own copy of the environment and a different exploration policy, so each worker collects different experience. The experience obtained by the worker agents is then aggregated and sent to the global agent, which updates the shared network; the workers periodically synchronize their copies with the global network.
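To make this architecture concrete, here is a minimal sketch of the global/worker pattern using threads. The names `GlobalAgent` and `Worker` are illustrative, not from any library, and the "network" is reduced to a plain parameter vector with a toy gradient, so only the asynchronous update-and-sync structure is real A3C; the learning problem itself is a stand-in:

```python
import threading
import random

class GlobalAgent:
    """Holds the shared parameters that all workers update."""
    def __init__(self, n_params, lr=0.01):
        self.params = [0.0] * n_params
        self.lr = lr
        self.lock = threading.Lock()  # serialize asynchronous updates

    def apply_gradients(self, grads):
        # Aggregate a worker's experience via a gradient step.
        with self.lock:
            for i, g in enumerate(grads):
                self.params[i] += self.lr * g

    def get_params(self):
        with self.lock:
            return list(self.params)

class Worker(threading.Thread):
    """Each worker explores its own environment copy with its own RNG."""
    def __init__(self, global_agent, seed, steps=100):
        super().__init__()
        self.global_agent = global_agent
        self.rng = random.Random(seed)  # different exploration per worker
        self.steps = steps

    def run(self):
        for _ in range(self.steps):
            # 1. Sync a local copy from the global network.
            local = self.global_agent.get_params()
            # 2. Act in the local environment copy. Toy stand-in: a noisy
            #    gradient that pulls every parameter toward 1.0.
            grads = [(1.0 - p) + self.rng.gauss(0, 0.1) for p in local]
            # 3. Send the gradients back to the global agent.
            self.global_agent.apply_gradients(grads)

agent = GlobalAgent(n_params=4)
workers = [Worker(agent, seed=i) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(agent.params)
```

In a real implementation each worker would run an actual actor-critic network and environment, but the control flow is the same: pull the global parameters, gather experience locally, push gradients back asynchronously.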
Now that we have a very basic idea of how A3C works, let's go into more detail.