DAgger (Dataset Aggregation) is one of the most widely used imitation learning algorithms. Let's understand how it works by revisiting our example of training an agent to drive a car. First, we initialize an empty dataset $\mathcal{D}$.
In the first iteration, we start off with some policy $\pi_1$ to drive the car and generate a trajectory $\tau$ using it. The trajectory consists of a sequence of states and actions: the states visited by $\pi_1$ and the actions $\pi_1$ took in those states. Now, we create a new dataset $\mathcal{D}_1$ by keeping only the states visited by $\pi_1$ and asking an expert $\pi^*$ to provide the action for each of those states. The actions chosen by $\pi_1$ itself are discarded.
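The data-collection step described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the toy lane-keeping task, `expert_policy`, `initial_policy`, `step`, and `rollout` are all hypothetical and do not come from the text. The point is that the learner's policy decides which states are visited, while the expert supplies the labels.

```python
import random

# Hypothetical toy driving task: the state is the car's lateral offset from
# the lane centre; the action is -1 (steer left), 0 (straight), +1 (steer right).

def expert_policy(state):
    """Assumed expert: steer back toward the lane centre."""
    if state > 0.1:
        return -1
    if state < -0.1:
        return 1
    return 0

def initial_policy(state):
    """Stand-in for the learner's first policy: always drive straight."""
    return 0

def step(state, action, rng):
    """Toy dynamics: steering shifts the offset, plus a little random drift."""
    return state + 0.2 * action + rng.uniform(-0.15, 0.15)

def rollout(policy, horizon, rng):
    """Generate a trajectory of (state, action) pairs by running `policy`."""
    state = 0.0
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        trajectory.append((state, action))
        state = step(state, action, rng)
    return trajectory

rng = random.Random(0)
trajectory = rollout(initial_policy, horizon=20, rng=rng)

# Keep only the visited states and ask the expert for the action in each one.
new_dataset = [(state, expert_policy(state)) for state, _ in trajectory]
```

Because the untrained policy never steers, the car drifts into states an expert driver would rarely reach on its own; labelling exactly those states is what distinguishes this step from plain supervised imitation.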
Now, we combine the new dataset $\mathcal{D}_1$ with our initialized empty dataset $\mathcal{D}$ and update $\mathcal{D}$ as:
$$\mathcal{D} = \mathcal{D} \cup \mathcal{D}_1$$
Next, we train a classifier on this updated dataset $\mathcal{D}$ and learn a new policy $\pi_2$.
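As a rough sketch of the aggregation and training steps, the snippet below merges a small expert-labelled dataset into the running dataset and fits a classifier on the result. Everything here is an assumption made for illustration: the helper names `aggregate` and `train_classifier` are hypothetical, the labelled states are made up, and a 1-nearest-neighbour rule stands in for whatever classifier is actually used.

```python
def aggregate(dataset, new_dataset):
    """Merge the new expert-labelled pairs into the running dataset.

    The union is implemented as list concatenation, so repeated
    (state, action) pairs are kept rather than de-duplicated.
    """
    return dataset + new_dataset

def train_classifier(dataset):
    """Fit a 1-nearest-neighbour classifier and return it as a policy."""
    def policy(state):
        # Copy the expert's action from the closest labelled state.
        _, nearest_action = min(dataset, key=lambda pair: abs(pair[0] - state))
        return nearest_action
    return policy

# Start from the empty dataset, as in the text.
dataset = []

# Hypothetical expert-labelled pairs: (lateral offset, steering action),
# where -1 steers left, 0 goes straight, and +1 steers right.
new_dataset = [(-0.6, 1), (-0.3, 1), (0.0, 0), (0.25, -1), (0.7, -1)]

dataset = aggregate(dataset, new_dataset)
new_policy = train_classifier(dataset)
```

Calling `new_policy(0.5)` looks up the closest labelled state and returns the expert's action there. In later iterations the same `aggregate` call would keep growing `dataset`, so each new policy is trained on all expert labels collected so far.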
In the second iteration, we use the...