The next improvement that we are going to look at addresses another RL problem: exploration of the environment. The paper that we will draw from is called Noisy Networks for Exploration (Fortunato and others, 2017), and it has a very simple idea: learn the exploration characteristics during training, instead of relying on a separate schedule related to exploration.
Classical DQN achieves exploration by choosing random actions with probability epsilon, a specially defined hyperparameter that is slowly decreased over time from 1.0 (fully random actions) to some small value such as 0.1 or 0.02. This process works well for simple environments with short episodes, without much non-stationarity during the game; but even in such simple cases, it requires tuning to make the training process efficient.
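To make this concrete, here is a minimal sketch of epsilon-greedy action selection with a linear decay schedule. The function names and the particular decay constants (`eps_start`, `eps_final`, `decay_steps`) are illustrative choices, not part of any specific library:

```python
import random

def epsilon(step, eps_start=1.0, eps_final=0.02, decay_steps=100_000):
    """Linearly anneal epsilon from eps_start down to eps_final."""
    return max(eps_final, eps_start - step * (eps_start - eps_final) / decay_steps)

def select_action(q_values, step):
    """Epsilon-greedy: with probability epsilon(step) take a random
    action, otherwise take the action with the highest Q-value."""
    if random.random() < epsilon(step):
        return random.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit
```

The schedule itself is the part that needs tuning: decay too fast and the agent stops exploring before it has seen enough of the environment; decay too slowly and training wastes time on random actions.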
In the Noisy Networks paper, the authors proposed a simple solution that nevertheless works well. They add noise to the weights of the fully connected layers of the network and adjust...