The feed-forward model
During the training, we have several charts in TensorBoard showing us what's going on.
Figure 10.3: The reward for episodes during the training
Figure 10.4: The reward for test episodes
The two preceding charts show the reward for episodes played during the training and the reward obtained from testing (which is done on the same quotes, but with
epsilon=0). From them, we see that our agent is learning how to increase the profit from its actions over time.
Figure 10.5: The lengths of played episodes
Figure 10.6: The values predicted by the network on a subset of states