So far, we've seen that accuracy on the training dataset is typically above 95%, while accuracy on the validation dataset is around 89%. Essentially, this gap indicates that the model does not generalize well to unseen data, since it has fit the training dataset too closely. The model is learning edge cases specific to the training dataset, and these do not carry over to the validation dataset.
We will look at the impact these techniques have in the following sections.
Impact of adding dropout
We have already learned that calling loss.backward() computes the gradients, and the subsequent optimizer step performs the weight update. Typically, we would have hundreds of thousands of parameters within a network and...
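As a minimal sketch of what adding dropout looks like in PyTorch (the layer sizes and dropout probabilities below are illustrative assumptions, not values from the text): nn.Dropout randomly zeroes a fraction of activations during training and rescales the survivors, while acting as a pass-through at evaluation time.

```python
import torch
import torch.nn as nn

# Illustrative network with a dropout layer between the hidden and output
# layers; the sizes (784 -> 1000 -> 10) are assumptions for this sketch.
model = nn.Sequential(
    nn.Linear(784, 1000),
    nn.ReLU(),
    nn.Dropout(0.25),   # randomly zeroes 25% of activations during training
    nn.Linear(1000, 10),
)

# Demonstrating the two modes of a standalone dropout layer:
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()            # training mode: units are dropped, survivors scaled by 1/(1-p)
y_train = drop(x)

drop.eval()             # evaluation mode: dropout is a no-op
y_eval = drop(x)
```

Note that model.train() and model.eval() toggle this behavior for every dropout layer in the network, which is why the evaluation loop must switch the model to eval mode before measuring validation accuracy.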