In the previous chapter, we learned how to train a basic neural network. We also saw the diminishing returns, in terms of predictive ability on holdout or validation data not used to train the model, from further training iterations or a larger network. This highlights that, although a more complex model will almost always fit its training data better, it may not actually predict new data better. This chapter presents a set of approaches, collectively called regularization, that can be used to prevent models from overfitting the training data and thereby improve generalizability. More specifically, whereas models are typically trained by optimizing parameters in a way that reduces the training error, regularization is concerned with reducing the testing or validation error so that the model performs well on new data as well as on the training data.
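To make the distinction between training error and regularized training concrete, here is a minimal sketch of one common regularization strategy, an L2 (weight-decay) penalty added to the loss. All names here (`w`, `X`, `y`, `lam`) are illustrative and not taken from the chapter; the point is only that the penalty discourages large weights, which tends to reduce overfitting:

```python
import numpy as np

def mse_loss(w, X, y):
    """Plain training loss: mean squared error of a linear model."""
    return np.mean((X @ w - y) ** 2)

def regularized_loss(w, X, y, lam=0.1):
    """Training loss plus an L2 penalty on the weights.

    The penalty term lam * sum(w**2) is non-negative, so minimizing this
    objective trades a slightly worse fit to the training data for
    smaller weights, which often generalizes better.
    """
    return mse_loss(w, X, y) + lam * np.sum(w ** 2)

# Hypothetical synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)

# Gradient descent on the regularized objective: the gradient is the
# usual MSE gradient plus 2 * lam * w from the penalty term.
w, lam, lr = np.zeros(3), 0.1, 0.05
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w
    w -= lr * grad
```

The same idea carries over to neural networks, where the penalty is summed over all weight matrices; most deep-learning libraries expose it as a "weight decay" option on the optimizer.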
The first part of the chapter provides a conceptual overview of a variety of regularization strategies. The chapter...