A model is said to overfit the training data when it generates a hypothesis that accounts for every example. What this means is that it correctly predicts the outcome of every example. The problem with this scenario is that the model equation becomes extremely complex, and such models have been observed to be incapable of correctly predicting new observations.
Overfitting occurs when a model has been over-engineered. Two of the ways in which this could occur are:
- The model is trained on too many features.
- The model is trained for too long.
We'll discuss each of these two points in the following sections.
Training on Too Many Features
When a model trains on too many features, the hypothesis becomes extremely complicated. Consider a case in which you have one column of features and you need to generate a hypothesis. This would be a simple linear equation, as shown here: