## Regularization

The ordinary least squares method for estimating the regression parameters is a special case of maximum likelihood estimation. Therefore, regression models are subject to the same overfitting challenge as any other discriminative model. You already know that regularization is used to reduce model complexity and avoid overfitting, as stated in the *Overfitting* section in Chapter 2, *Hello World!*

### Ln roughness penalty

**Regularization** consists of adding a penalty function *J(w)* to the loss function (or RSS in the case of a regressive classifier) in order to prevent the model parameters (also known as weights) from reaching large values. A model that fits a training set very well tends to have many feature variables with relatively large weights. Regularization forces these weights toward smaller values, a process known as **shrinkage**. Practically, shrinkage involves adding a function with the model parameters as an argument to the loss function (**M5**):
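In general terms, the penalized objective combines the original loss with a weighted penalty, RSS(*w*) + λ·*J(w)*, where λ controls the strength of the shrinkage. The effect can be sketched with a minimal L2 (ridge) penalty in Python; the synthetic data, the λ value, and the `fit` helper are illustrative, not taken from the book:

```python
import numpy as np

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([3.0, -2.0, 1.5, 0.5, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

def fit(X, y, lam=0.0):
    """Closed-form minimizer of RSS + lam * ||w||^2 (ridge regression)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_ols = fit(X, y, lam=0.0)    # ordinary least squares: no penalty
w_l2 = fit(X, y, lam=10.0)    # penalized: weights shrink toward zero

# The penalized weight vector has a smaller norm than the OLS solution
print(np.linalg.norm(w_l2) < np.linalg.norm(w_ols))  # True
```

Increasing λ shrinks the weights further; at λ = 0 the solution reduces to ordinary least squares.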

The penalty function is completely independent of the training set *{x...*