The basic concept of the L1 penalty, also known as the Least Absolute Shrinkage and Selection Operator (lasso; Hastie, Tibshirani, and Friedman, 2009), is that a penalty is used to shrink weights towards zero. The penalty term is the sum of the absolute values of the weights, so the penalty's marginal effect is the same for small and large weights alike; as a result, small weights may be shrunk exactly to zero. This is a convenient effect because, in addition to preventing overfitting, it acts as a form of variable selection. The strength of the penalty is controlled by a hyperparameter, λ, which multiplies the sum of the absolute weights and can be set a priori or, as with other hyperparameters, tuned using cross-validation or a similar approach.
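The sparsity-inducing effect described above can be illustrated with a short sketch. The example below is illustrative only (the data and the choice of penalty strength are assumptions, not from the text); it fits a lasso model with scikit-learn, where the `alpha` parameter plays the role of λ, and shows that the weights on irrelevant features are shrunk exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, but only the first two actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ true_w + 0.1 * rng.normal(size=100)

# alpha corresponds to the lambda hyperparameter in the text.
model = Lasso(alpha=0.5)
model.fit(X, y)

# The L1 penalty drives the weights of the irrelevant features
# exactly to zero, performing variable selection.
n_zero = int(np.sum(model.coef_ == 0.0))
print("weights:", np.round(model.coef_, 2))
print("weights shrunk exactly to zero:", n_zero)
```

In practice, `alpha` (λ) would be chosen by cross-validation, e.g. with `sklearn.linear_model.LassoCV`, rather than fixed a priori as here.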
Mathematically, it is easiest to start with an Ordinary Least Squares (OLS) regression model. In regression, a set of coefficients, or model weights, is estimated using the least-squares error criterion, where the weight/coefficient vector,...