# Numerical optimization

This section briefly introduces the different optimization algorithms that can be applied to minimize the loss function, with or without a penalty term. These algorithms are described in greater detail in the *Summary of optimization techniques* section in Appendix A, *Basic Concepts*.

First, let's define the **least squares problem**. Minimizing the loss function consists of setting its first-order derivatives to zero, which generates a system of D equations (also known as gradient equations), where D is the number of regression weights (parameters). The weights are computed iteratively by solving this system of equations with a numerical optimization algorithm.
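The iterative computation of the weights can be sketched as follows. This is a minimal illustration, assuming a linear model and plain gradient descent as the numerical optimizer; the function name `fit_least_squares`, the learning rate, the iteration count, and the sample data are hypothetical choices, not taken from the text.

```python
import numpy as np

def fit_least_squares(X, y, learning_rate=0.1, iterations=5000):
    """Estimate the weights of the linear model X @ w by gradient descent.

    The loss is the mean of the squared residuals. Dividing by the number
    of observations rescales the loss but does not move its minimum.
    """
    n, d = X.shape
    w = np.zeros(d)  # one weight per column of X (D weights in total)
    for _ in range(iterations):
        residuals = y - X @ w
        # Gradient of the mean squared residual with respect to w.
        gradient = -2.0 * (X.T @ residuals) / n
        # Step against the gradient; it approaches zero near the minimum,
        # which is where the gradient equations are satisfied.
        w -= learning_rate * gradient
    return w

# Hypothetical noise-free data generated from y = 1 + 2x.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
y = 1.0 + 2.0 * x

w = fit_least_squares(X, y)
```

With this data, `w` should approach the intercept 1 and the slope 2 used to generate `y`. Gradient descent is used here only because it is the simplest iterative scheme to read; other optimizers solve the same gradient equations with different update rules.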

### Note

The definition of the least squares-based loss function is as follows:
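
A standard form of this loss, written with assumed notation (observations $(x_i, y_i)$ for $i = 1, \dots, n$, and a model $f(x \mid w)$ with weights $w$), is the sum of squared residuals:

$$
L(w) = \sum_{i=1}^{n} r_i(w)^2, \qquad r_i(w) = y_i - f(x_i \mid w)
$$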

The generation of gradient equations with a Jacobian matrix `J` (refer to the *Jacobian and Hessian matrices* section in Appendix A, *Basic Concepts*) after minimization of the loss function `L` is described as follows:
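
With assumed notation, where $L(w) = \sum_{i=1}^{n} r_i(w)^2$ is the sum of squared residuals $r_i(w) = y_i - f(x_i \mid w)$ and the Jacobian entries are $J_{ij} = \partial f(x_i \mid w) / \partial w_j$, setting each partial derivative of $L$ to zero gives one equation per weight:

$$
\frac{\partial L}{\partial w_j} = 2 \sum_{i=1}^{n} r_i \frac{\partial r_i}{\partial w_j} = -2 \sum_{i=1}^{n} r_i J_{ij} = 0, \qquad j = 1, \dots, D
$$

In matrix form, these D equations can be written as $J^{T} r = 0$.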

Iterative approximation...