## Numerical optimization

This section briefly introduces the optimization algorithms that can be applied to minimize the loss function, with or without a penalty term. These algorithms are described in more detail in the *Summary of optimization techniques* section of Appendix A, *Basic Concepts*.

First, let's define the **least squares problem**. Minimizing the loss function consists of setting its first-order derivatives to zero, which generates a system of *D* equations (known as the gradient equations), *D* being the number of regression weights (parameters). The weights are then computed iteratively by solving this system of equations with a numerical optimization algorithm.
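As a rough illustration of this idea, the following Python sketch fits a one-feature linear model by gradient descent. It is only one of many possible numerical schemes, and everything in it is an assumption made for the example: the model `f(x | w) = w[0] + w[1] * x`, the function names, the learning rate `eta`, and the toy data. Here *D* = 2, so there are two gradient equations.

```python
# Minimal sketch, assuming a linear model f(x | w) = w[0] + w[1] * x.
# All names (residuals, loss, gradient, gradient_descent) are hypothetical.

def residuals(w, xs, ys):
    """r_i = y_i - f(x_i | w) for every observation."""
    return [y - (w[0] + w[1] * x) for x, y in zip(xs, ys)]

def loss(w, xs, ys):
    """Sum of squared residuals."""
    return sum(r * r for r in residuals(w, xs, ys))

def gradient(w, xs, ys):
    """The D = 2 first-order derivatives of the loss with respect to w."""
    rs = residuals(w, xs, ys)
    return [-2.0 * sum(rs), -2.0 * sum(r * x for r, x in zip(rs, xs))]

def gradient_descent(xs, ys, eta=0.01, num_iters=5000, tol=1e-10):
    """Update the weights until every gradient component is close to zero."""
    w = [0.0, 0.0]
    for _ in range(num_iters):
        g = gradient(w, xs, ys)
        if max(abs(c) for c in g) < tol:
            break  # the gradient equations are (numerically) satisfied
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Toy data generated by y = 1 + 2x without noise (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w = gradient_descent(xs, ys)
```

On this noise-free data the weights should end up close to `[1.0, 2.0]`, and the loss close to zero. The fixed learning rate is a simplification; it has to be small enough for the iteration to converge on the data at hand.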

### Note

M10: The least squares-based loss function for residuals $r_i$, weights $w$, a model $f$, input data $x_i$, and expected values $y_i$ is defined as follows:

$$\mathcal{L}(w) = \sum_{i} r_i^2 \quad \text{with} \quad r_i = y_i - f(x_i \mid w)$$

M10: The generation of gradient equations with a Jacobian matrix $J$ (refer to the *Mathematics* section in Appendix A, *Basic Concepts*) after minimization...
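
To make the role of the Jacobian more concrete, here is a hedged Python sketch of a single Gauss-Newton step for a one-feature linear model. It assumes the convention that the residual is the expected value minus the model output, and that the Jacobian entry `J[i][j]` is the derivative of residual `i` with respect to weight `j`. Other texts use the opposite sign, so this is not necessarily the exact formulation used here. The function names and toy data are hypothetical.

```python
# Sketch of one Gauss-Newton step for f(x | w) = w[0] + w[1] * x.
# Assumed convention: r_i = y_i - f(x_i | w) and J[i][j] = d r_i / d w_j.

def residuals(w, xs, ys):
    return [y - (w[0] + w[1] * x) for x, y in zip(xs, ys)]

def jacobian(xs):
    # For this linear model: d r_i / d w_0 = -1 and d r_i / d w_1 = -x_i.
    return [[-1.0, -x] for x in xs]

def jt_times_r(J, r):
    # J^T r has one entry per weight; the gradient of the loss is 2 * J^T r,
    # so the gradient equations amount to J^T r = 0.
    return [sum(J[i][j] * r[i] for i in range(len(r))) for j in range(2)]

def gauss_newton_step(w, xs, ys):
    r = residuals(w, xs, ys)
    J = jacobian(xs)
    g = jt_times_r(J, r)
    # Entries of the symmetric 2x2 matrix J^T J.
    a = sum(row[0] * row[0] for row in J)
    b = sum(row[0] * row[1] for row in J)
    d = sum(row[1] * row[1] for row in J)
    det = a * d - b * b
    # Solve (J^T J) delta = -J^T r with Cramer's rule.
    delta = [(-g[0] * d + g[1] * b) / det, (g[0] * b - g[1] * a) / det]
    return [w[0] + delta[0], w[1] + delta[1]]

# Toy data generated by y = 1 + 2x without noise (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w_next = gauss_newton_step([0.0, 0.0], xs, ys)
g_next = jt_times_r(jacobian(xs), residuals(w_next, xs, ys))
```

Because the model is linear in the weights, one step from any starting point lands on the least squares solution, and `g_next` is zero up to rounding. For a nonlinear model the Jacobian depends on the current weights and the step has to be repeated.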