First, we look at the high-level categories of optimization algorithms and then dive deep into the individual optimizers.

**First order optimization** algorithms minimize or maximize a loss function using its gradient values with respect to the parameters. The most widely used first order optimization algorithm is gradient descent. Here, the first order derivative tells us whether the function is decreasing or increasing at a particular point. The first order derivative gives us a line that is tangential to a point on the error surface.
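As a minimal sketch of gradient descent, consider minimizing a toy loss f(w) = (w - 3)^2, whose gradient is 2(w - 3). The loss, starting point, and learning rate here are illustrative assumptions, not from the text:

```python
# Gradient descent on a toy loss f(w) = (w - 3)**2 (assumed for illustration).
# The gradient 2*(w - 3) is the first-order information the text describes.

def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0    # assumed initial parameter value
lr = 0.1   # assumed learning rate (step size)

for _ in range(100):
    w -= lr * gradient(w)   # step in the direction opposite the gradient

print(round(w, 4))   # the iterates approach the minimizer w = 3
```

Each step moves the parameter downhill along the tangent direction; for this convex quadratic the iterates converge to the minimum at w = 3.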

### Note

The derivative is defined for a function of a single variable, whereas the gradient is defined for a function of multiple variables: it collects the partial derivatives with respect to each variable into a vector.
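The distinction can be made concrete with two small example functions (both assumed here for illustration): a one-variable function has a single-number derivative, while a two-variable function has a gradient vector of partial derivatives:

```python
# Single variable: f(x) = x**2 has derivative 2x -- one number.
def df(x):
    return 2.0 * x

# Two variables: g(x, y) = x**2 + 3*y has gradient (2x, 3) -- one partial
# derivative per variable, collected into a vector (here, a tuple).
def grad_g(x, y):
    return (2.0 * x, 3.0)

print(df(2.0))           # 4.0
print(grad_g(2.0, 1.0))  # (4.0, 3.0)
```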

**Second order optimization** algorithms use the second order derivative, also known as the **Hessian**, to minimize or maximize the given loss function. The Hessian is a matrix of second order partial derivatives. Since the second derivative is costly to compute, it is not used much in practice. The second order derivative indicates the curvature of the error surface, that is, whether the first derivative is increasing or decreasing at a particular point.
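A second-order update can be sketched with Newton's method on the same assumed toy loss f(w) = (w - 3)^2, where the update divides the gradient by the second derivative (the one-by-one Hessian in this single-parameter case):

```python
# One Newton step on the toy loss f(w) = (w - 3)**2 (assumed for
# illustration). The second derivative of this loss is the constant 2.

def gradient(w):
    return 2.0 * (w - 3.0)

def second_derivative(w):
    return 2.0   # the 1x1 "Hessian" of the quadratic toy loss

w = 0.0                                    # assumed starting point
w -= gradient(w) / second_derivative(w)    # Newton update: scale by curvature

print(w)   # 3.0 -- for a quadratic loss, one Newton step reaches the minimum
```

This illustrates why curvature information is attractive: on a quadratic loss a single second-order step lands exactly on the minimum, whereas gradient descent needs many small steps. The cost is that, with many parameters, forming and inverting the full Hessian matrix is expensive.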