## Maximum likelihood estimation

On several occasions throughout this book, we've expressed optimization problems in terms of a cost function to be minimized. For example, in Chapter 4, *Classification*, we used Incanter to minimize the logistic cost function whilst building a logistic regression classifier, and in Chapter 5, *Big Data*, we minimized a least-squares cost function using batch and stochastic gradient descent.

Optimization can also be expressed as a benefit to maximize, and it's sometimes more natural to think in these terms. Maximum likelihood estimation aims to find the best parameters for a model by maximizing the likelihood function.

Let's say that the probability of an observation *x* given model parameters *β* is written as:

$$P(x \mid \beta)$$

Then, the likelihood can be expressed as:

$$\mathcal{L}(\beta \mid x) = P(x \mid \beta)$$

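The likelihood is a measure of how well the parameters explain the data. It takes the same value as the probability of the data given the parameters, but it's treated as a function of the parameters with the data held fixed, so it isn't itself a probability distribution over the parameters. The aim of maximum likelihood estimation is to find the parameter values that make the observed data most...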
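
To make the idea concrete, here is a minimal sketch in Python that estimates the bias of a coin by maximum likelihood. Everything in it is an assumption made for illustration: the coin model, the ten hypothetical observations, the names `bernoulli_likelihood`, `observations`, and `candidates`, and the use of a coarse grid search in place of a proper optimizer. The single parameter `p` plays the role of the model parameters in the text.

```python
import math

def bernoulli_likelihood(p, observations):
    """Likelihood of bias p given 0/1 observations.

    Assuming independent tosses, this is the product of the
    probability of each observation under the parameter p.
    """
    return math.prod(p if x == 1 else 1 - p for x in observations)

# Hypothetical data: 7 heads (1) and 3 tails (0) in 10 tosses.
observations = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Candidate parameter values 0.01, 0.02, ..., 0.99 (an illustrative grid).
candidates = [i / 100 for i in range(1, 100)]

# The maximum likelihood estimate is the candidate that makes
# the observed data most probable.
mle = max(candidates, key=lambda p: bernoulli_likelihood(p, observations))
print(mle)
```

With these observations, the search settles on 0.7, which matches the sample proportion of heads. In practice, a grid like this would be replaced by a numerical optimizer, and the product would usually be replaced by a sum of log probabilities to avoid underflow on larger datasets.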