Maximum likelihood estimation
On several occasions throughout this book, we've expressed optimization problems in terms of a cost function to be minimized. For example, in Chapter 4, Classification, we used Incanter to minimize the logistic cost function whilst building a logistic regression classifier, and in Chapter 5, Big Data, we minimized a least-squares cost function using batch and stochastic gradient descent.
Optimization can also be expressed as a benefit to maximize, and it's sometimes more natural to think in these terms. Maximum likelihood estimation aims to find the best parameters for a model by maximizing the likelihood function.
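As a minimal sketch of the idea, consider estimating the probability p that a coin lands heads from a handful of flips. The data, the function name bernoulli-likelihood, and the grid of candidate values are all assumptions made for illustration. The likelihood of p is taken to be the product of the probabilities of each flip, which assumes the flips are independent.

```clojure
;; Illustrative sketch only: the names and data are hypothetical.

(defn bernoulli-likelihood
  "Likelihood of heads-probability p given a sequence of coin flips,
  where 1 is heads and 0 is tails. Assumes the flips are independent."
  [p flips]
  (reduce * (map (fn [x] (if (= x 1) p (- 1 p))) flips)))

;; Made-up data: 7 heads in 10 flips.
(def flips [1 1 0 1 0 1 1 1 0 1])

;; Candidate values of p on a grid from 0.01 to 0.99.
(def candidates
  (map #(/ % 100.0) (range 1 100)))

;; The candidate with the highest likelihood is the (grid-based)
;; maximum likelihood estimate.
(def best-p
  (apply max-key #(bernoulli-likelihood % flips) candidates))
```

With this data, best-p lands on the sample proportion of heads, 7/10. A grid search is used here only because it is easy to read; a numerical optimizer or a closed-form solution would normally take its place.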
Let's say that the probability of an observation x given model parameters β is written as:

$$P(x \mid \beta)$$
Then, the likelihood can be expressed as:

$$L(\beta \mid x) = P(x \mid \beta)$$
The likelihood is a measure of how well the parameters explain the data: it is the probability of the observed data, treated as a function of the parameters. The aim of maximum likelihood estimation is to find the parameter values that make the observed data most...