## Classification metrics

If the label is discrete, the prediction problem is called classification. In general, the target takes exactly one of the possible values for each record (although multivalued targets are possible, particularly for the text classification problems considered in Chapter 6, *Working with Unstructured Data*).

If the discrete values are ordered and the ordering is meaningful, such as *Bad*, *Worse*, *Good*, the discrete labels can be cast to integers or doubles, and the problem reduces to regression (we believe that if you are between *Bad* and *Worse*, you are definitely farther away from being *Good* than *Worse*).

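A generic metric to optimize is the misclassification rate, the fraction of records whose predicted label differs from the actual label:

$$\text{misclassification rate} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i \neq y_i\right]$$

Here $N$ is the number of records, $y_i$ is the actual label of record $i$, $\hat{y}_i$ is the predicted label, and $\mathbb{1}[\cdot]$ equals 1 when its argument is true and 0 otherwise.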
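As a minimal sketch, the misclassification rate can be computed by counting the records whose predicted label differs from the actual one and dividing by the number of records. The function name `misclassification_rate` and the sample labels below are illustrative assumptions, not part of the original text.

```python
from typing import Hashable, Sequence


def misclassification_rate(y_true: Sequence[Hashable],
                           y_pred: Sequence[Hashable]) -> float:
    """Return the fraction of records whose prediction differs from the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one record is required")
    # Count the positions where the predicted label is wrong.
    errors = sum(1 for actual, predicted in zip(y_true, y_pred)
                 if actual != predicted)
    return errors / len(y_true)


# Hypothetical labels, used only to exercise the function.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "spam"]
rate = misclassification_rate(y_true, y_pred)
print(rate)  # 2 of the 5 predictions are wrong -> 0.4
```

The rate treats every error as equally costly, which is what makes it generic; it takes no account of how confident the classifier was in each prediction.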

However, if the algorithm can predict the distribution of possible values for the target, a more general metric such as the KL divergence or the Manhattan distance between the predicted and actual distributions can be used.

KL divergence is a measure of the information lost when a probability distribution $Q$ is used to approximate a probability distribution $P$:

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}$$
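The following is a minimal sketch of both distribution-based metrics for discrete distributions given as lists of probabilities over the same set of values. It assumes the standard definition of KL divergence, the sum of `p_i * log(p_i / q_i)`, with `p` as the actual distribution and `q` as the approximation, and takes the Manhattan distance to be the sum of absolute differences. The function names and sample distributions are illustrative assumptions.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats: information lost when q approximates p."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same set of values")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # by convention, 0 * log(0 / q_i) contributes 0
        if q_i == 0.0:
            return math.inf  # q rules out a value that p allows
        total += p_i * math.log(p_i / q_i)
    return total


def manhattan_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between the two distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same set of values")
    return sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))


# Hypothetical actual and predicted class distributions for one record.
actual = [0.5, 0.5]
predicted = [0.9, 0.1]

print(kl_divergence(actual, predicted))
print(kl_divergence(predicted, actual))  # differs: KL is not symmetric
print(manhattan_distance(actual, predicted))
```

Unlike the Manhattan distance, KL divergence is not symmetric in its arguments, so it matters which distribution is treated as the approximation.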

It is closely related...