## Cross-entropy Loss

**Cross-entropy loss** is used for classification problems in which the model outputs, for each class, a probability between 0 and 1. The loss grows as the predicted probability of the true class moves away from 1, following a negative log curve. This heavily penalizes predictions that are far from the actual label. For example, if the model assigns a probability of 0.05 to the true label, it incurs a large loss, whereas a probability of 0.40 incurs a much smaller one.
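To make the two examples concrete, the sketch below computes the per-example loss `-log(p)` for a few probabilities assigned to the true label. It assumes the natural logarithm, which is what most deep learning frameworks use; the helper name `log_loss_for_true_class` is hypothetical and only for illustration.

```python
import math

def log_loss_for_true_class(p_true: float) -> float:
    """Hypothetical helper: loss -log(p) when the true class is given probability p_true."""
    # log is undefined at 0 and the input must be a probability
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    # Natural log assumed here
    return -math.log(p_true)

# Lower probability on the true label -> larger penalty
for p in (0.05, 0.40, 0.90, 0.99):
    print(f"p(true label) = {p:.2f} -> loss = {log_loss_for_true_class(p):.3f}")
```

With these inputs, 0.05 yields a loss of roughly 3.0, while 0.40 yields roughly 0.9, matching the intuition described above.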

###### Figure 6.9: Graph of log loss versus probability

The preceding graph shows that the loss rises steeply, without bound, as the predicted probability of the true label falls toward 0, and approaches 0 as that probability approaches 1. Cross-entropy loss is computed with the following formula:

###### Figure 6.10: Cross entropy loss formula
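Written out, the standard form of this formula for a single observation is shown below. The class index *c* is introduced here for clarity and is an assumption about the figure's notation; *M*, *y*, and *p* are as defined in the following paragraph, with *y* assumed to be one-hot encoded.

$$
\mathcal{L} = -\sum_{c=1}^{M} y_c \log\left(p_c\right)
$$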

Here, *M* is the number of classes in the dataset (10 in the case of MNIST), *y* is the true label, and *p* is the predicted probability of the class. We prefer cross-entropy...
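The following is a minimal sketch of the formula for one observation with *M* = 10 classes, as in MNIST. It assumes a one-hot true label and a probability vector that already sums to 1; the function name `cross_entropy` and the small `eps` clamp are illustrative choices, not part of the original text. Because *y* is one-hot, only the true-class term of the sum survives, so the result reduces to `-log(p)` for the true class.

```python
import math
from typing import Sequence

def cross_entropy(y_true: Sequence[float], p_pred: Sequence[float], eps: float = 1e-12) -> float:
    """Illustrative: -sum_c y_c * log(p_c) for a single observation over M classes."""
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must both have length M")
    # Clamp probabilities away from 0 so math.log never receives 0
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))

M = 10        # number of classes (MNIST digits 0-9)
label = 3     # hypothetical true digit
y = [1.0 if c == label else 0.0 for c in range(M)]  # one-hot true label

# Confident, correct prediction: 0.91 on the true class, 0.01 on each other class
p_good = [0.01] * M
p_good[label] = 0.91

# Poor prediction: only 0.05 on the true class, the rest spread evenly
p_bad = [0.95 / 9] * M
p_bad[label] = 0.05

print(f"good prediction loss: {cross_entropy(y, p_good):.4f}")
print(f"bad prediction loss:  {cross_entropy(y, p_bad):.4f}")
```

In practice, framework implementations such as PyTorch's `torch.nn.CrossEntropyLoss` take raw logits rather than probabilities and apply log-softmax internally, which is more numerically stable than computing `log` on probabilities as this sketch does.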