When we have more than two classes, we have to modify our approach slightly. The output layer of the neural network now has as many nodes as there are classes. The values in these nodes are normalized using the softmax function so that they add up to 1. We can interpret these normalized values as probabilities, and the node with the highest probability is our predicted class. The softmax function is given by $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, where $z$ is the vector of output nodes and $K$ is the number of classes.
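To make the normalization concrete, here is a minimal sketch of softmax in plain Python. The function name `softmax`, the example values in `outputs`, and the max-subtraction step (a common numerical-stability trick that does not change the result) are assumptions made for illustration, not something taken from the text.

```python
import math

def softmax(z):
    """Turn raw output-node values into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw values of three output nodes, one per class.
outputs = [2.0, 1.0, 0.1]
probabilities = softmax(outputs)

# The predicted class is the index of the node with the highest probability.
predicted_class = probabilities.index(max(probabilities))
```

Here the first node has the largest raw value, so it also receives the highest probability and `predicted_class` is 0.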
When evaluating the model, we have to increase the size of our confusion matrix. Figure 4.16 shows a confusion matrix with three classes. The "Yay!" boxes contain the counts of correct predictions, while the "Nope!" boxes contain the counts of incorrect predictions:
Figure 4.16: The confusion matrix with three classes
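As a rough sketch, a three-class confusion matrix could be built as follows. The helper `confusion_matrix`, the made-up label lists, and the layout (rows for actual classes, columns for predicted classes) are assumptions for illustration; the figure may arrange the axes differently.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Count predictions: rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Made-up labels for three classes (0, 1, 2), for illustration only.
actual    = [0, 0, 1, 1, 2, 2, 2, 1]
predicted = [0, 1, 1, 1, 2, 0, 2, 2]

cm = confusion_matrix(actual, predicted, num_classes=3)

# Diagonal cells hold correct predictions; every other cell holds incorrect ones.
correct = sum(cm[i][i] for i in range(3))
incorrect = sum(sum(row) for row in cm) - correct
accuracy = correct / (correct + incorrect)
```

With these labels, the diagonal cells correspond to the "Yay!" boxes and the off-diagonal cells to the "Nope!" boxes, so overall accuracy is the diagonal total divided by the total number of predictions.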
With this, we can calculate both overall metrics and one-vs-all metrics. In one-vs-all evaluations, we have one class (such as class...