For MNIST digit classification, we will use the softmax classifier. The softmax function normalizes its inputs into a probability distribution: every output lies between 0 and 1, and the outputs sum to 1. The softmax operation can be denoted as follows:

$$p_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

Here, $x_i$ is the $i$-th input (logit) and $p_i$ is the corresponding output probability.
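As a quick sanity check with made-up logits $(2.0,\ 1.0,\ 0.1)$, the softmax output is approximately:

$$\mathrm{softmax}(2.0,\ 1.0,\ 0.1) \approx (0.659,\ 0.242,\ 0.099)$$

The largest logit gets the largest probability, and the three values sum to 1.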
cuDNN's softmax forward function, cudnnSoftmaxForward, supports this operation either along the channel dimension (CUDNN_SOFTMAX_MODE_CHANNEL) or across each whole instance (CUDNN_SOFTMAX_MODE_INSTANCE). Previously, we aligned the dense layer's output with the channel dimension, so we will apply the softmax operation along the channels.
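The following is a minimal sketch of that forward call, assuming an NCHW tensor descriptor of shape [N, 10, 1, 1]; the helper name `softmax_forward` and the pointer names are illustrative, not the exact code from this chapter:

```cpp
#include <cudnn.h>

// Sketch: apply softmax along the channel dimension of the dense layer's
// output. tensor_desc describes both the input and output ([N, 10, 1, 1]).
void softmax_forward(cudnnHandle_t cudnn,
                     cudnnTensorDescriptor_t tensor_desc,
                     const float *d_logits,  // dense layer output (device)
                     float *d_probs)         // softmax output (device)
{
    const float alpha = 1.f, beta = 0.f;

    // CUDNN_SOFTMAX_ACCURATE subtracts each sample's maximum before
    // exponentiation to avoid overflow; CUDNN_SOFTMAX_MODE_CHANNEL
    // normalizes along C, where the 10 class scores live.
    cudnnSoftmaxForward(cudnn,
                        CUDNN_SOFTMAX_ACCURATE,
                        CUDNN_SOFTMAX_MODE_CHANNEL,
                        &alpha,
                        tensor_desc, d_logits,
                        &beta,
                        tensor_desc, d_probs);
}
```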
To confirm that training is progressing effectively, we need to calculate a loss. The loss paired with softmax is the cross-entropy loss, which measures how far the predicted probability distribution is from the true labels. The loss function is as follows:

$$L = -\sum_{i} t_i \log(p_i)$$

Here, $t_i$ is the target (one-hot label) and $p_i$ is the predicted probability for class $i$.
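cuDNN does not provide a loss primitive, so the cross-entropy sum is typically computed with a small custom kernel. Below is a minimal sketch, assuming `d_probs` and `d_labels` are row-major [batch_size x num_classes] device buffers with one-hot labels; all names are illustrative:

```cpp
// Sketch: one thread per sample accumulates -sum(t_i * log(p_i)).
__global__ void cross_entropy_loss_kernel(float *d_loss,
                                          const float *d_probs,
                                          const float *d_labels,
                                          int batch_size, int num_classes)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= batch_size) return;

    float loss = 0.f;
    for (int c = 0; c < num_classes; c++) {
        // Only the true class (t_i = 1) contributes to the sum.
        loss -= d_labels[idx * num_classes + c] *
                logf(d_probs[idx * num_classes + c] + 1e-8f);  // eps avoids log(0)
    }
    d_loss[idx] = loss;  // per-sample loss; average on the host for reporting
}

// Example launch: one thread per sample.
// cross_entropy_loss_kernel<<<(batch_size + 255) / 256, 256>>>(
//     d_loss, d_probs, d_labels, batch_size, num_classes);
```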
We need to obtain the gradient of this softmax loss to update the network's weights. Fortunately, combining softmax with cross-entropy loss yields a remarkably simple gradient with respect to the input logits:

$$\frac{\partial L}{\partial x_i} = p_i - t_i$$
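Under that result, the backward pass reduces to an elementwise subtraction. Here is a minimal sketch with the same illustrative buffer names as before, scaling by 1/batch_size so the gradient corresponds to the mean loss over the batch:

```cpp
// Sketch: gradient of softmax + cross-entropy w.r.t. the logits is
// simply (probability - target), one thread per element.
__global__ void softmax_loss_backward_kernel(float *d_grad,
                                             const float *d_probs,
                                             const float *d_labels,
                                             int batch_size, int num_classes)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= batch_size * num_classes) return;

    d_grad[idx] = (d_probs[idx] - d_labels[idx]) / batch_size;
}
```

Because the softmax and cross-entropy derivatives cancel so cleanly, no call to cudnnSoftmaxBackward is needed when the two are fused this way; the subtraction above starts the backward pass directly.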