Classification is a broad topic in machine learning: it consists of predicting a class or category, as we demonstrated with our handwritten digits example. In Chapter 7, Classifying Images with Residual Networks, we'll see how to classify a wider set of natural images and objects.
Classification can be applied to many different problems, and the cross-entropy (negative log-likelihood) loss is the common loss function to solve them through gradient descent. Other problems call for other loss functions, such as the mean squared error loss for regression, or margin-based losses such as the hinge loss for joint embedding learning.
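To make the cross-entropy/negative log-likelihood loss concrete, here is a minimal NumPy sketch; the function name, array shapes, and the epsilon guard are illustrative choices of mine, not code from the chapter:

```python
import numpy as np

def negative_log_likelihood(probs, targets):
    """Mean cross-entropy / negative log-likelihood for integer class labels.

    probs:   (batch, n_classes) array of predicted class probabilities.
    targets: (batch,) array of integer class labels.
    """
    eps = 1e-12  # guard against log(0) for zero-probability predictions
    # Pick the predicted probability of the true class for each example,
    # take its log, and average the negated values over the batch.
    picked = probs[np.arange(len(targets)), targets]
    return -np.mean(np.log(picked + eps))

# Two examples with 3 classes; the true classes are 0 and 1.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
targets = np.array([0, 1])
loss = negative_log_likelihood(probs, targets)
```

The loss only depends on the probability assigned to the correct class, which is why confident correct predictions drive it toward zero while confident mistakes are penalized heavily.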
In this chapter, we used a very simple gradient descent update rule, stochastic gradient descent, and presented several other gradient descent variants: Momentum, Nesterov momentum, RMSProp, Adam, Adagrad, and Adadelta. There has also been research into second-order optimization methods, such as Hessian-free optimization and K-FAC, which have produced better results on deep or recurrent networks but remain complex and costly.
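To contrast plain stochastic gradient descent with the Momentum variant, here is a small sketch of the two update rules in NumPy; the function names, learning rate, and momentum coefficient are illustrative assumptions, and the toy objective f(w) = w² stands in for a real training loss:

```python
def sgd_step(param, grad, lr=0.1):
    """Plain SGD: step directly against the gradient."""
    return param - lr * grad

def momentum_step(param, grad, velocity, lr=0.1, mu=0.9):
    """Momentum: accumulate a decaying running velocity of past gradients,
    which smooths noisy updates and speeds progress along consistent directions."""
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity

# Minimize the toy objective f(w) = w**2, whose gradient is 2*w.
w, v = 1.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, 2 * w, v)
```

The other variants (RMSProp, Adam, Adagrad, Adadelta) follow the same pattern but additionally rescale each parameter's step by running statistics of its past gradients.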