A more advanced computer vision competition, which we might call the spiritual successor to the ImageNet challenge, is COCO: Common Objects in Context (http://cocodataset.org/). The goal with COCO is to find multiple objects within an image and identify the location and category of each. For example, a single photo may contain two people and two horses. The COCO dataset contains 330,000 images with 1.5 million labeled objects spanning 80 categories.
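To make the task concrete, the output for a photo with two people and two horses can be thought of as a list of labeled boxes. The sketch below is illustrative only: the `Detection` class and every coordinate in it are made up for this example, and the box layout (top-left corner plus width and height, in pixels) is an assumption chosen to mirror the convention used in COCO annotations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a category label plus a bounding box.

    Hypothetical type for illustration; not part of any COCO tooling.
    """
    category: str
    x: float       # left edge of the box, in pixels
    y: float       # top edge of the box, in pixels
    width: float   # box width, in pixels
    height: float  # box height, in pixels


# Made-up boxes for a single photo containing two people and two horses.
detections = [
    Detection("person", 120.0, 40.0, 60.0, 150.0),
    Detection("person", 310.0, 55.0, 58.0, 140.0),
    Detection("horse", 90.0, 130.0, 180.0, 160.0),
    Detection("horse", 290.0, 140.0, 175.0, 150.0),
]

# Count how many objects of each category were found in the image.
counts = Counter(d.category for d in detections)
print(dict(counts))
```

Unlike plain classification, where one image maps to one label, each image here maps to a variable-length list, and every entry carries both a category and a position.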
Several deep neural network architectures have been developed to solve the COCO challenge, achieving varying levels of accuracy. Measuring accuracy on this task is a little more involved, because one has to account for multiple objects in the same image and also give credit for identifying the correct location of each object. The details of these measurements are beyond the scope of this chapter, though Jonathan Hui provides a good explanation (https://medium.com/@jonathan_hui/map-mean-average-precision...