So far, we have limited ourselves to recognizing the single most dominant object within an image using a convolutional neural network (CNN). We have seen how a model can be trained to take in an image and extract a series of feature maps, which are then fed into a fully connected layer to output a probability distribution over a set of classes. This distribution is then interpreted to classify the object within the image, as shown here:
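The final stage of this pipeline can be illustrated with a toy NumPy sketch (not actual Core ML code): a flattened feature vector passes through a fully connected layer, and a softmax turns the resulting scores into a probability distribution over classes. All of the sizes and values here are made up for illustration.

```python
import numpy as np

# Toy illustration of a CNN's classification head:
# flattened feature maps -> fully connected layer -> softmax -> class.
rng = np.random.default_rng(0)

num_features, num_classes = 8, 3                # hypothetical sizes
features = rng.standard_normal(num_features)    # stand-in for extracted feature maps
weights = rng.standard_normal((num_classes, num_features))
bias = np.zeros(num_classes)

logits = weights @ features + bias              # fully connected layer
probs = np.exp(logits - logits.max())           # numerically stable softmax
probs /= probs.sum()

# The class with the highest probability is the predicted
# (single most dominant) object in the image.
predicted_class = int(np.argmax(probs))
```

The key point is that this design produces exactly one answer per image, which is why it cannot, on its own, detect and locate multiple objects.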
In this chapter, we will build on this and explore how we can detect and locate multiple objects within a single image. We will start by building up our understanding of how this works and then walk through implementing an image search feature for a photo gallery application. This feature allows the user to filter and sort images not only by which objects are present in an image, but also by their positions relative to one another (object composition). Along the way, we will also get hands-on experience with Core ML Tools, the tool set...