In this section we will move on to a slightly different kind of object detector called a single shot detectors. Single shot detectors try posing object detection as a regression problem. One of the main architectures under this category is the YOLO architecture (You Only Look Once) which we will explore in more detail now.
The main idea of the YOLO network is to optimise the computation of predictions at various locations in the input image without using any sliding windows.In order to achieve this, the network outputs feature map in form of a grid of size
cells.
Each cell has B*5+C entries. Where "B" is the number of bounding boxes per cell, C is the number of class probabilities and 5 is the elements for each bounding box (x, y :center point coordinates of bounding box with respect to the cell in which it is located , w-width of the bounding box with respect to original image, h-height of the bounding box with respect to original image, confidence...