Common data sources
Chapter 2 introduced the two most common datasets in the computer vision community: ImageNet and Microsoft COCO (Common Objects in Context). Many models pre-trained on these datasets are available, and they can predict a wide range of class labels that may meet your everyday needs.
If your task is to detect a less common class label, it might be worth exploring the Large Vocabulary Instance Segmentation (LVIS) dataset. It covers more than 1,200 categories across 164,000 images, including many rare categories, and provides about 2 million high-quality instance segmentation masks. Detectron2 also provides pre-trained models for predicting these 1,200+ labels. Thus, you can follow the steps described in Chapter 2 to create a computer vision application that meets your needs. More information about the LVIS dataset is available on its website at https://www.lvisdataset.org.
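As a rough illustration of reusing the Chapter 2 steps with an LVIS model, the following sketch builds a predictor from the Detectron2 model zoo. It assumes the LVIS v0.5 Mask R-CNN baselines and their config names as listed in the model zoo. The helper names (lvis_config_name, build_lvis_predictor), the 0.5 score threshold, and the CPU default are illustrative choices, not part of Detectron2 or of this chapter.

```python
# Config names assumed from the Detectron2 model zoo (LVIS v0.5 baselines).
LVIS_CONFIGS = {
    "R_50": "LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml",
    "R_101": "LVISv0.5-InstanceSegmentation/mask_rcnn_R_101_FPN_1x.yaml",
    "X_101": "LVISv0.5-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_1x.yaml",
}


def lvis_config_name(backbone: str = "R_50") -> str:
    """Return the model zoo config path for a backbone (hypothetical helper)."""
    if backbone not in LVIS_CONFIGS:
        raise ValueError(f"Unknown backbone {backbone!r}; use one of {sorted(LVIS_CONFIGS)}")
    return LVIS_CONFIGS[backbone]


def build_lvis_predictor(backbone: str = "R_50", score_thresh: float = 0.5, device: str = "cpu"):
    """Create a DefaultPredictor for an LVIS pre-trained model (hypothetical helper)."""
    # Imported here so the module can be loaded without Detectron2 installed.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    name = lvis_config_name(backbone)
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(name))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(name)   # downloads weights on first use
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_thresh     # illustrative threshold
    cfg.MODEL.DEVICE = device                                # "cuda" if a GPU is available
    return DefaultPredictor(cfg)
```

With Detectron2 installed, calling build_lvis_predictor() and passing it a BGR image (for example, one loaded with cv2.imread) should return a dictionary whose "instances" entry holds the predicted classes, boxes, and masks, as with the COCO models in Chapter 2.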
If you have a task where no existing/pre-trained models can meet your needs, it is time to find existing...