Summary
Data annotation is undoubtedly a challenge, but with the right tools and techniques its difficulties can be minimized and the process streamlined, resulting in a well-labeled dataset that is fit for purpose.
In this chapter, we started by understanding why labeling must be high quality and what the consequences of even minor errors are. The data labeling process usually begins with humans applying their domain expertise, intelligence, sense, and perception to make decisions about unlabeled data. We explored the process and its key considerations, and discussed the options when not enough data is available. Data labeling is a tedious but necessary process, and it is prone to annotator error. It is therefore important to improve its effectiveness and accuracy by identifying and then following good practices. We then discussed the various ways to label data, along with their pros and cons. Crowdsourcing is a common approach, so we introduced techniques...