The goal of classification is to create a model that can predict classes in never-before-seen data. This means that the model should generalize beyond the training data. As the data we work with is supervised, meaning that we already know the answer to our question (does the applicant have a good or bad credit rating?), it is rarely interesting to have a model that can merely repeat that. Instead, we need the model to classify the unlabeled data we gather in the future. We will discuss this further when covering under- and overfitting.
The datasets being used are the following:
Figure 4.1: Datasets
In binary classification, our question is of the either or type. There are only two options, either yes or no; not maybe and not both yes and no. In multiclass classification, we can have more than two options, though we can only choose one of them. In multilabel classification, it is possible to predict both yes and no at the same time, hence an...