Decision tree learning
Decision tree classifiers are attractive models if we care about interpretability. As the name decision tree suggests, we can think of this model as breaking down our data by making decisions based on a series of questions.
Let's consider the following example in which we use a decision tree to decide upon an activity on a particular day:
Based on the features in our training set, the decision tree model learns a series of questions to infer the class labels of the samples. Although the preceding figure illustrates the concept of a decision tree based on categorical variables, the same concept applies if our features are real numbers, as in the Iris dataset. For example, we could simply define a cut-off value along the sepal width feature axis and ask the binary question "Is sepal width ≥ 2.8 cm?"
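As a minimal sketch (not part of the book's accompanying code), the following snippet shows how such a single binary question partitions the Iris samples, assuming scikit-learn's load_iris loader; the 2.8 cm threshold and the printed summaries are purely illustrative:

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

sepal_width = X[:, 1]    # second feature column: sepal width in cm
threshold = 2.8          # illustrative cut-off value

yes_branch = sepal_width >= threshold
print('Samples answering "yes":', yes_branch.sum())
print('Samples answering "no": ', (~yes_branch).sum())
print('Class counts ("yes" branch):', np.bincount(y[yes_branch]))
print('Class counts ("no" branch): ', np.bincount(y[~yes_branch]))

Each such question sends a sample down one of two branches; a decision tree simply chains these questions together from the root to the leaves.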
Using the decision algorithm, we start at the tree root and split the data on the feature that results in the largest Information Gain (IG), which will be explained...
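Although IG is defined formally later, the intuition can already be sketched: a split is good if the class labels in the resulting child nodes are purer than in the parent node. The snippet below is a rough sketch of this idea, assuming entropy as the impurity measure; the helper names entropy and information_gain are illustrative, not part of any library:

import numpy as np

def entropy(labels):
    # Shannon entropy of an integer label array
    counts = np.bincount(labels)
    probs = counts[counts > 0] / labels.size
    return -np.sum(probs * np.log2(probs))

def information_gain(y_parent, y_left, y_right):
    # IG = impurity of the parent minus the weighted impurity of the child nodes
    n = y_parent.size
    child_impurity = ((y_left.size / n) * entropy(y_left)
                      + (y_right.size / n) * entropy(y_right))
    return entropy(y_parent) - child_impurity

# Toy example: a split that separates the two classes reasonably well
y_parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_left = np.array([0, 0, 0, 1])
y_right = np.array([0, 1, 1, 1])
print(information_gain(y_parent, y_left, y_right))  # roughly 0.19 bits

At each node, the tree-growing procedure would evaluate this quantity for the candidate splits and choose the one that maximizes it.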