One way to classify documents is to follow a set of rules down through a tree until an instance lands in a bucket. This is essentially what decision trees do. They are especially good at classifying nominal data (discrete categories, such as the species attribute of the Iris dataset), where methods designed for numerical data, such as K-means clustering, don't work as well.
Decision trees have another handy feature. Unlike many data mining techniques, where the analysis is something of a black box, decision trees are easy to interpret. We can examine them and readily tell how and why they classify our data the way they do.
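To make the idea of following rules down a tree concrete, here is a minimal sketch in Python. The tree is written by hand purely for illustration: the attribute names, values, and rules in TOY_TREE are assumptions, not something learned from the mushroom dataset, and they are certainly not real guidance on which mushrooms are safe. The classify function is likewise a hypothetical helper. It returns both the bucket an instance lands in and the rules it followed to get there, which is what makes a tree easy to inspect.

```python
# A hand-written toy tree. The rules are hypothetical and NOT learned from data.
# Internal nodes test one nominal attribute; leaves are plain strings (the buckets).
TOY_TREE = {
    "attribute": "odor",
    "branches": {
        "foul": "poisonous",
        "almond": "edible",
        "none": {
            "attribute": "spore-print-color",
            "branches": {
                "green": "poisonous",
                "white": "edible",
            },
        },
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf and return (label, rules_followed)."""
    path = []
    node = tree
    while isinstance(node, dict):
        attribute = node["attribute"]
        value = instance[attribute]
        path.append(f"{attribute} = {value}")
        # An attribute value the tree has no branch for raises KeyError.
        node = node["branches"][value]
    return node, path


label, path = classify(TOY_TREE, {"odor": "none", "spore-print-color": "green"})
print(label)               # poisonous
print(" -> ".join(path))   # odor = none -> spore-print-color = green
```

A real decision tree learner builds a structure like this automatically from training data, choosing at each node the attribute that best separates the classes. The sketch only shows how a finished tree is used and read.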
In this recipe, we'll look at a dataset of mushrooms and create a decision tree to tell us if an instance is edible or poisonous.