Implementing a decision tree classifier
A decision tree is a model for classifying data effectively. Each internal node of the tree tests a feature of the item we are classifying, and each child of that node corresponds to one possible value of that feature. Traversing down the tree to a leaf node yields the item's classification. It's often desirable to create the smallest possible tree that represents a large sample of data.
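To make the structure concrete, here is a minimal sketch of how such a tree and its traversal might look in Haskell. The names `DTree`, `Item`, `classify`, and `toyTree` are hypothetical and chosen only for illustration; they are not necessarily the definitions used later in this recipe, and the toy tree is invented rather than learned from any data.

```haskell
-- Hypothetical type aliases for illustration only.
type Feature = String
type Value   = String
type Label   = String

-- A leaf holds a classification. An internal node tests one feature
-- and has one subtree per value of that feature.
data DTree
  = Leaf Label
  | Node Feature [(Value, DTree)]
  deriving (Show, Eq)

-- An item to classify: an association list from feature name to value.
type Item = [(Feature, Value)]

-- Walk down the tree, following the branch that matches the item's value
-- for the feature tested at each node. Returns Nothing if the item lacks
-- that feature or if no branch matches its value.
classify :: DTree -> Item -> Maybe Label
classify (Leaf label) _ = Just label
classify (Node feature branches) item = do
  value   <- lookup feature item
  subtree <- lookup value branches
  classify subtree item

-- A made-up tree, not derived from input.csv.
toyTree :: DTree
toyTree =
  Node "outlook"
    [ ("sunny", Node "windy" [("true", Leaf "no"), ("false", Leaf "yes")])
    , ("rainy", Leaf "no")
    ]
```

With these definitions, `classify toyTree [("outlook", "sunny"), ("windy", "false")]` evaluates to `Just "yes"`, while an item whose outlook has no matching branch evaluates to `Nothing`.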
In this recipe, we implement the ID3 decision tree algorithm in Haskell. It is one of the easiest to implement and produces useful results. However, ID3 does not guarantee an optimal solution, may be computationally inefficient compared to other algorithms, and only supports discrete data. While these issues can be addressed by a more complicated algorithm such as C4.5, the code in this recipe is enough to get up and running with a working decision tree.
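ID3 chooses, at each node, the feature whose split yields the highest information gain, measured as the reduction in Shannon entropy of the class labels. The following is a rough sketch of those two quantities, assuming each sample is reduced to a pair of one feature value and its class label. The function names `entropy` and `informationGain` are hypothetical and may differ from the ones used in the recipe's code.

```haskell
import Data.List (group, sort, nub)

-- Shannon entropy, in bits, of a list of class labels.
-- A list with a single distinct label has entropy 0.
entropy :: Ord a => [a] -> Double
entropy labels = sum [negate p * logBase 2 p | p <- proportions]
  where
    total       = fromIntegral (length labels)
    proportions = [fromIntegral (length g) / total | g <- group (sort labels)]

-- Information gain of splitting on one feature.
-- Each sample is a pair (value of that feature, class label).
-- The gain is the entropy before the split minus the weighted
-- average entropy of the subsets produced by the split.
informationGain :: (Eq v, Ord l) => [(v, l)] -> Double
informationGain samples = entropy (map snd samples) - remainder
  where
    total     = fromIntegral (length samples)
    remainder = sum
      [ (fromIntegral (length subset) / total) * entropy subset
      | v <- nub (map fst samples)
      , let subset = [l | (v', l) <- samples, v' == v]
      ]
```

A feature that separates the classes perfectly has a gain equal to the entropy of the whole sample, while a feature whose values tell us nothing about the class has a gain of zero. Because the greedy choice is made one node at a time, the resulting tree is not guaranteed to be the smallest possible.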
Getting ready
Create a CSV file representing samples of data. The last column should be the classification. Name this file input.csv.
The weather data is represented with four attributes...