We will start by looking at the implementation of our splitting metrics. Then we'll cover some of our splitting logic, and finally, we'll see how we can wrap the tree so that it generalizes to both classification and regression tasks.
Let's go ahead and walk through a classification tree example using the information gain criterion. In PyCharm there are three scripts open: metrics.py and cart.py, both of which are found inside the packtml/decision_tree submodule, and the example_classification_decision_tree.py file, which lives in examples/decision_tree. Let's start with the metrics.
If you open up the cart.py file, you'll find a comment laying out the order in which we should step through the code so that you can understand how the decision tree class fits together:

# 1. metrics.InformationGain & metrics.VarianceReduction
# 2. RandomSplitter
# 3. LeafNode
# 4. BaseCART
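Before diving into the files, it may help to see the idea behind the splitter in isolation. The following is a minimal sketch of what a random splitter does conceptually: pick a feature at random, pick a threshold between that feature's observed min and max, and return the boolean mask of rows that go left. The function name and signature here are hypothetical, not the actual RandomSplitter API:

```python
import numpy as np

def random_split(X, random_state=None):
    """Propose a random (feature, threshold) split.

    Hypothetical helper illustrating the idea behind a random splitter;
    it is not the packtml implementation.
    """
    rng = np.random.default_rng(random_state)

    # Choose a feature column at random
    feature = int(rng.integers(X.shape[1]))

    # Choose a threshold between the feature's observed bounds
    lo, hi = X[:, feature].min(), X[:, feature].max()
    threshold = float(rng.uniform(lo, hi))

    # Rows at or below the threshold go to the left child
    mask = X[:, feature] <= threshold
    return feature, threshold, mask
```

A full tree builder would call something like this repeatedly, score each candidate split with a metric, and keep the best one.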
Starting at the top of the metrics.py file...