A very common learner, popular largely because it is fast, is the regression tree. It is a non-linear learner that works with both categorical and numerical features and can be used for either classification or regression; that's why it's often called a Classification and Regression Tree (CART). In this section, we will see how regression trees work.
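One visible difference between the two uses lies in what a leaf returns. The following is a minimal sketch, assuming the usual choices: a regression leaf predicts the mean of the training targets that reached it, and a classification leaf predicts their most frequent class. The function name `leaf_prediction` is hypothetical and not part of any library.

```python
from collections import Counter
from statistics import mean


def leaf_prediction(targets, task):
    """Value stored in a leaf, given the training targets that reached it.

    Hypothetical helper: regression leaves return the mean of their targets,
    classification leaves return the most frequent class label.
    """
    if not targets:
        raise ValueError("a leaf needs at least one training observation")
    if task == "regression":
        return mean(targets)
    if task == "classification":
        # most_common(1) returns [(label, count)]; ties go to the label seen first
        return Counter(targets).most_common(1)[0][0]
    raise ValueError(f"unknown task: {task!r}")


print(leaf_prediction([3.0, 5.0, 7.0], "regression"))            # 5.0
print(leaf_prediction(["cat", "dog", "cat"], "classification"))  # cat
```

The splitting criterion also changes with the task (for example, variance reduction for regression and an impurity measure for classification), but the tree structure and the way a prediction is read from a leaf stay the same.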
A tree is composed of a series of nodes, each of which splits a branch into two children. Each child branch can then lead to another node or end in a leaf that holds the predicted value (or class).
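This structure can be sketched as a small recursive data type. The sketch below assumes numerical features only and uses the convention the text adopts for them: observations whose feature value is below the threshold go to the left child, the others to the right. The names `Leaf`, `Node`, and `predict` are illustrative, not taken from any library, and categorical splits are not modeled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: object  # predicted value (or class label) for this region


@dataclass
class Node:
    feature: int      # index of the feature used for the split
    threshold: float  # numerical threshold T
    left: "Tree"      # subtree for observations with x[feature] < threshold
    right: "Tree"     # subtree for observations with x[feature] >= threshold


Tree = Union[Leaf, Node]


def predict(tree: Tree, x) -> object:
    """Walk from the root down to a leaf and return that leaf's value."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] < tree.threshold else tree.right
    return tree.value


# Hypothetical two-level tree over observations [x0, x1]
tree = Node(feature=0, threshold=2.5,
            left=Leaf(10.0),
            right=Node(feature=1, threshold=0.0,
                       left=Leaf(20.0), right=Leaf(30.0)))

print(predict(tree, [1.0, 5.0]))   # 10.0 (x0 < 2.5, so left)
print(predict(tree, [3.0, -1.0]))  # 20.0 (right, then x1 < 0.0)
print(predict(tree, [3.0, 0.0]))   # 30.0 (x1 >= 0.0 goes right)
```

Prediction costs one comparison per level, which is part of why trees are fast at inference time.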
Starting from the root (that is, the whole dataset):
The best feature with which to split the dataset, F1, is identified, together with the best splitting value. If the feature is numerical, the splitting value is a threshold T1: in this case, the left child branch is the set of observations where F1 is below T1, and the right one is the set of observations where F1 is greater than or equal to T1. If the feature is categorical, the...