6.4 RANDOM FORESTS
CART and C5.0 both produce a single decision tree based on all of the records, and the specified variables, in the training data set. There is, however, a method that uses multiple trees, where the output of each tree is considered when determining the final classification of each record.
Random forests 6 build a series of decision trees and combine the trees disparate classifications of each record into one final classification. Random forests are an example of an ensemble method. Ensemble methods are a category of modeling techniques that take multiple models' output into account in order to arrive at a single answer. Different ensemble methods take the models' output into consideration in different ways. For more about ensemble methods, please see our earlier text.7
The random forests algorithm begins building each decision tree by taking a random sample, with replacement, from the original training data set. In this way, each tree will have a different...