Ensemble learning and random forests
Ensemble learning combines the output from multiple models to obtain a better prediction than could be obtained from any of the models individually. The principle is that the combined accuracy of many weak learners is greater than that of any individual weak learner.
Random forests is an ensemble learning algorithm devised and trademarked by Leo Breiman and Adele Cutler. It combines multiple decision trees into one large forest learner. Each tree is trained on a bootstrap sample of the data and considers only a random subset of the available features when splitting, meaning that each tree has a slightly different view of the data and is capable of generating a different prediction from that of its peers.
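To make the ensemble principle concrete, the following is a minimal, purely illustrative sketch of combining the predictions of several weak learners by majority vote; the learners and their predictions are hypothetical and are not part of the random forest implementation we use later.

;; Combine the predictions of several weak learners by majority vote.
(defn majority-vote [predictions]
  (key (apply max-key val (frequencies predictions))))

;; Three hypothetical weak learners classify the same example;
;; the ensemble returns the most common prediction.
(majority-vote [:yes :no :yes])
;; => :yes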
Creating a random forest in clj-ml simply requires that we alter the arguments to cl/make-classifier to :decision-tree, :random-forest.
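As a rough sketch, assuming clj-ml.classifiers is aliased as cl and clj-ml.data as mld, and using a small illustrative dataset whose feature names and values are invented for this example, the change amounts to swapping the keywords passed to cl/make-classifier:

(require '[clj-ml.classifiers :as cl]
         '[clj-ml.data :as mld])

;; An illustrative dataset with two numeric features and a nominal class.
(def dataset
  (-> (mld/make-dataset "example"
                        [:feature-a :feature-b {:class [:yes :no]}]
                        [[1.2 0.5 :yes]
                         [0.3 2.1 :no]
                         [1.5 0.7 :yes]
                         [0.2 1.8 :no]])
      (mld/dataset-set-class 2)))

;; A single decision tree would use, for example, :decision-tree :c45;
;; a random forest only changes the keywords passed to make-classifier.
(def forest
  (-> (cl/make-classifier :decision-tree :random-forest)
      (cl/classifier-train dataset)))

;; Classify a new (hypothetical) instance; the trailing :yes is only a
;; placeholder for the class attribute and is ignored during prediction.
(cl/classifier-classify forest (mld/make-instance dataset [1.0 0.6 :yes]))

Everything else about training and evaluating the classifier stays the same as for a single decision tree; only the classifier construction changes.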
Bagging and boosting
Bagging and boosting are two contrasting techniques for creating ensemble models. Boosting is the name for a general technique of building an ensemble...