Another ensemble learning algorithm is Gradient Boosted Trees (GBTs). GBTs train one tree at a time, where each new tree improves upon the shortcomings of previously trained trees.
As GBTs train one tree at a time, they can take longer than Random Forest.
Start the Spark shell:
$ spark-shell
Perform the required imports:
scala> import org.apache.spark.mllib.tree.GradientBoostedTrees scala> import org.apache.spark.mllib.tree.configuration.BoostingStrategy scala> import org.apache.spark.mllib.util.MLUtils
Load and parse the data:
scala> val data = MLUtils.loadLibSVMFile(sc, "rf_libsvm_data.txt")
Split the data into
training
andtest
datasets:scala> val splits = data.randomSplit(Array(0.7, 0.3)) scala> val (trainingData, testData) = (splits(0), splits(1))
Create a classification as a boosting strategy and set the number of iterations...