Another ensemble learning algorithm is gradient boosted trees (GBTs). GBTs train one tree at a time, where each new tree improves upon the shortcomings of the previously trained trees.
As GBTs train one tree at a time, they can take longer than random forest.
- Start the Spark shell:
$ spark-shell
- Perform the required imports:
scala> import org.apache.spark.ml.classification.{GBTClassificationModel, GBTClassifier} scala> import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
- Load and parse the data:
scala> val data =
spark.read.format("libsvm").load("s3a://sparkcookbook/patientdata")
- Split the data into
training
andtest
datasets:
scala> val Array(training, test) = data.randomSplit(Array(0.7, 0.3))
- Create a classification as a boosting strategy and set the number of iterations to
3
:
scala>...