Spark Cookbook

By: Rishi Yadav

Doing classification using Gradient Boosted Trees


Another ensemble learning algorithm is Gradient Boosted Trees (GBTs). GBTs train one tree at a time, with each new tree correcting the errors made by the previously trained trees.

Because GBTs train one tree at a time, training can take longer than with Random Forest, which can train its trees in parallel.
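
To make the "one tree at a time" idea concrete, here is a minimal sketch in plain Scala, with no Spark involved. It is a deliberate simplification: it boosts one-split trees (stumps) on a regression problem with squared-error loss, whereas this recipe uses MLlib's GBTs for classification. The helper names (`Stump`, `fitStump`, `predict`, `boost`, `mse`), the toy data, and the learning rate of 0.5 are all assumptions made for illustration; none of them come from MLlib.

```scala
// Spark-free sketch of gradient boosting with squared-error loss.
// Stump, fitStump, predict, boost and mse are hypothetical helpers, not MLlib API.

// A one-split "tree": predicts `left` when x <= threshold, otherwise `right`.
final case class Stump(threshold: Double, left: Double, right: Double) {
  def apply(x: Double): Double = if (x <= threshold) left else right
}

def mean(values: Seq[Double]): Double =
  if (values.isEmpty) 0.0 else values.sum / values.size

// Try every distinct x as a threshold and keep the stump with the lowest squared error.
def fitStump(xs: Seq[Double], targets: Seq[Double]): Stump = {
  val points = xs.zip(targets)
  val candidates = xs.distinct.map { t =>
    val (l, r) = points.partition { case (x, _) => x <= t }
    Stump(t, mean(l.map(_._2)), mean(r.map(_._2)))
  }
  candidates.minBy { s =>
    points.map { case (x, y) => val d = y - s(x); d * d }.sum
  }
}

// The ensemble prediction is the sum of the (shrunken) tree predictions.
def predict(ensemble: Seq[Stump], x: Double, learningRate: Double): Double =
  ensemble.map(s => learningRate * s(x)).sum

// Train trees sequentially: each new stump is fitted to the residuals,
// that is, the errors left by the trees trained so far.
def boost(xs: Seq[Double], ys: Seq[Double],
          numIterations: Int, learningRate: Double): Vector[Stump] =
  (1 to numIterations).foldLeft(Vector.empty[Stump]) { (ensemble, _) =>
    val residuals = xs.zip(ys).map { case (x, y) =>
      y - predict(ensemble, x, learningRate)
    }
    ensemble :+ fitStump(xs, residuals)
  }

def mse(ensemble: Seq[Stump], xs: Seq[Double], ys: Seq[Double],
        learningRate: Double): Double =
  mean(xs.zip(ys).map { case (x, y) =>
    val d = y - predict(ensemble, x, learningRate); d * d
  })

// Toy data, invented for this sketch.
val xs = (1 to 6).map(_.toDouble)
val ys = xs.map(x => 2.0 * x)

val oneTree  = boost(xs, ys, numIterations = 1, learningRate = 0.5)
val tenTrees = boost(xs, ys, numIterations = 10, learningRate = 0.5)
println(s"MSE with 1 tree:   ${mse(oneTree, xs, ys, 0.5)}")
println(s"MSE with 10 trees: ${mse(tenTrees, xs, ys, 0.5)}")
```

Each iteration depends on the residuals from all earlier iterations, so the trees cannot be built independently. That dependency is why boosting is sequential, while the trees of a Random Forest can be trained in parallel.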

Getting ready

We are going to use the same data we used in the previous recipe.
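
The steps below load this data with `MLUtils.loadLibSVMFile`, which expects the LibSVM text format: each line holds a label followed by `index:value` pairs, with one-based indices in ascending order. MLlib converts these indices to zero-based when loading. As a rough illustration of the format only, the following plain-Scala sketch parses a single line; `parseLibSVMLine` is a hypothetical helper and the sample line is invented, not taken from `rf_libsvm_data.txt`.

```scala
// Plain-Scala sketch of reading one LibSVM-format line.
// parseLibSVMLine is a hypothetical helper, not part of MLlib.
def parseLibSVMLine(line: String): (Double, List[(Int, Double)]) = {
  val tokens = line.trim.split("\\s+")
  val label = tokens.head.toDouble
  val features = tokens.tail.toList.map { token =>
    val Array(index, value) = token.split(":")
    // LibSVM indices are one-based; shift them to zero-based.
    (index.toInt - 1, value.toDouble)
  }
  (label, features)
}

// Invented sample line: label 1, feature 1 = 0.5, feature 3 = 2.0.
val (label, features) = parseLibSVMLine("1 1:0.5 3:2.0")
println(s"label = $label, features = $features")
```

Features that do not appear on a line are treated as zero, which is why the format suits sparse data.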

How to do it…

  1. Start the Spark shell:

    $ spark-shell
    
  2. Perform the required imports:

    scala> import org.apache.spark.mllib.tree.GradientBoostedTrees
    scala> import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    scala> import org.apache.spark.mllib.util.MLUtils
    
  3. Load and parse the data:

    scala> val data =
      MLUtils.loadLibSVMFile(sc, "rf_libsvm_data.txt")
    
  4. Split the data into training and test datasets:

    scala> val splits = data.randomSplit(Array(0.7, 0.3))
    scala> val (trainingData, testData) = (splits(0), splits(1))
    
  5. Create a boosting strategy for classification and set the number of iterations...