Spark Cookbook

By: Rishi Yadav

Doing ridge regression


Ridge regression is an alternative to lasso for improving prediction quality. Lasso sets the coefficients of many features to exactly zero, which eliminates those features from the equation. Ridge also penalizes the coefficients of the predictors (features), shrinking them toward zero, but it never sets them to exactly zero.
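
The difference between the two penalties can be written out as a rough sketch, assuming the usual least-squares loss. The symbols $w$ (coefficient vector), $x_i$ and $y_i$ (features and label of the $i$-th example), $n$ (number of examples), and $\lambda$ (regularization parameter) are notation introduced here, and the scaling constants are a convention that varies between libraries:

```latex
\text{lasso:}\quad \min_{w}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(w^{\top}x_i - y_i\bigr)^2 + \lambda\,\lVert w\rVert_1
\qquad
\text{ridge:}\quad \min_{w}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(w^{\top}x_i - y_i\bigr)^2 + \frac{\lambda}{2}\,\lVert w\rVert_2^2
```

The $\ell_1$ penalty has a corner at zero, which is what allows lasso coefficients to land exactly on zero; the squared $\ell_2$ penalty is smooth, so ridge coefficients shrink without being eliminated.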

How to do it…

  1. Start the Spark shell:

    $ spark-shell
    
  2. Import the vector, labeled point, and ridge regression classes:

    scala> import org.apache.spark.mllib.linalg.Vectors
    scala> import org.apache.spark.mllib.regression.LabeledPoint
    scala> import org.apache.spark.mllib.regression.RidgeRegressionWithSGD
    
  3. Create the LabeledPoint array with the house price as the label:

    scala> val points = Array(
    LabeledPoint(1,Vectors.dense(5,3,1,2,1,3,2,2,1)),
    LabeledPoint(2,Vectors.dense(9,8,8,9,7,9,8,7,9))
    )
    
  4. Create an RDD of the preceding data:

    scala> val rdd = sc.parallelize(points)
    
  5. Train a model on this data using 100 iterations. Here, the step size and regularization parameter are set by hand:

    scala> val model = RidgeRegressionWithSGD...
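
To make the roles of the iteration count, step size, and regularization parameter in step 5 more concrete, here is a minimal sketch in plain Scala with no Spark dependency. It assumes ordinary batch gradient descent on a squared-error loss with an L2 penalty; it is not MLlib's implementation, and the object name `RidgeSketch`, its methods, and the parameter values used below are all hypothetical choices made for illustration, not the values used in the recipe.

```scala
// Plain-Scala sketch of ridge regression trained by batch gradient descent.
// Assumption: this is NOT MLlib's algorithm, only an illustration of the idea.
object RidgeSketch {
  // The same two rows as in step 3: (label, nine features).
  val data: Array[(Double, Array[Double])] = Array(
    (1.0, Array[Double](5, 3, 1, 2, 1, 3, 2, 2, 1)),
    (2.0, Array[Double](9, 8, 8, 9, 7, 9, 8, 7, 9))
  )

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def norm(w: Array[Double]): Double = math.sqrt(dot(w, w))

  // Minimizes (1/(2n)) * sum((w.x - y)^2) + (regParam/2) * ||w||^2,
  // starting from all-zero weights.
  def train(rows: Array[(Double, Array[Double])],
            numIterations: Int,
            stepSize: Double,
            regParam: Double): Array[Double] = {
    val n = rows.length
    val dim = rows.head._2.length
    var w = Array.fill(dim)(0.0)
    for (_ <- 1 to numIterations) {
      // Gradient of the squared-error term, averaged over all rows.
      val grad = Array.fill(dim)(0.0)
      for ((y, x) <- rows) {
        val err = dot(w, x) - y
        for (j <- 0 until dim) grad(j) += err * x(j) / n
      }
      // The L2 penalty contributes regParam * w to the gradient,
      // pulling every weight toward zero on each step.
      w = w.indices.map(j => w(j) - stepSize * (grad(j) + regParam * w(j))).toArray
    }
    w
  }
}

// Illustrative values only: compare no penalty against a stronger penalty.
val unpenalized = RidgeSketch.train(RidgeSketch.data, 100, 0.001, 0.0)
val penalized   = RidgeSketch.train(RidgeSketch.data, 100, 0.001, 1.0)
println(s"norm without penalty: ${RidgeSketch.norm(unpenalized)}")
println(s"norm with penalty:    ${RidgeSketch.norm(penalized)}")
```

With the same iteration count and step size, the run with the larger regularization parameter ends with a smaller weight norm, which is the shrinkage effect described at the start of this recipe. The step size has to be small relative to the scale of the features, otherwise the updates diverge.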