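An alternative to lasso for improving prediction quality is ridge regression. In lasso, many features have their coefficients set to zero and are therefore eliminated from the equation. In ridge, predictors (features) are penalized but are never set to zero. The reason is that lasso penalizes the absolute (L1) size of the coefficients, while ridge penalizes their squared (L2) size, which shrinks coefficients toward zero without eliminating them.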
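To make the difference concrete, here is a small plain-Scala sketch that needs no Spark. It assumes the features are orthonormal. Under that assumption, both estimators have simple closed forms in terms of the ordinary least-squares (OLS) coefficient: lasso soft-thresholds it, while ridge scales it down. The penalty conventions in the comments and the coefficient values are illustrative assumptions, not values from this recipe.

```scala
// Compares how lasso and ridge shrink ordinary least-squares (OLS) coefficients.
// Assumption: the design matrix has orthonormal columns, so each coefficient
// can be treated independently. Objectives assumed here:
//   lasso: 0.5 * ||y - Xw||^2 + lambda * ||w||_1
//   ridge: 0.5 * ||y - Xw||^2 + (lambda / 2) * ||w||_2^2
object ShrinkageDemo {
  // Lasso under orthonormal design: soft-thresholding.
  // Coefficients whose magnitude is at most lambda become exactly zero.
  def lasso(ols: Double, lambda: Double): Double =
    if (math.abs(ols) <= lambda) 0.0
    else math.signum(ols) * (math.abs(ols) - lambda)

  // Ridge under orthonormal design: proportional shrinkage.
  // A nonzero coefficient stays nonzero for any finite lambda.
  def ridge(ols: Double, lambda: Double): Double =
    ols / (1.0 + lambda)

  def main(args: Array[String]): Unit = {
    val olsCoefficients = Seq(3.0, 0.8, -0.5, 0.1) // hypothetical values
    val lambda = 1.0
    for (b <- olsCoefficients) {
      println(f"ols=$b%6.2f lasso=${lasso(b, lambda)}%6.2f ridge=${ridge(b, lambda)}%6.2f")
    }
  }
}
```

With `lambda = 1.0`, lasso sets the coefficients 0.8, -0.5, and 0.1 to zero, while ridge halves every coefficient and keeps all of them.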
Start the Spark shell:
$ spark-shell
Import the required linear algebra and regression classes:
scala> import org.apache.spark.mllib.linalg.Vectors
scala> import org.apache.spark.mllib.regression.LabeledPoint
scala> import org.apache.spark.mllib.regression.RidgeRegressionWithSGD
Create the LabeledPoint array with the house price as the label:
scala> val points = Array(
  LabeledPoint(1,Vectors.dense(5,3,1,2,1,3,2,2,1)),
  LabeledPoint(2,Vectors.dense(9,8,8,9,7,9,8,7,9))
)
Create an RDD of the preceding data:
scala> val rdd = sc.parallelize(points)
Train a model on this data for 100 iterations. Here, the step size and the regularization parameter have been set by hand:
scala> val model = RidgeRegressionWithSGD...
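To illustrate what the iteration count, step size, and regularization parameter control, the following plain-Scala sketch fits ridge regression to the same two points using full-batch gradient descent. It is a hypothetical simplification, not MLlib's implementation. The objective, the constant step size, and the helper names (`RidgeGDSketch`, `train`, `objective`) are assumptions chosen for illustration.

```scala
// Hypothetical sketch: ridge regression trained by full-batch gradient descent.
// NOT MLlib's implementation. Assumed objective (no intercept):
//   (1/n) * sum_i 0.5 * (w . x_i - y_i)^2 + (regParam / 2) * ||w||^2
// A constant step size is used for simplicity.
object RidgeGDSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = a.indices.map(i => a(i) * b(i)).sum

  // Value of the assumed ridge objective for weights w.
  def objective(data: Seq[(Double, Vec)], w: Vec, regParam: Double): Double = {
    val loss = data.map { case (y, x) => 0.5 * math.pow(dot(w, x) - y, 2) }.sum / data.size
    loss + 0.5 * regParam * dot(w, w)
  }

  def train(data: Seq[(Double, Vec)], numIterations: Int,
            stepSize: Double, regParam: Double): Vec = {
    val d = data.head._2.length
    var w: Vec = Array.fill(d)(0.0)
    for (_ <- 1 to numIterations) {
      // Average gradient of the squared loss over all points.
      val grad = Array.fill(d)(0.0)
      for ((y, x) <- data) {
        val err = dot(w, x) - y
        for (j <- 0 until d) grad(j) += err * x(j) / data.size
      }
      // Gradient step plus L2 shrinkage: the penalty pulls every weight
      // toward zero in proportion to its size.
      w = Array.tabulate(d)(j => w(j) - stepSize * (grad(j) + regParam * w(j)))
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // The same two points as in the recipe: (label, features).
    val data = Seq(
      (1.0, Array[Double](5, 3, 1, 2, 1, 3, 2, 2, 1)),
      (2.0, Array[Double](9, 8, 8, 9, 7, 9, 8, 7, 9)))
    val w = train(data, numIterations = 100, stepSize = 0.001, regParam = 1.0)
    println(w.map(v => f"$v%.4f").mkString("[", ", ", "]"))
    println(f"objective = ${objective(data, w, 1.0)}%.4f")
  }
}
```

Calling `train` with `regParam = 0.0` instead returns weights with a larger norm, because the L2 term shrinks the weights on every step without forcing any of them to exactly zero. The small step size is needed because the features are unscaled. Larger values, for example 0.01, make the iterations in this sketch diverge.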