Spark Cookbook

By: Rishi Yadav

Using linear regression


Linear regression is an approach to modeling the value of a response variable y based on one or more predictor variables, or features, x.
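
For the single-feature case used in this recipe, the model can be written out as follows. This is a sketch of the usual least-squares formulation; the symbols w (weight), b (intercept), and n (number of observations) are notation assumed here, not taken from the book.

```latex
\hat{y} = w x + b, \qquad (w, b) = \arg\min_{w,\,b} \sum_{i=1}^{n} \left( y_i - (w x_i + b) \right)^2
```

Here x is the house size, y is the observed price, and \hat{y} is the predicted price.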

Getting ready

Let's use some housing data to predict the price of a house based on its size. The following are the sizes and prices of houses in the City of Saratoga, CA, in early 2014:

| House size (sq ft) | Price      |
|--------------------|------------|
| 2100               | $1,620,000 |
| 2300               | $1,690,000 |
| 2046               | $1,400,000 |
| 4314               | $2,000,000 |
| 1244               | $1,060,000 |
| 4608               | $3,830,000 |
| 2173               | $1,230,000 |
| 2750               | $2,400,000 |
| 4010               | $3,380,000 |
| 1959               | $1,480,000 |

[Figure: graphical representation of the house size and price data above]
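
To get a feel for the line that a regression would draw through these points before involving Spark, the data can be fitted in plain Scala. The sketch below uses the closed-form ordinary least squares solution for one feature, which is not the SGD-based solver this recipe uses; the names `HousingOls`, `fit`, and `predict` are hypothetical and chosen only for illustration.

```scala
// Closed-form ordinary least squares for a single feature.
// Assumption: this is an illustration in plain Scala, not the MLlib SGD solver.
object HousingOls {
  // (house size in sq ft, price in USD), copied from the housing data above
  val data: Seq[(Double, Double)] = Seq(
    (2100.0, 1620000.0),
    (2300.0, 1690000.0),
    (2046.0, 1400000.0),
    (4314.0, 2000000.0),
    (1244.0, 1060000.0),
    (4608.0, 3830000.0),
    (2173.0, 1230000.0),
    (2750.0, 2400000.0),
    (4010.0, 3380000.0),
    (1959.0, 1480000.0)
  )

  /** Returns (slope, intercept) minimizing the sum of squared residuals. */
  def fit(points: Seq[(Double, Double)]): (Double, Double) = {
    val n = points.size.toDouble
    val meanX = points.map(_._1).sum / n
    val meanY = points.map(_._2).sum / n
    // Covariance-like and variance-like sums (the common 1/n factor cancels)
    val sxy = points.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = points.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }

  /** Applies a fitted (slope, intercept) pair to a house size. */
  def predict(model: (Double, Double), size: Double): Double =
    model._1 * size + model._2

  def main(args: Array[String]): Unit = {
    val model = fit(data)
    println(f"slope = ${model._1}%.2f USD per sq ft, intercept = ${model._2}%.2f USD")
    println(f"predicted price for 2500 sq ft: ${predict(model, 2500.0)}%.0f USD")
  }
}
```

With only ten points and one feature, the closed form is cheap and exact. An iterative solver such as SGD becomes useful when the data is too large to fit on one machine.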

How to do it…

  1. Start the Spark shell:

    $ spark-shell
    
  2. Import the vector, labeled point, and linear regression classes:

    scala> import org.apache.spark.mllib.linalg.Vectors
    scala> import org.apache.spark.mllib.regression.LabeledPoint
    scala> import org.apache.spark.mllib.regression.LinearRegressionWithSGD
    
  3. Create the LabeledPoint array with the house price as the label:

    scala> val points = Array(
    LabeledPoint(1620000...