Spark Cookbook

By: Rishi Yadav

Creating a labeled point


A labeled point is a local vector (sparse or dense) that has a label associated with it. Labeled data is used in supervised learning to train algorithms. You will get to know more about it in the next chapter.

The label is stored as a double value in LabeledPoint. This means that categorical labels must be mapped to double values. Which value you assign to which category is immaterial and only a matter of convenience, as long as the mapping is applied consistently and the values follow the conventions for the task type:

| Type                      | Label values   |
|---------------------------|----------------|
| Binary classification     | 0 or 1         |
| Multiclass classification | 0, 1, 2…       |
| Regression                | Decimal values |
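
The mapping from categorical labels to doubles can be sketched in plain Scala before any MLlib class is involved. This is a minimal illustration, not code from the recipe: the category names and the `rawLabels`, `labelIndex`, and `numericLabels` identifiers are hypothetical, and assigning indices in sorted order is just one convenient choice.

```scala
// Hypothetical categorical labels; these names are illustrative only.
val rawLabels = Seq("suv", "sedan", "suv", "truck", "sedan")

// Give each distinct category an index 0.0, 1.0, 2.0, ... in sorted order.
// Which category gets which index is arbitrary; it only has to be consistent.
val labelIndex: Map[String, Double] =
  rawLabels.distinct.sorted.zipWithIndex
    .map { case (category, i) => category -> i.toDouble }
    .toMap

// Convert every raw label to the double value a labeled point stores.
val numericLabels: Seq[Double] = rawLabels.map(labelIndex)

println(labelIndex)
println(numericLabels)
```

In the Spark shell, a value such as `labelIndex("suv")` could then be passed as the first argument to `LabeledPoint`, with the feature vector as the second.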

How to do it…

  1. Start the Spark shell:

    $ spark-shell
    
  2. Import the MLlib vector explicitly, because Scala imports its own Vector (scala.collection.immutable.Vector) by default:

    scala> import org.apache.spark.mllib.linalg.{Vectors,Vector}
    
  3. Import LabeledPoint:

    scala> import org.apache.spark.mllib.regression.LabeledPoint
    
  4. Create a labeled point with a positive label and dense vector:

    scala> val willBuySUV = LabeledPoint(1.0, Vectors.dense(300.0, 80, 40))
    
  5. Create a labeled point with a negative label and dense...