A labeled point is a local vector (sparse or dense) with an associated label. Labeled data is used in supervised learning to train algorithms. You will learn more about it in the next chapter.
The label is stored as a double value in LabeledPoint. This means that categorical labels need to be mapped to double values. Which value you assign to a category is immaterial and only a matter of convenience.
| Type | Label values |
|---|---|
| Binary classification | 0 or 1 |
| Multiclass classification | 0, 1, 2… |
| Regression | Decimal values |
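As a minimal sketch of mapping categorical labels to double values, the following plain Scala snippet assigns each distinct category an index starting at 0. The category names and the `rawLabels`, `labelIndex`, and `encoded` identifiers are hypothetical and exist only for this illustration; sorting the categories is just one convenient way to make the assignment repeatable.

```scala
// Hypothetical categorical labels, invented for this illustration.
val rawLabels = Seq("sedan", "suv", "truck", "suv", "sedan")

// Give each distinct category an index 0.0, 1.0, 2.0, ... in sorted order.
// Which category gets which value is arbitrary; sorting only makes it repeatable.
val labelIndex: Map[String, Double] =
  rawLabels.distinct.sorted.zipWithIndex
    .map { case (category, i) => category -> i.toDouble }
    .toMap

// Encode the original labels as doubles, ready to be used as LabeledPoint labels.
val encoded: Seq[Double] = rawLabels.map(labelIndex)
```

The snippet does not need Spark and can be pasted into the Spark shell or any Scala REPL.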
Start the Spark shell:
$ spark-shell
Import the MLlib vector explicitly:
scala> import org.apache.spark.mllib.linalg.{Vectors,Vector}
Import the LabeledPoint:
scala> import org.apache.spark.mllib.regression.LabeledPoint
Create a labeled point with a positive label and dense vector:
scala> val willBuySUV = LabeledPoint(1.0,Vectors.dense(300.0,80,40))
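To check what the labeled point holds, you can read its two fields back. The following is a small sketch, assuming Spark MLlib is on the classpath (as it is inside the Spark shell); the imports and the definition are repeated so that the snippet stands on its own, and the `label` and `features` value names are chosen only for this illustration.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Same labeled point as in the step above.
val willBuySUV = LabeledPoint(1.0, Vectors.dense(300.0, 80, 40))

// label is the Double stored with the point; features is the local vector.
val label: Double = willBuySUV.label
val features = willBuySUV.features

println(s"label = $label, number of features = ${features.size}")
```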