A matrix is simply a table that represents multiple feature vectors. A matrix that can be stored on one machine is called a local matrix, and one that is distributed across the cluster is called a distributed matrix.
Local matrices have integer-based indices, while distributed matrices have long-based indices. Both store their values as doubles.
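There are three types of distributed matrices:
RowMatrix: Each row is a feature vector.
IndexedRowMatrix: Like RowMatrix, but each row also carries a row index.
CoordinateMatrix: A matrix of MatrixEntry values, where each MatrixEntry holds a row index, a column index, and the value at that position.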
Start the Spark shell:
$ spark-shell
Import the matrix-related classes:
scala> import org.apache.spark.mllib.linalg.{Vectors, Matrix, Matrices}
Create a dense local matrix:
scala> val people = Matrices.dense(3, 2, Array(150d, 60d, 25d, 300d, 80d, 40d))
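Matrices.dense takes the number of rows, the number of columns, and a flat array of values in column-major order, so the first three values above fill the first column and the last three fill the second. The following is a minimal plain-Scala sketch of that layout; it does not use Spark, and the object name ColumnMajorSketch and the helpers at and rows are hypothetical names introduced only for illustration.

```scala
// Plain-Scala sketch (no Spark required) of column-major indexing.
// ColumnMajorSketch, at, and rows are illustrative names, not MLlib API.
object ColumnMajorSketch {
  // Same arguments as the Matrices.dense(3, 2, ...) call in the recipe.
  val numRows = 3
  val numCols = 2
  val values: Array[Double] = Array(150d, 60d, 25d, 300d, 80d, 40d)

  // Element at row i, column j of a column-major flat array.
  def at(i: Int, j: Int): Double = {
    require(i >= 0 && i < numRows && j >= 0 && j < numCols, "index out of range")
    values(i + j * numRows)
  }

  // The matrix rebuilt row by row from the flat array.
  def rows: Seq[Seq[Double]] =
    (0 until numRows).map(i => (0 until numCols).map(j => at(i, j)))
}
```

Under this layout the rows are (150, 300), (60, 80), and (25, 40). In the Spark shell, people(0, 1) should return the same element as ColumnMajorSketch.at(0, 1), since a local Matrix can be indexed by row and column.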
Create a personRDD as an RDD of vectors:
scala> val personRDD = sc.parallelize(List(Vectors.dense...