Spark Cookbook

By: Rishi Yadav

Creating matrices


A matrix is simply a table that represents multiple feature vectors. A matrix that can be stored on one machine is called a local matrix, and one that is distributed across the cluster is called a distributed matrix.

Local matrices have integer-based (Int) row and column indices, while distributed matrices have long-based (Long) indices. Both store their values as doubles.
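As a minimal sketch of the local case, the following snippet builds a small dense matrix and reads back its dimensions and one entry. It assumes the MLlib classes are on the classpath, as they are in the Spark shell; the name m and the values are illustrative only. Note that Matrices.dense takes its values in column-major order.

```scala
import org.apache.spark.mllib.linalg.{Matrix, Matrices}

// 2 rows x 3 columns; values are supplied column by column (column-major).
// Illustrative values only: the columns are (1, 2), (3, 4), and (5, 6).
val m: Matrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// Dimensions (and indices) of a local matrix are Ints.
val rows: Int = m.numRows
val cols: Int = m.numCols

// Entries are Doubles; m(i, j) reads row i, column j (zero-based).
val entry: Double = m(0, 1)   // 3.0: first row, second column
```

Because the layout is column-major, the third array element (3.0) lands in the first row of the second column rather than at the end of the first row.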

There are three types of distributed matrices:

  • RowMatrix: Each row is a feature vector.

  • IndexedRowMatrix: Like RowMatrix, but each row also carries a row index.

  • CoordinateMatrix: This is a matrix stored as a collection of MatrixEntry objects. Each MatrixEntry represents one entry of the matrix, identified by its row index and column index.
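The three distributed types can be sketched side by side as follows. This is only an illustration, not the recipe's own code: it assumes it is run inside the Spark shell, where sc is the SparkContext the shell provides, and the variable names and values are made up for the example.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRow, IndexedRowMatrix, MatrixEntry, RowMatrix}

// Assumes `sc` is the SparkContext created by spark-shell.
// All values below are illustrative.

// RowMatrix: backed by an RDD of feature vectors, one per row.
val rowMatrix = new RowMatrix(sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(3.0, 4.0))))

// IndexedRowMatrix: each row is paired with an explicit Long row index.
val indexedRowMatrix = new IndexedRowMatrix(sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 2.0)),
  IndexedRow(1L, Vectors.dense(3.0, 4.0)))))

// CoordinateMatrix: backed by an RDD of (row, column, value) entries.
val coordinateMatrix = new CoordinateMatrix(sc.parallelize(Seq(
  MatrixEntry(0L, 0L, 1.0),
  MatrixEntry(1L, 1L, 4.0))))

// Dimensions of distributed matrices are Longs.
val nRows: Long = rowMatrix.numRows()
val nCols: Long = coordinateMatrix.numCols()
```

A CoordinateMatrix only needs the entries that are actually present, which is why it tends to suit very sparse data, whereas the two row-based types hold a full vector per row.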

How to do it…

  1. Start the Spark shell:

    $ spark-shell
    
  2. Import the matrix-related classes:

    scala> import org.apache.spark.mllib.linalg.{Vectors, Matrix, Matrices}
    
  3. Create a dense local matrix:

    scala> val people = Matrices.dense(3, 2, Array(150d, 60d, 25d, 300d, 80d, 40d))
    
  4. Create personRDD as an RDD of vectors:

    scala> val personRDD = sc.parallelize(List(Vectors.dense...