Apache Spark Graph Processing

All of the operations we have seen so far are graph operations: they are invoked on a graph and return a new graph object. In this section, we introduce operations that transform the VertexRDD and EdgeRDD collections. These collection types are subtypes of RDD[(VertexId, VD)] and RDD[Edge[ED]], respectively.
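As a minimal sketch of this subtype relationship, the snippet below assigns the vertex and edge collections of a tiny graph to plain RDD references. It assumes a spark-shell session where the SparkContext `sc` is predefined; the names `people`, `links`, and `graph`, and the data in them, are made up for illustration.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: running in spark-shell, so `sc` already exists.
// Hypothetical vertex and edge data, used only for this sketch.
val people: RDD[(VertexId, String)] =
  sc.parallelize(Seq((1L, "Ann"), (2L, "Bob")))
val links: RDD[Edge[String]] =
  sc.parallelize(Seq(Edge(1L, 2L, "knows")))
val graph = Graph(people, links)

// graph.vertices is a VertexRDD[String]; it can be used wherever an
// RDD[(VertexId, String)] is expected.
val vertices: RDD[(VertexId, String)] = graph.vertices

// graph.edges is an EdgeRDD; it can be used wherever an
// RDD[Edge[String]] is expected.
val edges: RDD[Edge[String]] = graph.edges
```

Because of this, ordinary RDD operations such as count or filter remain available on both collections, in addition to the specialized operations introduced in this section.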

Mapping VertexRDD and EdgeRDD

First, mapValues takes a map function as input, applies it to each vertex attribute in the VertexRDD, and returns a new VertexRDD that preserves the original vertex indices. The method is overloaded so that the map function can take either a VD or a (VertexId, VD) pair as input. The type of the new vertex attributes, VD2, can differ from VD:

def mapValues[VD2](map: VD => VD2): VertexRDD[VD2]
def mapValues[VD2](map: (VertexId, VD) => VD2): VertexRDD[VD2]

For illustration, let's take the biographies of the previous Hollywood stars in a VertexRDD collection:

scala> val actorsBio...
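Separately from the actors example, the two mapValues overloads can be sketched on a small, made-up VertexRDD. This assumes a spark-shell session where `sc` is predefined; `rawBios`, `bios`, `bioLengths`, and `labelled` are hypothetical names, and the data is not taken from the book.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: running in spark-shell, so `sc` already exists.
// Hypothetical biographies keyed by vertex ID.
val rawBios: RDD[(VertexId, String)] = sc.parallelize(Seq(
  (1L, "actor and producer"),
  (2L, "director")
))
val bios: VertexRDD[String] = VertexRDD(rawBios)

// First overload (VD => VD2): the function sees only the attribute.
// Here VD is String and VD2 is Int.
val bioLengths: VertexRDD[Int] =
  bios.mapValues((bio: String) => bio.length)

// Second overload ((VertexId, VD) => VD2): the function also sees the vertex ID.
val labelled: VertexRDD[String] =
  bios.mapValues((id: VertexId, bio: String) => s"$id: $bio")

// Print the transformed attributes; the order of the collected
// elements is not guaranteed.
bioLengths.collect().foreach(println)
labelled.collect().foreach(println)
```

The parameter types on the function literals are written out explicitly here only to make it obvious which overload is selected.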
