Apache Spark Graph Processing

Data operations on VertexRDD and EdgeRDD


All of the operations we've seen previously are graph operations: they are invoked on a graph and they return a new graph object. In this section, we will introduce operations that transform the VertexRDD and EdgeRDD collections themselves. The types of these collections are subtypes of RDD[(VertexId, VD)] and RDD[Edge[ED]], respectively.
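As a minimal sketch of this subtype relationship, the snippet below builds a tiny graph and assigns its vertex and edge collections to plain RDD types. The data and the names vertices, edges, graph, vs, and es are hypothetical and not from the book. It assumes Spark 1.2 or later, where EdgeRDD takes a single type parameter, and it uses SparkContext.getOrCreate so that it reuses the existing context when pasted into spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, Graph, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Reuse the running context (for example in spark-shell) or create a local one.
val conf = new SparkConf().setMaster("local[*]").setAppName("rddSubtypesSketch")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical two-vertex, one-edge graph.
val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq((1L, "a"), (2L, "b")))
val edges: RDD[Edge[String]] = sc.parallelize(Seq(Edge(1L, 2L, "knows")))
val graph: Graph[String, String] = Graph(vertices, edges)

// graph.vertices and graph.edges return the specialized collection types...
val vs: VertexRDD[String] = graph.vertices
val es: EdgeRDD[String] = graph.edges

// ...which can be used wherever the corresponding plain RDD type is expected.
val asVertexPairs: RDD[(VertexId, String)] = vs
val asEdges: RDD[Edge[String]] = es
```

Because both collections are ordinary RDDs, the usual RDD operations such as count, filter, and collect are also available on them.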

Mapping VertexRDD and EdgeRDD

First, mapValues takes a map function as input and applies it to each vertex attribute in the VertexRDD. It returns a new VertexRDD object while preserving the original vertex indices. The mapValues method is overloaded so that the map function can take an input of type VD or (VertexId, VD). The type of the new vertex attributes, VD2, can differ from VD:

def mapValues[VD2](map: VD => VD2): VertexRDD[VD2]
def mapValues[VD2](map: (VertexId, VD) => VD2): VertexRDD[VD2]
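Before turning to the book's own dataset, here is a minimal sketch of the two overloads on made-up data. The names people, verts, names, and labels, and the values themselves, are hypothetical. The parameter types are written out explicitly so that it is clear which overload each call selects. It assumes a Spark installation with GraphX on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Reuse the running context (for example in spark-shell) or create a local one.
val conf = new SparkConf().setMaster("local[*]").setAppName("mapValuesSketch")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical vertex data: id -> (name, age).
val people: RDD[(VertexId, (String, Int))] = sc.parallelize(Seq(
  (1L, ("Alice", 30)),
  (2L, ("Bob", 45))
))
val verts: VertexRDD[(String, Int)] = VertexRDD(people)

// Overload 1: the function sees only the attribute (VD => VD2).
// Here VD is (String, Int) and VD2 is String.
val names: VertexRDD[String] =
  verts.mapValues((attr: (String, Int)) => attr._1)

// Overload 2: the function also sees the vertex id ((VertexId, VD) => VD2).
val labels: VertexRDD[String] =
  verts.mapValues((id: VertexId, attr: (String, Int)) => s"$id:${attr._1}")
```

In both calls the vertex ids are unchanged; only the attribute attached to each id is replaced.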

For illustration, let's take the biographies of the previous Hollywood stars in a VertexRDD collection:

scala> val actorsBio...