Apache Spark Graph Processing


Joining graph datasets


In addition to the previous mapping and filtering operations, GraphX also provides APIs for joining RDD datasets with graphs. This can be useful when we want to add extra information to the vertex attributes of a graph or when we want to merge the vertex attributes of two related graphs. These tasks can be accomplished using the following join operators.

joinVertices

The following is the method signature for the first operator joinVertices:

def joinVertices[U](table: RDD[(VertexId, U)])(map: (VertexId, VD, U) => VD): Graph[VD, ED]

It is invoked on a Graph[VD, ED] object and takes two inputs, which are passed as curried parameters. The first is a vertex RDD table of type RDD[(VertexId, U)], which joinVertices joins with the graph's vertex attributes. The second is a user-defined map function, which combines the original attribute and the joined attribute of each vertex into a new attribute. The return type of this new attribute must be the same as the...
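To make the two curried parameter lists concrete, here is a minimal sketch of a joinVertices call. It assumes a local Spark installation with GraphX on the classpath. The object name JoinVerticesSketch, the sample people, and the ageUpdates table are all hypothetical and were chosen only for illustration. The vertex attribute type VD is (String, Int), and the joined type U is Int.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative assumptions.
object JoinVerticesSketch {

  // Builds a tiny graph, joins an RDD of new ages onto its vertices,
  // and returns the resulting vertex attributes keyed by vertex ID.
  def run(sc: SparkContext): Map[VertexId, (String, Int)] = {
    // Vertex attribute type VD = (name, age)
    val vertices: RDD[(VertexId, (String, Int))] = sc.parallelize(Seq(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Carol", 65))
    ))
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows")
    ))
    val graph: Graph[(String, Int), String] = Graph(vertices, edges)

    // The table to join: RDD[(VertexId, U)] with U = Int.
    // Vertex 3 deliberately has no entry here.
    val ageUpdates: RDD[(VertexId, Int)] =
      sc.parallelize(Seq((1L, 29), (2L, 30)))

    // First parameter list: the table. Second: the map function,
    // which must return a value of the original type VD.
    val updated: Graph[(String, Int), String] =
      graph.joinVertices(ageUpdates) {
        (id, attr, newAge) => (attr._1, newAge) // keep the name, replace the age
      }

    updated.vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JoinVerticesSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try run(sc).toSeq.sortBy(_._1).foreach(println)
    finally sc.stop()
  }
}
```

In this sketch, vertices 1 and 2 take the ages from ageUpdates. Vertex 3 has no matching entry in the table, so joinVertices leaves its attribute unchanged.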