Apache Spark Graph Processing

Let's now open our Spark shell and build three types of graphs: a directed email communication network, a bipartite graph of ingredient-compound connections, and a multigraph using the previous graph builders.

Note

Unless otherwise stated, we always assume that the Spark shell is launched from the $SPARK_HOME directory. That directory then becomes the current directory for any relative file path used in this book.

Building directed graphs

The first graph that we will build is the Enron email communication network. If you have restarted your Spark shell, you need to import the GraphX library again. First, create a new folder called data inside $SPARK_HOME and copy the dataset into it. The dataset file contains the adjacency list of the email communications between the employees. Assuming that the current directory is $SPARK_HOME, we can pass the file path to the GraphLoader.edgeListFile method:

scala> import org.apache.spark.graphx._
import org.apache.spark.graphx._

scala> import org.apache...
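
To make the loading step concrete, here is a minimal sketch of GraphLoader.edgeListFile applied to a tiny stand-in file. The three-edge sample, the temporary file, and the names sample, path, and graph are assumptions made for illustration; they are not the Enron dataset or the book's own listing. The sketch assumes it is pasted into the Spark shell (use :paste for the multi-line string), where sc is already defined.

```scala
// Paste into the Spark shell, where `sc` (the SparkContext) is predefined.
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.graphx._

// Hypothetical sample standing in for the real dataset: one edge per line,
// written as "sourceId destinationId"; lines starting with '#' are skipped.
val sample =
  """# FromNodeId ToNodeId
    |0 1
    |1 0
    |1 2
    |""".stripMargin

// Write the sample to a temporary file so the sketch does not depend on
// anything being present in the data folder.
val path = Files.createTempFile("edge-list-sample", ".txt")
Files.write(path, sample.getBytes(StandardCharsets.UTF_8))

// edgeListFile returns a Graph[Int, Int]: every vertex and every edge
// carries the default attribute 1.
val graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, path.toUri.toString)

println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")
graph.edges.collect().foreach(e => println(s"${e.srcId} -> ${e.dstId}"))
```

Because the graph is directed, the lines 0 1 and 1 0 are loaded as two separate edges rather than merged into one. To load the real dataset, replace the temporary path with the relative path of the file you copied into the data folder.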
