Let's now open the Spark shell and use the graph builders introduced earlier to build three types of graphs: a directed email communication network, a bipartite graph of ingredient-compound connections, and a multigraph.
Note
Unless otherwise stated, we always assume that the Spark shell is launched from the $SPARKHOME directory. It then becomes the current directory for any relative file path used in this book.
The first graph that we will build is the Enron email communication network. If you have restarted your Spark shell, you need to import the GraphX library again. First, create a new folder called data inside $SPARKHOME and copy the dataset into it. This file contains the adjacency list of the email communications between the employees. Assuming that the current directory is $SPARKHOME, we can pass the file path to the GraphLoader.edgeListFile method:
scala> import org.apache.spark.graphx._
import org.apache.spark.graphx._
scala> import org.apache...
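To see how GraphLoader.edgeListFile behaves without depending on the actual dataset, the following minimal sketch loads a tiny sample file instead. The three-edge sample and its temporary file are hypothetical and exist only for illustration. The sketch assumes it is pasted into a local Spark shell, where sc is the SparkContext created at startup. The loader expects one edge per line, written as two integer vertex IDs separated by whitespace, and skips lines that start with #.

```scala
import java.nio.file.Files
import org.apache.spark.graphx.{Graph, GraphLoader}

// Hypothetical sample data: three directed edges forming a cycle.
// The first line is a comment, which edgeListFile skips.
val sample = Files.createTempFile("edges", ".txt")
Files.write(sample, "# src dst\n0 1\n1 2\n2 0\n".getBytes("UTF-8"))

// `sc` is provided by the Spark shell. edgeListFile returns a Graph[Int, Int]
// in which every vertex attribute and every edge attribute is set to 1.
val g: Graph[Int, Int] = GraphLoader.edgeListFile(sc, sample.toString)

// Count the vertices and edges that were loaded.
println(s"vertices: ${g.numVertices}, edges: ${g.numEdges}")
```

Because the attributes are all placeholders, a graph loaded this way carries only the connection structure. Any meaningful vertex or edge properties have to be attached afterwards.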