Big Data Analytics
GraphX is included in the Spark installation, so you don't need to install any additional software to get started. This section introduces how to create and explore graphs using a simple family relationship graph. The same family graph is used in all operations within this section.
GraphX does not have a Python API yet, so let's use spark-shell to work with GraphX interactively. First, let's create the input data (vertex and edge files) needed for our GraphX operations and store them on HDFS.
All programs in this chapter are executed on the CDH 5.8 VM. In other environments the file paths might differ, but the concepts are the same.
We can create a graph using the following steps:
Create a vertex file with vertex ID, name, and age as shown here:
[cloudera@quickstart ~]$ cat vertex.csv
1,Jacob,48
2,Jessica,45
3,Andrew,25
4,Ryan,53
5,Emily,22
6,Lily,52...
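As a rough illustration of how rows like these might later be turned into graph vertices, the sketch below parses each `id,name,age` line into an `(id, (name, age))` pair, which is the shape GraphX expects for a vertex collection (GraphX's `VertexId` is an alias for `Long`). The helper name `parseVertex` and the in-memory `sampleLines` are assumptions made for illustration only; they use the first five rows of the listing above rather than the file on HDFS.

```scala
// Hypothetical helper (not from the book): parse one "id,name,age" line
// into an (id, (name, age)) pair, the shape used for GraphX vertices.
def parseVertex(line: String): (Long, (String, Int)) = {
  val fields = line.split(",").map(_.trim)
  (fields(0).toLong, (fields(1), fields(2).toInt))
}

// The first five rows from the vertex.csv listing, held in memory
// so the sketch runs without a cluster.
val sampleLines = Seq(
  "1,Jacob,48",
  "2,Jessica,45",
  "3,Andrew,25",
  "4,Ryan,53",
  "5,Emily,22"
)

val vertices = sampleLines.map(parseVertex)
vertices.foreach(println)

// In spark-shell, the same function could be mapped over the file itself.
// The path below is an assumption and depends on where the file is stored:
//   val vertexRDD = sc.textFile("vertex.csv").map(parseVertex)
```

Keeping the parsing in a small standalone function makes it easy to check on a few sample lines before applying it to the full file with `sc.textFile`.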