Book Image

Graph Data Modeling in Python

By : Gary Hutson, Matt Jackson
Book Image

Graph Data Modeling in Python

By: Gary Hutson, Matt Jackson

Overview of this book

Graphs have become increasingly integral to powering the products and services we use in our daily lives, driving social media, online shopping recommendations, and even fraud detection. With this book, you’ll see how a good graph data model can help enhance efficiency and unlock hidden insights through complex network analysis. Graph Data Modeling in Python will guide you through designing, implementing, and harnessing a variety of graph data models using the popular open source Python libraries NetworkX and igraph. Following practical use cases and examples, you’ll find out how to design optimal graph models capable of supporting a wide range of queries and features. Moreover, you’ll seamlessly transition from traditional relational databases and tabular data to the dynamic world of graph data structures that allow powerful, path-based analyses. As well as learning how to manage a persistent graph database using Neo4j, you’ll also get to grips with adapting your network model to evolving data requirements. By the end of this book, you’ll be able to transform tabular data into powerful graph data models. In essence, you’ll build your knowledge from beginner to advanced-level practitioner in no time.
Table of Contents (16 chapters)
1
Part 1: Getting Started with Graph Data Modeling
4
Part 2: Making the Graph Transition
7
Part 3: Storing and Productionizing Graphs
11
Part 4: Graphing Like a Pro

Introduction to NetworkX and igraph

In this chapter, we will introduce two Python packages for creating in-memory graphs: NetworkX and igraph.

NetworkX lets you create graphs, perform graph manipulation, study and visualize their structures, and perform several graph manipulation functions when working with graphs. Their website (https://networkx.org/) contains details of the major changes to the package and the intended usage of the tool.

igraph contains a suite of useful and practical analysis tools, with the aim being to make these efficient and easy to use, in a reproducible way. What is great about igraph is that it is open source and free, plus it supports networks to be built in R, Python, Mathematica, and C/C++. This is our recommended package for creating large networks that can load much more quickly than NetworkX. To read more about igraph, go to https://igraph.org/.

In the following subsections, we will look at the basics of both NetworkX and igraph, with easy-to-follow coding steps. This is the first time you are going to get your hands dirty with graph data modeling.

NetworkX basics

NetworkX is one of the originally available graph libraries for Python and is particularly focused on being user-friendly and Pythonic in its style. It also natively includes methods for calculating some classic network analysis measures:

  1. To import NetworkX into Python, use the following command:
    import networkx as nx
  2. And to create an empty graph, g, use the following command:
    g = nx.Graph()
  3. Now, we need to add nodes to our graph, which can be done using methods of the Graph object belonging to g. There are multiple ways to do this, with the simplest being adding one node at a time:
    g.add_node(Jeremy)
  4. Alternatively, multiple nodes can be added to the graph at once, like so:
    g.add_nodes_from([Mark, Jeremy])
  5. Properties can be added to nodes during creation by passing a node and dictionary tuple to Graph.add_nodes_from:
    g.add_nodes_from([(Mark, {followers: 2100}), (Jeremy, {followers: 130})])
  6. To add an edge to the graph, we can use the Graph.add_edge method, and reference the nodes already present in the graph:
    g.add_edge(Jeremy, Mark)

It is worth noting that, in NetworkX, when adding an edge, any nodes specified as part of that edge not already in the graph will be added implicitly.

  1. To confirm that our graph now contains nodes and edges, we may want to plot it, using matplotlib and networkx.draw(). The with_labels parameter adds the names of the nodes to the plot:
    import matplotlib.pyplot as plt
    nx.draw(g, with_labels=True)
    plt.show()

This section showed you how you can get up and running with NetworkX in a couple of lines of Python code. In the next section, we will turn our focus to the popular igraph package, which allows us to perform calculations over larger datasets much quicker than using the popular NetworkX.

igraph basics

NetworkX, while user-friendly, suffers from slow speeds when using larger graphs. This is due to its implementation behind the scenes and because it is written in Python, with some C, C++, and FORTRAN.

In contrast, igraph is implemented in pure C, giving the library an advantage when working with large graphs and complex network algorithms. While not as immediately accessible as NetworkX for beginners, igraph is a useful tool to have under your belt when code efficiency is paramount.

Initially, working with igraph is very similar to working with NetworkX. Let’s take a look:

  1. To import igraph into Python, use the following command:
    import igraph as ig
  2. And to create an empty graph, g, use the following command:
    g = ig.Graph()

In contrast to NetworkX, in igraph, all nodes have a prescribed internal integer ID. The first node that’s added has an ID of 0, with all subsequent nodes assigned increasing integer IDs.

  1. Similar to NetworkX, changes can be made to a graph by using the methods of a Graph object. Nodes can be added to the graph with the Graph.add_vertices method (note that a vertex is another way to refer to a node). Two nodes can be added to the graph with the following code:
    g.add_vertices(2)
  2. This will add nodes 0 and 1 to the graph. To name them, we have to assign properties to the nodes. We can do this by accessing the vertices of the Graph object. Similar to how you would access elements of a list, each node’s properties can be accessed by using the following notation. Here, we are setting the name and followers attributes of nodes 0 and 1:
    g.vs[0][name] = Jeremy
    g.vs[1][name] = Mark
    g.vs[0][followers] = 130
    g.vs[1][followers] = 2100
  3. Node properties can also be added listwise, where the first list element corresponds to node ID 0, the second to node ID 1, and so on. The following two lines are equivalent to the four lines shown in step 4:
    g.vs["name"] = [Jeremy, Mark]
    g.vs[followers] = [130, 2100]
  4. To add an edge, we can use the Graph.add_edges() method:
    g.add_edges([(0, 1)])

Here, we are only adding one edge, but additional edges can be added to the list parameter required by add_edges. As with NetworkX, if edges are added for nodes that are not currently in the graph, nodes will be created implicitly. However, since igraph requires nodes to have sequential IDs, attempting to add the edge pair (1, 3) to a graph with two vertices will fail.