Graph Data Modeling in Python

By: Gary Hutson, Matt Jackson

Overview of this book

Graphs have become increasingly integral to powering the products and services we use in our daily lives, driving social media, online shopping recommendations, and even fraud detection. With this book, you’ll see how a good graph data model can help enhance efficiency and unlock hidden insights through complex network analysis. Graph Data Modeling in Python will guide you through designing, implementing, and harnessing a variety of graph data models using the popular open source Python libraries NetworkX and igraph. Following practical use cases and examples, you’ll find out how to design optimal graph models capable of supporting a wide range of queries and features. Moreover, you’ll seamlessly transition from traditional relational databases and tabular data to the dynamic world of graph data structures that allow powerful, path-based analyses. As well as learning how to manage a persistent graph database using Neo4j, you’ll also get to grips with adapting your network model to evolving data requirements. By the end of this book, you’ll be able to transform tabular data into powerful graph data models. In essence, you’ll build your knowledge from beginner to advanced-level practitioner in no time.

Introduction to NetworkX and igraph

In this section, we will introduce two Python packages for creating in-memory graphs: NetworkX and igraph.

NetworkX lets you create and manipulate graphs, and study and visualize their structure. Its website (https://networkx.org/) documents the major changes to the package and its intended usage.

igraph contains a suite of practical analysis tools that aim to be efficient, easy to use, and reproducible. What is great about igraph is that it is open source and free, and that it offers interfaces for R, Python, Mathematica, and C/C++. It is our recommended package for large networks, which it can load and process much more quickly than NetworkX. To read more about igraph, go to https://igraph.org/.

In the following subsections, we will look at the basics of both NetworkX and igraph, with easy-to-follow coding steps. This is the first time you are going to get your hands dirty with graph data modeling.

NetworkX basics

NetworkX is one of the earliest graph libraries for Python and is particularly focused on being user-friendly and Pythonic in its style. It also natively includes methods for calculating some classic network analysis measures. The following steps cover the basics:

To import NetworkX into Python, use the following command:
```
import networkx as nx
```
And to create an empty graph, g, use the following command:
```
g = nx.Graph()
```
Now, we need to add nodes to our graph, which can be done using methods of the Graph object belonging to g. There are multiple ways to do this, with the simplest being adding one node at a time:
```
g.add_node("Jeremy")
```
Alternatively, multiple nodes can be added to the graph at once, like so:
```
g.add_nodes_from(["Mark", "Jeremy"])
```
Properties can be added to nodes during creation by passing (node, attribute dictionary) tuples to Graph.add_nodes_from:
```
g.add_nodes_from([("Mark", {"followers": 2100}), ("Jeremy", {"followers": 130})])
```
To add an edge to the graph, we can use the Graph.add_edge method, and reference the nodes already present in the graph:
```
g.add_edge("Jeremy", "Mark")
```

It is worth noting that, in NetworkX, when adding an edge, any nodes specified as part of that edge not already in the graph will be added implicitly.
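The implicit node creation described above can be checked with a short sketch. The node name "Sophie" is a hypothetical addition used only for illustration; it does not appear in the original example.

```python
import networkx as nx

g = nx.Graph()
g.add_nodes_from([("Mark", {"followers": 2100}), ("Jeremy", {"followers": 130})])

# "Sophie" (a hypothetical name) is not in the graph yet,
# so add_edge creates that node implicitly.
g.add_edge("Jeremy", "Sophie")

print(sorted(g.nodes))    # all three nodes are now present
print(g.nodes["Mark"])    # attribute dictionary set at creation
print(g.nodes["Sophie"])  # implicitly created nodes start with no attributes
```

Because implicitly created nodes carry no properties, a typo in a node name inside add_edge silently produces a new, empty node rather than an error.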

To confirm that our graph now contains nodes and edges, we may want to plot it, using matplotlib and networkx.draw(). The with_labels parameter adds the names of the nodes to the plot:
```
import matplotlib.pyplot as plt
nx.draw(g, with_labels=True)
plt.show()
```
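When a plot window is not available, the graph can also be confirmed numerically. The sketch below rebuilds the two-node graph and computes a few of the classic measures that NetworkX ships with; the choice of degree, density, and degree centrality is ours, not something this section prescribes.

```python
import networkx as nx

g = nx.Graph()
g.add_nodes_from(["Mark", "Jeremy"])
g.add_edge("Jeremy", "Mark")

# Basic counts confirm that the nodes and the edge were added.
print(g.number_of_nodes(), g.number_of_edges())

# Degree: number of edges attached to each node.
degrees = dict(g.degree())

# Density: fraction of possible edges that are present.
density = nx.density(g)

# Degree centrality: degree divided by (number of nodes - 1).
centrality = nx.degree_centrality(g)

print(degrees, density, centrality)
```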

This section showed you how to get up and running with NetworkX in a few lines of Python code. In the next section, we will turn our focus to igraph, which lets us perform calculations over larger datasets much more quickly than NetworkX.

igraph basics

NetworkX, while user-friendly, becomes slow on larger graphs. This is largely a consequence of its implementation: the library itself is written in Python, and compiled C, C++, and Fortran code is involved only through numerical dependencies such as NumPy and SciPy.

In contrast, the core of igraph is implemented in C, giving the library an advantage when working with large graphs and complex network algorithms. While not as immediately accessible as NetworkX for beginners, igraph is a useful tool to have under your belt when code efficiency is paramount.

Initially, working with igraph is very similar to working with NetworkX. Let’s take a look:

To import igraph into Python, use the following command:
```
import igraph as ig
```
And to create an empty graph, g, use the following command:
```
g = ig.Graph()
```

In contrast to NetworkX, in igraph, all nodes have a prescribed internal integer ID. The first node that’s added has an ID of 0, with all subsequent nodes assigned increasing integer IDs.

Similar to NetworkX, changes can be made to a graph by using the methods of a Graph object. Nodes can be added to the graph with the Graph.add_vertices method (note that a vertex is another way to refer to a node). Two nodes can be added to the graph with the following code:
```
g.add_vertices(2)
```
This will add nodes 0 and 1 to the graph. To name them, we have to assign properties to the nodes. We can do this by accessing the vertices of the Graph object. Similar to how you would access elements of a list, each node’s properties can be accessed by using the following notation. Here, we are setting the name and followers attributes of nodes 0 and 1:
```
g.vs[0]["name"] = "Jeremy"
g.vs[1]["name"] = "Mark"
g.vs[0]["followers"] = 130
g.vs[1]["followers"] = 2100
```
Node properties can also be added listwise, where the first list element corresponds to node ID 0, the second to node ID 1, and so on. The following two lines are equivalent to the four lines shown in the previous step:
```
g.vs["name"] = ["Jeremy", "Mark"]
g.vs["followers"] = [130, 2100]
```
To add an edge, we can use the Graph.add_edges() method:
```
g.add_edges([(0, 1)])
```

Here, we are only adding one edge, but additional edges can be added to the list parameter required by add_edges. Unlike NetworkX, igraph does not create missing nodes implicitly when an edge refers to them: every endpoint must be a vertex that already exists in the graph. Since igraph assigns nodes sequential IDs starting from 0, attempting to add the edge pair (1, 3) to a graph with two vertices will fail, because there is no vertex with ID 3.
