Book Image

Hands-On Graph Analytics with Neo4j

By : Estelle Scifo
Book Image

Hands-On Graph Analytics with Neo4j

By: Estelle Scifo

Overview of this book

Neo4j is a graph database that includes plugins to run complex graph algorithms. The book starts with an introduction to the basics of graph analytics, the Cypher query language, and graph architecture components, and helps you to understand why enterprises have started to adopt graph analytics within their organizations. You’ll find out how to implement Neo4j algorithms and techniques and explore various graph analytics methods to reveal complex relationships in your data. You’ll be able to implement graph analytics catering to different domains such as fraud detection, graph-based search, recommendation systems, social networking, and data management. You’ll also learn how to store data in graph databases and extract valuable insights from it. As you become well-versed with the techniques, you’ll discover graph machine learning in order to address simple to complex challenges using Neo4j. You will also understand how to use graph data in a machine learning model in order to make predictions based on your data. Finally, you’ll get to grips with structuring a web application for production using Neo4j. By the end of this book, you’ll not only be able to harness the power of graphs to handle a broad range of problem areas, but you’ll also have learned how to use Neo4j efficiently to identify complex relationships in your data.
Table of Contents (18 chapters)
1
Section 1: Graph Modeling with Neo4j
5
Section 2: Graph Algorithms
10
Section 3: Machine Learning on Graphs
14
Section 4: Neo4j for Production

Moving from SQL to graph databases

Before going into detail about Neo4j, let's first have a look at the existing database models. We'll then focus on the most famous one, the relational model, and learn how to migrate from SQL to a graph database model.

Database models

Think about water: depending on the end goal, you will not use the same container. If you want to drink, you might use a glass of water but if you want to have a bath, you will probably choose another one. The choice of container will be different in each scenario. The problem is similar for data: without a proper container, there is nothing we can do with it and we need a proper container depending on the situation, which will not only store data but also contribute to solve the problem we have. This data container is the database.

Drawing an exhaustive list of database types on the market is not impossible but would go far beyond the scope of this book. However, I want to give some examples of the most popular ones, so that you can see where graph databases stand in the big picture:

  • Relational databases: They are by far the most well known type of database. From SQLite to MySQL or PostgreSQL, they use a common query language, called Structured Query Language (SQL), with some variations among the different implementations. They are well established and allow a clear structure of the data. However, they suffer from performance issues when the data grows and are surprisingly not that good at managing complex relationships, since the relationships require many joins between tables.
  • Document-oriented databases: Document-oriented databases, part of the NoSQL (Not Only SQL) era, have gained increasing interest during the last few years. Contrary to relational databases, they can manage flexible data models and are known for better scaling with a large amount of data. Examples of NoSQL databases include MongoDB and Cassandra, but you can find many more on the market.
  • Key-value stores: Redis, RocksDB, and Amazon DynamoDB are examples of key-value databases. They are very simple and known to be very fast, but are not well suited to store complex data.

Here is how the different databases can be viewed in a figurative representation:

Graph databases try to bring the best of each world into a single place by being quite simple to use, flexible, and very performant when it comes to relationships.

SQL and joins

Let's briefly focus on relational databases. Imagine we want to create a Question and Answer website (similar to the invaluable Stack Overflow). The requirements are the following:

  • Users should be able to log in.
  • Once logged in, users can post questions.
  • Users can post answers to existing questions.
  • Questions need to have tags to better identify which question is relevant for which user.

As developers or data scientists, who are used to SQL, we would then naturally start thinking in terms of tables. Which table(s) should I create to store this data? Well, first we will look for entities that seem to be the core of the business. In this QA website, we can identify the following:

  • Users, with attributes: ID, name, email, password
  • Questions: ID, title, text
  • Answers: ID, text
  • Tags: ID, text

With those entities, we now need to create the relationships between them. To do so, we can use foreign keys. For instance, the question has been asked by a given user, so we can just add a new column to the question table, author_id, referencing the user table. The same applies for the answers: they are written by a given user, so we add an author_id column to the Answer table:

It becomes more complicated for tags, since one question can have multiple tags and a single tag can be assigned to many questions. We are in the many-to-many relationship type, which requires adding a join table, a table that is just there to remember that kind of relationship. This is the case of the QuestionTag table in the preceding diagram; it just holds the relationship between tags and questions.

It's all about relationships

The previous exercise probably looks easy to you, because you have followed lectures or tutorials about SQL and use it pretty often in your daily work. But let's be honest, the first time you face a problem and have to create the data model that will allow you to solve it, you will probably draw something that looks like the diagram in the following image, right?

This is one of the powers of graph databases:

Neo4j, as a graph database, allows you to create vertices, or nodes, and the relationships connecting them. The next section summarizes how the different entities in the Neo4j ecosystem are related to each other and how we can structure our data in a graph.

The whiteboard model is your data model.