Book Image

Hands-On Graph Analytics with Neo4j

By : Estelle Scifo
Book Image

Hands-On Graph Analytics with Neo4j

By: Estelle Scifo

Overview of this book

Neo4j is a graph database that includes plugins to run complex graph algorithms. The book starts with an introduction to the basics of graph analytics, the Cypher query language, and graph architecture components, and helps you to understand why enterprises have started to adopt graph analytics within their organizations. You’ll find out how to implement Neo4j algorithms and techniques and explore various graph analytics methods to reveal complex relationships in your data. You’ll be able to implement graph analytics catering to different domains such as fraud detection, graph-based search, recommendation systems, social networking, and data management. You’ll also learn how to store data in graph databases and extract valuable insights from it. As you become well-versed with the techniques, you’ll discover graph machine learning in order to address simple to complex challenges using Neo4j. You will also understand how to use graph data in a machine learning model in order to make predictions based on your data. Finally, you’ll get to grips with structuring a web application for production using Neo4j. By the end of this book, you’ll not only be able to harness the power of graphs to handle a broad range of problem areas, but you’ll also have learned how to use Neo4j efficiently to identify complex relationships in your data.
Table of Contents (18 chapters)
1
Section 1: Graph Modeling with Neo4j
5
Section 2: Graph Algorithms
10
Section 3: Machine Learning on Graphs
14
Section 4: Neo4j for Production

Neo4j – the nodes, relationships, and properties model

Neo4j is a Java-based highly scalable graph database whose code is publicly available on GitHub at github.com/neo4j/neo4j. This section describes its building blocks and its main cases, but also the cases where it won't perform well because no system is perfect.

Building blocks

As we discussed with graph theory, a graph database such as Neo4j is made of at least two essential building blocks:

  • Nodes
  • Relationships between nodes:
Unlike SQL, Neo4j does not require a fixed and predetermined schema: nodes and relationships can be added on demand.

Let's look at each of of these in detail.

Nodes

In Neo4j, vertices are called nodes. Nodes can be of different types, like Question and Answer were in our former example. In order to differentiate those entities, nodes can have a label. If we continue the parallel with SQL, all nodes with a given label would be in the same table. But the analogy ends here, because nodes can have multiple labels. For instance, Clark Kent is a journalist, but he is also a superhero; the node representing this person can then have the two labels: Journalist and SuperHero.

Labels define the kind of entity the node belongs to. It is also important to store the characteristics of that entity. In Neo4j, this is done by attaching properties to nodes.

Relationships

Like nodes, relationships carry different pieces of information. A relationship between two persons can be of type MARRIED_TO or FRIEND_WITH or many other types of relationship. That's why Neo4j's relationships must have one and only one type.

One of the main powers of Neo4j is that relationships also have properties. For instance, when adding the relationship MARRIED_TO between two persons, we could add the wedding date, place, whether they signed a prenuptial agreement, and so on, as relationship properties.

Be careful: even though we discussed undirected graphs earlier, in Neo4j, all relationships are oriented!

Properties

Properties are saved as key-value pairs where the key is a string capturing the property name. Each value can then be of any of the following types:

  • Number: Integer or Float
  • Text: String
  • Boolean: Boolean
  • Time properties: Date, Time, DateTime, LocalTime, LocalDateTime, or Duration
  • Spatial: Point

Internally, properties are saved as a LinkedList, each element containing a key-value pair. The node or relationship is then linked to the first element of its property list.

Naming conventions: Node labels are written in UpperCamelCase while relationship names use UPPER_CASE with words separated by an underscore. Property names usually use the lowerCamelCase convention.

SQL to Neo4j translator

Here are a few guidelines to be able to easily go from your relational model to a graph model:

SQL world Neo4j world
Table Node label
Row Node
Column Node property
Foreign key Relationship
Join table Relationship
NULL Do not store null values inside properties; just omit the property

Applying those guidelines to the SQL model in the question and answer table diagram, we built up the graph model displayed in the whiteboard model for our simple Q&A website, earlier in the chapter. The full graph model can be seen as follows:

You can use the online tool Arrows, which enables you to draw a graph diagram and export it to Cypher, the Neo4j query language:
http://www.apcjones.com/arrows.

Neo4j use cases

Like any other tool, Neo4j is very good in some situations, but not well suited to others. The basic principle is that Neo4j provides amazing performance for graph traversal. Everything requiring jumping from one node to another is incredibly fast.

On the other hand, Neo4j is probably not the best tool if you want to do the following:

  • Perform full DB scans, for instance, answering the question "What is?"
  • Do full table aggregates
  • Store large documents: the key-values properties list needs to be kept small (let's say no more than around 10 properties)

Some of those pain points can be addressed with a proper graph model. For instance, instead of saving all information as node properties, can we consider moving some of them to another node with a relationship between them? Depending on the kind of requests you are interested in, the graph schema most suited to your application may differ. Before going into the details of graph modeling, we need to stop briefly and and talk about the different kinds of graph properties which will also influence our choice.