Graph Data Science with Neo4j

By: Estelle Scifo

Overview of this book

Neo4j, along with its Graph Data Science (GDS) library, is a complete solution to store, query, and analyze graph data. As graph databases are getting more popular among developers, data scientists are likely to face such databases in their careers, making it an indispensable skill to work with graph algorithms for extracting context information and improving the overall model prediction performance. Data scientists working with Python will be able to put their knowledge to work with this practical guide to Neo4j and the GDS library that offers step-by-step explanations of essential concepts and practical instructions for implementing data science techniques on graph data using the latest Neo4j version 5 and its associated libraries. You’ll start by querying Neo4j with Cypher and learn how to characterize graph datasets. As you get the hang of running graph algorithms on graph data stored in Neo4j, you’ll understand the new and advanced capabilities of the GDS library that enable you to make predictions and write data science pipelines. Using the newly released GDSL Python driver, you’ll be able to integrate graph algorithms into your ML pipeline. By the end of this book, you’ll be able to take advantage of the relationships in your dataset to improve your current model and make other types of elaborate predictions.
Table of Contents (16 chapters)
Part 1 – Creating Graph Data in Neo4j
Part 2 – Exploring and Characterizing Graph Data with Neo4j
Part 3 – Making Predictions on a Graph

Inserting data into Neo4j with Cypher, the Neo4j query language

Cypher, as we discussed at the beginning of this chapter, is the query language developed by Neo4j. It is also supported by other graph databases, such as RedisGraph.

First, let’s create some nodes in our newly created database.

To do so, open Neo4j Browser by clicking on the Open button next to your database and selecting Neo4j Browser:

Figure 1.12 – Start the Neo4j Browser application from Neo4j Desktop

From there, you can start writing Cypher queries in the upper text area.

Let’s start by creating some nodes with the following Cypher query:

CREATE (:User {name: "Alice", birthPlace: "Paris"})
CREATE (:User {name: "Bob", birthPlace: "London"})
CREATE (:User {name: "Carol", birthPlace: "London"})
CREATE (:User {name: "Dave", birthPlace: "London"})
CREATE (:User {name: "Eve", birthPlace: "Rome"})

Before running the query, let me detail its syntax:

Figure 1.13 – Anatomy of a node creation Cypher statement

Note that all of these components (the label after the colon, such as User, and the properties between curly braces) are optional, except for the parentheses. You can create a node with no label and no properties with CREATE (), even though such an empty record would not be very useful for data storage purposes.

Tips

You can copy and paste the preceding query and execute it as-is; multi-line queries are allowed by default in Neo4j Browser.

If the upper text area is not large enough, press the Esc key to maximize it.
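To illustrate the optional components of a node creation statement mentioned earlier, here is a minimal sketch with three variations. The name "Frank" is a made-up value used only for illustration, and none of these nodes is needed for the rest of the chapter, so you may prefer to read the snippet rather than run it.

```cypher
// Node with neither a label nor properties
CREATE ()
// Node with a label only
CREATE (:User)
// Node with properties only ("Frank" is an illustrative value)
CREATE ({name: "Frank"})
```

If you do run it, keep in mind that it adds three extra nodes to the database.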

Now that we’ve created some nodes and since we are dealing with a graph database, it is time to learn how to connect these nodes by creating edges, or, in Neo4j language, relationships.

The following code snippet starts by fetching the start and end nodes (Alice and Bob), then creates a relationship between them. The created relationship is of the KNOWS type and carries one property (the date Alice and Bob met):

MATCH (alice:User {name: "Alice"})
MATCH (bob:User {name: "Bob"})
CREATE (alice)-[:KNOWS {since: "2022-12-01"}]->(bob)
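Because the CREATE clause only runs for the rows produced by the two MATCH clauses, nothing is created if either node cannot be found (for example, because of a typo in a name). A quick way to check the result, sketched here with the same labels and properties as above, is to match the pattern back. The column aliases source, since, and target are arbitrary names chosen for this example.

```cypher
// Look for a KNOWS relationship going from Alice to Bob
MATCH (a:User {name: "Alice"})-[r:KNOWS]->(b:User {name: "Bob"})
RETURN a.name AS source, r.since AS since, b.name AS target
```

If the previous statement was run exactly once, this should return a single row.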

We could also have put all our CREATE statements into one big query, for instance, by adding aliases (variables) to the created nodes so that the last CREATE clause can refer to them:

CREATE (alice:User {name: "Alice", birthPlace: "Paris"})
CREATE (bob:User {name: "Bob", birthPlace: "London"})
CREATE (alice)-[:KNOWS {since: "2022-12-01"}]->(bob)
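One caveat: CREATE always adds new data, so running this combined query in a database that already contains the Alice and Bob nodes from the earlier statements would add a second copy of each. A possible alternative, shown here only as a sketch since it goes beyond what this section covers, is the MERGE clause, which matches an existing pattern and creates it only when no match is found. The variable k is an arbitrary name introduced for this example.

```cypher
// Match the node by name, or create it if it does not exist yet
MERGE (alice:User {name: "Alice"})
  ON CREATE SET alice.birthPlace = "Paris"
MERGE (bob:User {name: "Bob"})
  ON CREATE SET bob.birthPlace = "London"
// Match or create the relationship between the two bound nodes
MERGE (alice)-[k:KNOWS]->(bob)
  ON CREATE SET k.since = "2022-12-01"
```

Unlike the CREATE version, this query can be run several times without adding further nodes or relationships.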

Note

In Neo4j, relationships are directed, meaning you have to specify a direction when creating them, which we can do thanks to the > symbol. However, Cypher lets you select data regardless of the relationship’s direction. We’ll discuss this when appropriate in the subsequent chapters.
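As a brief preview, and assuming the Alice and Bob nodes and the KNOWS relationship created above, the following sketch contrasts a directed pattern with an undirected one. The variables friend and other are arbitrary names. Run the two queries one at a time.

```cypher
// Directed: only follows KNOWS relationships that start at Alice
MATCH (:User {name: "Alice"})-[:KNOWS]->(friend:User)
RETURN friend.name;

// Undirected: omitting the > matches KNOWS relationships in either direction
MATCH (:User {name: "Bob"})-[:KNOWS]-(other:User)
RETURN other.name;
```

With the sample data, the second query finds Alice starting from Bob, even though the stored relationship points from Alice to Bob.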

Inserting data into the database is one thing, but without the ability to query and retrieve this data, databases would be useless. In the next section, we are going to use Cypher’s powerful pattern matching to read data from Neo4j.