Building web applications with Python and Neo4j

By: Sumit Gupta
Thinking in graphs for SQL developers


Some might say that it is difficult for SQL developers to understand the paradigm of graphs, but that is not entirely true. The underlying essence of data modeling does not change. The focus is still on the entities and the relationships between them. Having said that, let's discuss the pros and cons, applicability, and similarities of the relational and graph models.

The relational models are schema-oriented. If you know the structure of data in advance, it is easy to ensure that data conforms to it, and at the same time, it helps in enforcing stronger integrity. Some examples include traditional business applications, such as flight reservations, payroll, order processing, and many more.

The graph models are occurrence-oriented, that is, they follow a probabilistic model. They are adaptive and define a generic, evolving data structure that works well in scenarios where the schema is not known in advance. The graph model is perfectly suited to storing, managing, and extracting highly connected data.

Let's briefly discuss the disadvantages of the SQL databases, which led to the evolution of the graph databases:

  • It is difficult to develop efficient models for evolving data, such as social networks

  • The focus is more on the structure of data than the relationships

  • They lack an efficient mechanism for performing recursions

These limitations were reason enough to design a different data structure, and as a result, graph data structures were introduced.
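To make the point about recursion concrete, here is a minimal sketch that uses Python's built-in sqlite3 module and a plain adjacency dictionary. The `follows` table, the names in it, and both helper functions are hypothetical and exist only for illustration. In the relational version, every additional hop requires one more self-join, whereas the graph-style version walks the connections to any depth with the same code.

```python
import sqlite3
from collections import deque

# Hypothetical "follows" data; the names are illustrative only.
EDGES = [("ann", "bob"), ("bob", "cara"), ("cara", "dan")]


def sql_two_hops(start):
    """Relational style: reaching two hops away needs one self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO follows VALUES (?, ?)", EDGES)
    rows = conn.execute(
        "SELECT f2.dst FROM follows f1 "
        "JOIN follows f2 ON f1.dst = f2.src "
        "WHERE f1.src = ? ORDER BY f2.dst",
        (start,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def graph_reachable(start, max_hops):
    """Graph style: walk adjacency lists breadth-first, up to max_hops away."""
    adjacency = {}
    for src, dst in EDGES:
        adjacency.setdefault(src, []).append(dst)
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the requested depth
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found
```

Going from two hops to three means rewriting the SQL statement with another join, while the traversal only needs a different `max_hops` argument.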

The objective of the graph databases was specifically to address the disadvantages of the SQL databases. However, Neo4j, as a graph database, also leveraged the advantages of the SQL databases wherever possible and applicable. Let's see a few of the similarities between the SQL and graph databases:

  • Highly consistent: At any point in time, all nodes contain the same data

  • Transactional: All insert or update operations are within a transaction where they are ACID

Having said that, it is not wrong to say that the graph databases are more or less the next generation of relational databases.

Comparing SQL and Cypher

Every database has its own query language; for example, an RDBMS leverages SQL and conforms to SQL-92 (http://en.wikipedia.org/wiki/SQL-92). Similarly, Neo4j has its own query language, Cypher. The syntax of Cypher has similarities with SQL, though it still has its own unique characteristics, which we will discuss in the upcoming sections.

Neo4j leveraged the concept of patterns and pattern matching, and introduced a new declarative graph query language, Cypher, for the Neo4j graph database. Patterns and pattern matching are the essence and core of Neo4j, so let's take a moment to understand them. We will then talk about the similarities between SQL and Cypher.

Patterns are a given sequence or occurrence of tokens in a particular format. The act of matching patterns within a given sequence of characters or any other compatible input form is known as pattern matching. Pattern matching should not be confused with pattern recognition: pattern matching usually requires an exact match and has no concept of partial matches.

Pattern matching is the heart of Cypher and a very important component of the graph databases. It helps in searching for and identifying a single node or a group of nodes by walking along the graph. Refer to http://en.wikipedia.org/wiki/Pattern_matching for more information on the importance of pattern matching in graphs. Let's move forward and talk about Cypher and its similarities with SQL.
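As a conceptual illustration only, the following sketch matches a simple pattern of the shape (start label)-[relationship type]->(end label) against a tiny in-memory graph. The data, the `WORKS_IN` and `MANAGES` relationship types, and the `match_pattern` helper are all hypothetical, and this is not how Neo4j implements matching internally. It merely shows the idea of finding nodes by the shape of their connections.

```python
# Tiny in-memory graph; all names and data are hypothetical.
NODES = {
    1: {"label": "Employee", "name": "Ann"},
    2: {"label": "Employee", "name": "Bob"},
    3: {"label": "Department", "name": "Sales"},
}
# Each relationship is (start node id, relationship type, end node id).
RELATIONSHIPS = [
    (1, "WORKS_IN", 3),
    (2, "WORKS_IN", 3),
    (1, "MANAGES", 2),
]


def match_pattern(start_label, rel_type, end_label):
    """Return (start name, end name) pairs that fit the requested pattern."""
    matches = []
    for src, rtype, dst in RELATIONSHIPS:
        if (
            rtype == rel_type
            and NODES[src]["label"] == start_label
            and NODES[dst]["label"] == end_label
        ):
            matches.append((NODES[src]["name"], NODES[dst]["name"]))
    return matches
```

Calling `match_pattern("Employee", "WORKS_IN", "Department")` plays roughly the role of a Cypher pattern such as `(e:Employee)-[:WORKS_IN]->(d:Department)`.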

Cypher is specifically designed to be a human-readable query language, focused on making things simpler for developers. Cypher is a declarative language that expresses "what to retrieve" and not "how to retrieve", in contrast to imperative languages such as Java and Gremlin (refer to http://gremlin.tinkerpop.com/).

Cypher borrows much of its structure from SQL, which makes it easy for SQL developers to use and understand. "SQL familiarity" is another objective of Cypher.
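The difference between the two styles can be sketched in a few lines. The example below uses SQL through Python's built-in sqlite3 module as the declarative stand-in, since Cypher needs a running Neo4j server. The `employee` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (name, gender).
EMPLOYEES = [("Ann", "FEMALE"), ("Bob", "MALE"), ("Cara", "FEMALE")]


def imperative_female_names():
    """Imperative: spell out how to scan and filter, step by step."""
    names = []
    for name, gender in EMPLOYEES:
        if gender == "FEMALE":
            names.append(name)
    return names


def declarative_female_names():
    """Declarative: state what is wanted; the engine decides how to get it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, gender TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", EMPLOYEES)
    rows = conn.execute(
        "SELECT name FROM employee WHERE gender = 'FEMALE' ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Both functions return the same names; only the second leaves the choice of access strategy to the database engine.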

Let's refer to the following illustration, which defines the Cypher constructs and the similarity of Cypher with SQL constructs:

The preceding diagram defines the mapping of the common SQL and Cypher constructs. It also depicts the examples stating the usage of these constructs.

For instance, FROM in SQL plays a role similar to MATCH or START in Cypher: each identifies the data that the rest of the query operates on. Although the way they are used is different, the objective and concept remain the same.

We will talk about Cypher in detail in Chapter 2, Querying the Graph with Cypher, and Chapter 3, Mutating Graph with Cypher. For now, without getting into the nitty-gritty and syntactical details, the following is one more illustration that briefly describes the similarities between the Cypher and SQL constructs:

In the preceding illustration, we are retrieving data using Cypher pattern matching. The statement shown in the diagram retrieves all the nodes that are labeled with FEMALE in our Neo4j database. This statement is very similar to a SQL statement that retrieves specific rows of a table based on a given criterion, such as the following query:

SELECT * FROM EMPLOYEE WHERE GENDER = 'FEMALE'

The preceding examples should be sufficient to understand that SQL developers can learn Cypher in no time.

Let's take one more example where we want to retrieve the total number of employees in the company X:

  • SQL syntax: SELECT COUNT(EMP_ID) FROM Employee WHERE COMPANY_NAME = 'X'

  • Cypher syntax: MATCH (n) WHERE n.CompanyName = 'X' RETURN count(n);

The preceding Cypher query shows the usage of aggregations such as count, which can also be replaced by sum, avg, min, max, and so on.

Note

Refer to http://neo4j.com/docs/stable/query-aggregation.html for further information on aggregations in Cypher.
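The employee count example can be tried out without a database server. The following sketch runs the SQL form through Python's built-in sqlite3 module and mimics the Cypher form over a list of dictionaries standing in for nodes. The sample rows are hypothetical, and the key column is written as EMP_ID because an unquoted SQL identifier cannot contain a hyphen.

```python
import sqlite3

# Hypothetical rows: (EMP_ID, COMPANY_NAME).
ROWS = [(1, "X"), (2, "X"), (3, "Y")]


def sql_count(company):
    """Count employees of one company with a SQL aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EMP_ID INTEGER, COMPANY_NAME TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", ROWS)
    (total,) = conn.execute(
        "SELECT COUNT(EMP_ID) FROM Employee WHERE COMPANY_NAME = ?",
        (company,),
    ).fetchone()
    conn.close()
    return total


def graph_count(company):
    """Mimics: MATCH (n) WHERE n.CompanyName = company RETURN count(n)."""
    # Each dictionary stands in for a node and its properties.
    nodes = [{"EmpId": emp_id, "CompanyName": name} for emp_id, name in ROWS]
    return sum(1 for node in nodes if node.get("CompanyName") == company)
```

Both helpers return the same number for a given company, which mirrors the equivalence of the two statements.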

Let's move forward and discuss the transformation of the SQL data structures into the graph data structures.

Evolving graph structures from SQL models

The relational models are the simplest models to depict and define the entities and the relationships between those entities. They are easy to understand, and you can quickly whiteboard them with your colleagues and domain experts.

A graph model is similar to a relational model as both models are focused on the domain and use case. However, there is a substantial difference in the way they are created and defined. We will discuss the way the graph models are derived from the relational models, but before that, let's look at the important components of the graph models:

  • Nodes: This component represents entities such as people, businesses, accounts, or any other item you might want to keep track of.

  • Labels: This component is the tag that defines the category of nodes. There can be one or more labels on a node. A label also helps in creating indexes, which further help in faster retrievals. We will discuss this in Chapter 3, Mutating Graph with Cypher.

  • Relationships: This component is the line that defines the connection between two nodes. A relationship can also have its own properties and a direction.

  • Properties: This component is pertinent information in the form of key-value pairs. Properties can be applied to a node or to a relationship.
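The four components can be modeled in a few lines of Python. This is only a sketch for building intuition, not Neo4j's storage format or any driver's API; the `Node` and `Relationship` classes, the `WORKS_IN` type, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set  # one or more labels categorize the node
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node  # direction runs from start to end
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Two nodes connected by one directed relationship with its own property.
ann = Node({"Employee"}, {"name": "Ann"})
sales = Node({"Department"}, {"name": "Sales"})
works_in = Relationship(ann, "WORKS_IN", sales, {"since": 2012})
```

Note that properties appear on both classes, matching the point that they can belong to a node or to a relationship.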

Let's take an example of a relational model, which is about an organization, and then understand the process of converting this into a graph model:

In the preceding relational model, we have employee, department, and title as entities, and Emp-Dept and Emp-Title as the relationship tables.

Here is sample data within this model:

The preceding screenshot depicts the sample data within the relational structures. The following are the guidelines to convert the preceding relational model into the graph model:

  • The entity table is represented by a label on nodes

  • Each row in an entity table is a node

  • The columns on these tables become the node properties

  • The foreign keys and the join tables are transformed into relationships; columns on these tables become the relationship properties
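The guidelines above can be sketched as a small conversion routine. The sample rows, the column names (`id`, `emp_id`, `dept_id`, `since`), and the `WORKS_IN` relationship type are assumptions made for illustration, as is the `to_graph` helper itself. Only the Employee and Department entities and the Emp-Dept join table are covered, to keep the sketch short.

```python
# Hypothetical sample rows; the column names are assumptions.
ENTITY_TABLES = {
    "Employee": [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}],
    "Department": [{"id": 10, "name": "Sales"}],
}
# Join table Emp-Dept: two foreign keys plus one extra column.
EMP_DEPT = [
    {"emp_id": 1, "dept_id": 10, "since": 2012},
    {"emp_id": 2, "dept_id": 10, "since": 2014},
]


def to_graph(entity_tables, join_rows):
    """Apply the guidelines: tables to labels, rows to nodes, joins to relationships."""
    nodes = {}
    for table, rows in entity_tables.items():
        for row in rows:
            # Table name becomes the label, the row becomes a node,
            # and its columns become the node properties.
            nodes[(table, row["id"])] = {"label": table, "properties": dict(row)}
    relationships = []
    for row in join_rows:
        # Foreign keys pick the two end nodes; the remaining columns
        # become relationship properties.
        extra = {k: v for k, v in row.items() if k not in ("emp_id", "dept_id")}
        relationships.append(
            {
                "start": ("Employee", row["emp_id"]),
                "type": "WORKS_IN",  # hypothetical relationship type
                "end": ("Department", row["dept_id"]),
                "properties": extra,
            }
        )
    return nodes, relationships


GRAPH_NODES, GRAPH_RELATIONSHIPS = to_graph(ENTITY_TABLES, EMP_DEPT)
```

The Emp-Title join table would be handled the same way, with its own relationship type and end label.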

Now, let's follow the preceding guidelines and convert our relational model into the graph model, which will look something like the following image:

The preceding illustration shows the complete process of reorganizing data that resides in a relational model into a graph model. We can use the same guidelines for transforming a variety of relational models into graph structures.

In this section, we discussed the similarities between SQL and Cypher. We also discussed the rules and process for transforming relational models into graph models. Let's move forward and understand the licensing and installation procedure of Neo4j.