Book Image

Learning Cypher

By : Onofrio Panzarino
Book Image

Learning Cypher

By: Onofrio Panzarino

Overview of this book

Table of Contents (13 chapters)

Functions


Functions are grouped by usage. Except for the COALESCE function, all of these functions return a null value if any argument has a null value.

COALESCE

The COALESCE function takes any number of parameters (at least one) and returns the first non-null value. It returns a null value only in case all the parameters have null values.

Usually, this function is used every time you have to deal with null values and want a fallback value in case of a null value. Consider the following query:

MATCH (aa:Book) 
WHERE 'novel' IN COALESCE(aa.tags, ['novel']) 
RETURN aa.title

This query returns all books that have novel in their tags, but will return all the books that do not have no tags set as well. In fact, if a book doesn't have the tags property, the COALESCE function will return the collection ['novel']. Therefore, the IN predicate will always succeed.

TIMESTAMP

The TIMESTAMP function takes no arguments and returns the number of milliseconds passed from the midnight of January 1, 1970. This function is used when you create or update a node to mark the change in the date. It is especially useful to create versions. Consider the following query:

CREATE (a:Book {title: "Learning Cypher", version: TIMESTAMP() }) 

This query creates a book in the database setting the version number as a timestamp. Another query could check whether the version of the book has not changed before updating, and fail if a new version of the book is in the database (an optimistic lock).

ID

The ID function returns the identification number, managed by Neo4j, of a node or a relationship. Consider the following query:

MATCH (a) 
RETURN a 
ORDER BY ID(a) 
LIMIT 1

The preceding query returns the node with the minimum ID value. In fact, it sorts the dataset by the ID, then limits the results to one row.

Working with nodes

The following functions are specific to nodes. They are commonly used when you work with paths or collections of nodes and you want to inspect nodes and labels.

NODES

The NODES function returns a collection of all the nodes in a path. It is used when you want all the nodes found traversing a path between two nodes. Consider the following query:

MATCH p = (:Employee {surname: "Davies"})-[:REPORTS_TO*]-
          (:Employee {surname: "Taylor"}) 
RETURN NODES(p)

The preceding query first matches all the paths between two employees, visiting all the relations of the REPORTS_TO type and then returns all the nodes visited. In other words, it returns all the employees found in a hierarchy between two employees.

Note that, being a collection, the return value can be further inspected with the collection functions, which is explained later.

LABELS

To get all the labels assigned to a node, use the LABELS function. It returns a collection of strings. Consider the following query:

MATCH (e {surname: "Davies"}) 
RETURN LABELS(e)

This query returns all the labels assigned to the nodes that have the surname property set to Davies. This function is useful when your database has nodes that have multiple labels.

Working with paths and relationships

When you work with paths or with variable length relationships, you may need to inspect the relationships contained. The following functions are used for this purpose:

  • TYPE

  • ENDNODE and STARTNODE

  • SHORTESTPATH and ALLSHORTESTPATHS

  • RELATIONSHIPS

TYPE

The string returned by this function is the type of the relationship passed. Consider the following query:

MATCH ()-[r]-() 
RETURN TYPE(r), COUNT(*)

This query computes the number of relationships grouped by their type.

ENDNODE and STARTNODE

The following functions inspect the nodes at the endpoints of a relationship:

  • STARTNODE: This function returns the start node of the relationship passed as an argument

  • ENDNODE: This function returns the end node of the relationship passed as an argument

These functions are useful when you haven't specified the direction of the relationship in the MATCH clause of a query, but you want all the related nodes and wish to know the direction of the relationship.

For example, in the HR management tool, the following query can be used:

MATCH (e:Employee {surname: "Davies"})-[r:REPORTS_TO]-(a:Employee)
RETURN a, ENDNODE(r)

This query looks for all the neighborhood nodes of the node with the surname property Davies (depth of one) along the relationship of REPORTS_TO. It returns the neighborhood and the employee who is at the end node of the relationship. In other words, it says who the boss is.

SHORTESTPATH and ALLSHORTESTPATHS

The SHORTESTPATH and ALLSHORTESTPATHS functions search the shortest path among a set of paths. The difference is that while the SHORTESTPATH function returns at most one only path, the ALLSHORTESTPATHS function returns all the paths with the smallest depth in a collection of paths. Consider the following query:

MATCH p=SHORTESTPATH((a{surname:'Davies'})--(b{surname:'Doe'})) 
RETURN p

The preceding query returns the shortest path between two employee nodes (Davies and Doe). Now, consider the following query:

MATCH p=SHORTESTPATH( (a {surname:'Davies'})--(b) ) 
RETURN p
ORDER BY LENGTH(p) DESC
LIMIT 1

Instead, the preceding query returns the path to the farthest node from a given employee node (Davies). In fact, once stated that the distance from a node is the shortest path from it, this query first computes all distances from all nodes, then selects the longest distance.

RELATIONSHIPS

The RELATIONSHIPS function returns the collection of the relationships contained in a path. It is the equivalent of the NODES function. Consider the following query:

MATCH p = (a{surname:'Davies'})-[*..4]-(b{surname:'Taylor'})
RETURN RELATIONSHIPS(p)

This example takes all the paths with maximum depth of four between two nodes, then for each path, it returns the relationships visited.

Working with collections

The following functions are used to manipulate or inspect a collection:

  • HEAD

  • TAIL

  • LAST

HEAD, TAIL, and LAST

The HEAD, TAIL, and LAST functions are used to work with collections as if they were lists. The HEAD function returns the first item of the collection, the TAIL function returns the rest of the list, while the LAST function returns the last item of the collection. Note that when you add the head of a collection to its tail (HEAD(c) + TAIL(c)), you get the full collection again.

If the collection is empty, the HEAD and the LAST functions will return a null value, while the TAIL function will return an empty collection. Consider the following query:

MATCH (a:Book) 
RETURN HEAD(a.tags)
LIMIT 30

This query returns the first tag of the first 30 books found in the database.

LENGTH

The LENGTH function returns the size of a collection of any type or of a string. Consider the following query:

MATCH (a:Book) 
WHERE LENGTH(a.tags) >= 2 
RETURN a

This example query returns all the books that have at least two tags. Note that the books without the tags property will be ignored because LENGTH(NULL) is a null value. Now, consider the following query:

MATCH (a:Book) 
WHERE LENGTH(a.title) < 7 
RETURN a

This example, instead, shows the function in action with a string. It returns all the books with a title shorter than seven characters in length.

EXTRACT

The EXTRACT function is equivalent to the map function in the MapReduce paradigm. A similar function is included in most programming languages nowadays; for example, the map function in Scala, Java 8, JavaScript, and Ruby, the Select function in C#, and the list comprehensions in Python (although the map function is provided as well). It takes two parameters: a collection and an expression, which can be evaluated on each item of the collection. It returns a new collection, which has the values returned by the expression applied on every item of the collection. The syntax is similar to the syntax of the collection predicates (refer to Chapter 2, Filter, Aggregate, and Combine Results). Consider the following query:

RETURN EXTRACT(i IN [1,2,3] | i*2)

The result of this query is the collection [2,4,6] because it applies the expression i*2 on every item i of the collection [1,2,3]. Now, consider the following query:

MATCH (a:Book) 
RETURN EXTRACT(tag in a.tags | "Tag: " + tag) 
LIMIT 30

For every book in the first 30 books found in the database, this query returns a list of strings with the prefix Tag:.

This function is mostly used with collections of nodes or relationships. Consider the following query:

MATCH p=(e:Employee {surname: "Davies"})-[r:REPORTS_TO*]
            -(a:Employee)
RETURN EXTRACT(node IN NODES(p) | LABELS(node))

The preceding query, for each node found in a path, extracts the labels of that node.

FILTER

The FILTER function returns a collection created by adding values that satisfy a certain condition from a source collection. It is similar to the WHERE clause, but it works on collections. Consider the following query:

RETURN FILTER (x IN [1,2,3,4] WHERE x%2 = 0)

This simple query returns the even values from a list of integers. Yet, this query is mostly used while working with paths too. Now, consider the following query:

MATCH p=(e)-[r]->(a)
RETURN FILTER (node IN NODES(p) WHERE node:Book)

This query, for each path that matches the pattern, returns the list of nodes found in the path, but filters only the nodes with the label Book.

REDUCE

The REDUCE function has the same purpose as the Reduce function of the well-known MapReduce paradigm. Nowadays, almost all languages have this function, even if sometimes with different names; the reduce function in Java 8 (the Stream API) and JavaScript, the collect function in Ruby and Python, the Aggregate function in C#, and the foldLeft function in Scala. It is used to aggregate the items of a collection, scanning every item and accumulating them according to a given expression. For example, the following query returns the sum of all items of a collection:

RETURN REDUCE (total = 0, i in [1,2,3] | total + i)

The variable total is the accumulator initialized to 0. Then for each item in the collection, this variable is increased by the value of the item. Of course, you can apply the same pattern to strings. Consider the following query:

RETURN REDUCE (text = "", i in ["Hello ","world","!"] | text + i)

This query concatenates the strings in the collections and returns Hello world!. Let's see a more realistic usage. Consider the following query:

MATCH p=(e:Book)-[r*]-(a) 
WITH REDUCE (text = "", 
        node IN NODES(p) | 
           text + COALESCE(node.title, " * ") + "-") as TEXT
RETURN LEFT(text, LENGTH(text)-1)

This query, for each path found, returns a collection with the titles of the books found separated by a hyphen. In case a title is not found, thanks to the COALESCE function, an asterisk is returned instead of a null value. The last line of the query is needed to remove the last dash from the string.

RANGE

The RANGE function is very simple; it returns a collection of integers contained in a numerical range, which is computed with the help of a given step. Consider the following query:

RETURN RANGE (1, 10, 2)

The preceding query returns [1,3,5,7,9]. The third parameter in the function is the step parameter, and is optional. By default, it is 1. Now, consider the following query:

RETURN RANGE (1, 5)

The preceding query returns [1,2,3,4,5] as the ranges are inclusive. Note that if you try to pass the step parameter as 0, the query will fail with an error.

Working with strings

Cypher provides a set of functions to work with strings. They are the typical string functions provided by most SQL databases. They are as follows:

  • SUBSTRING, LEFT, and RIGHT

  • STR

  • REPLACE

  • Trimming functions

  • LOWER and UPPER

SUBSTRING, LEFT, and RIGHT

The SUBSTRING, LEFT, and RIGHT functions are used when you want the substring of a given string. The following are their uses:

  • SUBSTRING(source, index, length): This is a general function. It returns a string obtained by taking the characters of a given source string, from a given index, for a given length. If the length is not specified, it returns the string until the end.

  • LEFT(source, length): This function returns a string of a given length, taking the first characters of the source string.

  • RIGHT(source, length): This function returns a string of a given length, taking the last characters of the source string.

All of these functions will cause an exception if any index is negative, but won't cause any exception if the index is greater than the length of the string. In this case, they will just return an empty string.

The following query cuts the book where the titles are longer than four characters in length and append an ellipsis symbol at the end:

MATCH (a:Book) 
RETURN LEFT(a.title, 4) + "..."

STR

The STR function converts its unique argument to a string. It works on numbers, nodes, relationships, and paths. Consider the following query:

MATCH p=(a:Book{title:"Ficciones"})-[*]-() 
RETURN STR(p)

This query returns the string representation of the path found with the MATCH clause. The result obtained is follows:

[Node[1027]{title:"Ficciones"},:AuthorOf[277]{},Node[1026]{name:"Jorge Luis Borges"}]

Note

Conversion to integer values or to float values is not yet supported in Version 2.0.1, but two new functions (toInt and toFloat) were released along with the Version 2.0.2 to allow this conversion.

REPLACE

The REPLACE function returns a string obtained by replacing all occurrences of a given string inside another string with a specified string. Consider the following query:

RETURN REPLACE("Cypher 1.9", "1.9", "2.0")

The preceding query will replace the occurrences of 1.9 in Cypher 1.9 with 2.0, returning Cypher 2.0. The REPLACE function can be used to remove all occurrences of a string as well. Consider the following query:

MATCH (a:Book)
RETURN REPLACE(a.title, "The ", "")

This query will return all the book titles after removing the The string.

Trimming functions

Cypher provides three trimming functions. They are as follows:

  • LTRIM: This function is used to remove whitespace characters from a given string from the left

  • RTRIM: This function is used to trim a given string from the right

  • TRIM: This function is used to trim a given string from both the left and the right

These three functions accept one string parameter and return the trimmed value. Consider the following query:

MATCH (a:Book)
RETURN TRIM(a.title)

The rows returned by this query are the book titles in the database without whitespaces at the beginning or at the end of the string.

LOWER and UPPER

The LOWER function returns the lowercase version of a given string, while the UPPER function returns the uppercase version. They are often used to make a case-insensitive string comparison without using regular expressions. Consider the following query:

MATCH (a:Book) 
WHERE UPPER(a.title) = UPPER("Learning Cypher") 
RETURN a

The preceding query returns the book node that has the title Learning Cypher without caring for the case of the test string.

Aggregation functions

We learned how to aggregate datasets and explored some aggregation functions in detail in Chapter 2, Filter, Aggregate, and Combine Results. Here, we summarize the aggregation functions and their usage. You can also find an extensive guide on the usage of groups and functions in this chapter.

COUNT

The COUNT function is invoked with an argument, which can be an identifier or the * symbol. If this symbol is specified, the function will count all the rows in the dataset; otherwise, it will count the number of non-null values of the identifier. Consider the following query:

MATCH (aa: Person)
RETURN COUNT(a.surname)

This query returns the number of nodes with the label Person that have the property surname set. The COUNT function can be invoked with the DISTINCT clause to let it count only distinct values, as shown in the following query:

MATCH (a: Person)
RETURN COUNT(DISTINCT aa.surname)

SUM

To sum the numerical values in a dataset, you can use the SUM function. Consider the following query:

MATCH (b:Book) <-[r:Votes]- (:User) 
RETURN b, SUM(r.score)

This query returns the sum of the scores of the votes received by each book. The null values are ignored.

AVG

The AVG function computes the average of a group of number values in the dataset. Consider the following query:

MATCH (b:Book) <-[r:Votes]- (:User) 
RETURN b, AVG(r.score)

The rows returned by this query contain the average score of the votes received by a book. The null values are ignored in the computation of the average.

PERCENTILEDISC and PERCENTILECONT

The PERCENTILEDISC and PERCENTILECONT functions compute the percentile of a value in a dataset. The difference between them is in the formula used. The PERCENTILEDISC function uses a discrete model for interpolation, while the PERCENTILECONT function uses a continuous model. A percentile measures the value below which a given percentage of items fall. Consider the following query:

MATCH (b:Book) <-[r:Votes]- (:User) 
RETURN b, PERCENTILEDISC(r.score, 0.3)

This query returns, for each book, the thirtieth percentile of the score of votes for each book. It is a measure of the score below which 30 percent of the votes fall.

Note that there is no standard definition of percentile, but three possible definitions are available. An introduction to percentile definitions can be found at http://onlinestatbook.com/2/introduction/percentiles.html.

STDEV and STDEVP

Standard deviation gives an estimate of the dispersion from the average. You should use the STDEVP function when you are computing the standard deviation for the whole population; otherwise, the STDEV function must be used. Consider the following query:

MATCH (b:Book) <-[r:Votes]- (:User) 
RETURN STDEVP(r.score)

The return value of this query is the standard deviation of the score of the votes of all users.

A simple explanation of standard deviation can be found at http://www.mathplanet.com/education/algebra-2/quadratic-functions-and-inequalities/standard-deviation-and-normal-distribution.

MIN and MAX

The MIN and MAX functions return the minimum and the maximum of a value in the dataset. Consider the following query:

MATCH (b:Book) <-[r:Votes]- (:User) 
RETURN b, MIN(r.score), MAX(r.score)

The query returns all books voted by users, each with their minimum and maximum score received.

Mathematical functions

Cypher provides a number of mathematical functions. Explaining all of them in detail is beyond the scope of this book. The following is a list of the most commonly used functions:

Function

Return value

SIGN(x)

  • -1 if the given value is negative

  • 0 if the given value is zero

  • 1 if the given value is positive

ABS(x)

The absolute value of x

EXP(x)

The natural exponential function of x

LOG(x)

The natural logarithm of x

LOG10(x)

The common logarithm (base 10) of x

E()

The value of e (Euler's number), the base of natural logarithm

PI()

The value of pi

ROUND(x)

The value of x rounded to the nearest integer

RAND()

A random double value between zero and one

SQRT(x)

The square root of x

SIN(x)

The sine of x

COS(x)

The cosine of x

TAN(x)

The tangent of x