Neo4j High Performance

By : Sonal Raj

Types of NoSQL databases

At one time, a relational database was the answer to every database need. As NoSQL databases spread rapidly, it is vital to recognize that different use cases and functional requirements call for different database types. Based on their purpose, NoSQL databases are classified into the following categories:

Key-value stores

Key-value database management systems are the most basic and fundamental type of NoSQL database. Such databases operate much like a dictionary, mapping keys to values without reflecting any structure or relation. Key-value databases are usually used for the rapid storage of information after performing some operation, for example, a resource (memory)-intensive computation. These data stores offer extremely high performance and are efficient and easily scalable. Some examples of key-value data stores are Redis (an in-memory data store with optional persistence), MemcacheDB (a persistent, memcached-compatible key-value store), and Riak (a highly distributed, replicated key-value store). Sounds interesting, huh? But how do you decide when to use such data stores?

Let's take a look at some key-value data store use cases:

  • Caching data: This is a type of rapid data storage for immediate or future use

  • Information queuing: Some key-value stores, such as Redis, support lists, sets, and queues that can be used for queries and transactions

  • Keeping live information: Applications that require state management can use key-value stores for better performance

  • Distributing information or tasks
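To make the caching and queuing use cases concrete, here is a minimal sketch in Python. It assumes a plain in-process dict as a stand-in for the key-value store; the names store, expensive_square, and cached_square are hypothetical and do not belong to any real product's API.

```python
from collections import deque

# Assumption: a plain dict stands in for the key-value store. A real
# deployment would talk to a server such as Redis over the network.
store = {}
compute_calls = 0  # counts how often the expensive work really runs


def expensive_square(n):
    """Pretend this is a resource-intensive computation."""
    global compute_calls
    compute_calls += 1
    return n * n


def cached_square(n):
    """Look the result up by key and compute it only on a cache miss."""
    key = f"square:{n}"
    if key not in store:
        store[key] = expensive_square(n)
    return store[key]


first = cached_square(12)   # cache miss: computed and stored
second = cached_square(12)  # cache hit: read straight from the store

# Information queuing: a list-like value used as a FIFO work queue.
store["jobs"] = deque()
store["jobs"].append("resize-image")
store["jobs"].append("send-email")
next_job = store["jobs"].popleft()  # oldest job comes out first
```

The second call never reaches expensive_square, which is the whole point of keeping results under a key for later use.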

Column family stores

Column family NoSQL database systems extend the features of key-value stores to provide enhanced functionality. Although they have a reputation for being complex, column family stores work by simply creating collections of one or more key-value pairs that make up a record. Unlike relational databases, column family stores are schema-less: each record has one or more columns, and the set of columns can vary from record to record.

Column-based NoSQL databases are essentially two-dimensional arrays in which each key has one or more key-value pairs associated with it, which allows large, unstructured datasets to be stored for future use. Such databases are generally used when simply storing key-value pairs is not sufficient and large quantities of records, each carrying a lot of information, must be stored. Database systems that implement a column-based, schema-less model are extremely scalable.
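The two-dimensional layout can be sketched as a dictionary of dictionaries. This is only an illustrative model, assuming in-memory Python dicts; the helpers put and get are hypothetical and are not the API of HBase or Cassandra.

```python
# Assumption: row key -> {column name -> value}. Rows do not have to
# share the same set of columns, which is what "schema-less" means here.
users = {}


def put(table, row_key, column, value):
    """Store a single column value under the given row key."""
    table.setdefault(row_key, {})[column] = value


def get(table, row_key, column, default=None):
    """Read one column of one row, or default if either is absent."""
    return table.get(row_key, {}).get(column, default)


put(users, "user:1", "name", "Ada")
put(users, "user:1", "email", "ada@example.com")
put(users, "user:2", "name", "Linus")
put(users, "user:2", "last_login", "2015-01-01")

columns_user1 = sorted(users["user:1"])       # columns present in row 1
missing = get(users, "user:1", "last_login")  # column absent in row 1
```

Row user:1 has an email column and row user:2 does not, and neither row had to be declared in advance.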

These data stores are powerful and can be reliably used to store large volumes of essential data. Although they are not flexible in what constitutes the data (for example, related objects cannot be stored!), they are extremely functional and performance oriented. Some column-based data stores are HBase (an Apache Hadoop data store based on ideas from BigTable) and Cassandra (a data store based on ideas from Amazon's Dynamo and Google's BigTable).

So, when do we want to use such data stores? Let's take a look at some use cases to understand the utility of column-based data stores:

  • Scaling: Column family stores are highly scalable and can handle very large volumes of information with little impact on performance

  • Storing non-volatile, unstructured information: If collections of attributes or values need to persist for extended periods of time, column-based data stores are quite handy

Document databases

Document-based NoSQL databases are the latest craze and have recently gained wide and serious acceptance in large enterprises. These DBMSs operate in a manner similar to column-based data stores, except that they allow much deeper nesting of data to realize more complex data structures (for example, a hierarchical data format with a document within another document, within yet another document). Unlike columnar databases, which allow one or two levels of nesting, document databases place no restriction on key-value nesting within documents. Any document with a complex and arbitrary structure can be stored in such data stores.

Although their storage model is powerful and lets you query records by individual keys, document-based database systems have their own issues and drawbacks. For example, the whole record must be fetched to retrieve a single value from it, and the same applies to updates, which affects performance in the long run.
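Both the deep nesting and the whole-record drawback can be seen in a small sketch. It assumes a toy store that keeps each document as serialized JSON text in a dict; the names documents, save, and load are hypothetical, and real document databases may avoid some of this cost with partial reads and updates.

```python
import json

# Assumption: document id -> serialized JSON text, as a toy document store.
documents = {}


def save(doc_id, doc):
    """Serialize and store the whole document under its id."""
    documents[doc_id] = json.dumps(doc)


def load(doc_id):
    """Fetch and deserialize the whole document."""
    return json.loads(documents[doc_id])


# A document within a document within a document.
save("order:1", {
    "customer": {
        "name": "Ada",
        "address": {"city": "Delhi", "zip": "110001"},
    },
    "items": [{"sku": "book-42", "qty": 1}],
})

# Reading one nested value still loads the entire record.
order = load("order:1")
city = order["customer"]["address"]["city"]

# Updating one nested value rewrites the entire record as well.
order["customer"]["address"]["city"] = "Pune"
save("order:1", order)
updated_city = load("order:1")["customer"]["address"]["city"]
```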

Document-based databases are a viable choice for storing a lot of unrelated, complex information with variable structure. Some document-based databases are Couchbase (a memcached-compatible, JSON-based document database), CouchDB, and MongoDB (an efficient and highly functional database that is gaining popularity in big data scenarios).

Let's look at popular use cases associated with document databases to decide when to pick them as your tools:

  • Nested information handling: These data stores are capable of handling data structures that are complex in nature and deeply nested

  • JavaScript compatible: They interface easily with applications that use JavaScript-friendly JSON in data handling

Graph databases

A graph database exposes a graph data model with support for create, read, update, and delete (CRUD) operations. Graph databases are online (real time) in nature and are generally built for use in transactional (OLTP) systems. Unlike the other NoSQL models, a graph database represents data in a completely different fashion: as tree-like structures or graphs whose nodes are connected to each other by edges, also called relationships. Because related pieces of information are linked directly, this model makes certain operations easier to perform.

Such databases are popular in applications that establish connections between entities. For example, in online social or professional networks, your connections to your friends, and your friends' friends' relation to you, are simpler to deal with when using graph databases. Some popular graph databases are Neo4j (a schema-less, extremely powerful graph database built in Java) and OrientDB (a speed-oriented hybrid NoSQL database of graph and document types, written in Java and equipped with a variety of operational modes). Let's look at the use cases of graph databases:

  • Modeling and classification handling: Graph databases are a perfect fit for situations involving related data. Data modeling and information classification based on related objects are efficient with this type of data store.

  • Complex relational information handling: Graph databases make connections between entities easy to work with and allow extremely complex related objects to be used in computation without much hassle.
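The friends-of-friends example mentioned earlier can be sketched with an adjacency map. This is a minimal in-memory illustration in Python, not how Neo4j stores or queries data; the friends graph and the friends_of_friends helper are hypothetical.

```python
# Assumption: an undirected friendship graph as person -> set of friends.
friends = {
    "you": {"asha", "ben"},
    "asha": {"you", "chen"},
    "ben": {"you", "chen", "dina"},
    "chen": {"asha", "ben"},
    "dina": {"ben"},
}


def friends_of_friends(graph, person):
    """Return people exactly two hops away from person."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        # Follow each relationship one step further.
        two_hops |= graph.get(friend, set())
    # Drop the person and anyone who is already a direct friend.
    return two_hops - direct - {person}


fof = friends_of_friends(friends, "you")
```

The traversal only touches the neighbourhood of the starting node, which is why queries of this shape suit a model where relationships are stored as first-class links.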

Figure: NoSQL database performance variation with size and complexity

The following criteria can help you decide when a NoSQL database is the right choice for the situation at hand:

  • Data size matters: When you are working with large datasets and have to deal with scaling issues, databases of the NoSQL family are often a good choice.

  • Factor of speed: NoSQL data stores are often considerably faster than relational databases for write operations. Read performance, on the other hand, depends on the type of NoSQL database being used and the type of data being stored and queried.

  • Schema-free design approach: Relational databases require you to define a structure at the time of creation. NoSQL solutions are highly flexible and permit you to define schemas on the fly with little or no adverse effect on performance.

  • Scaling with automated and simple replication: NoSQL databases fit distributed scenarios well because of their built-in support for replication. NoSQL solutions are easily scalable and work in clusters.

  • Variety of choices available: Depending on your type of data and intensity of use, you can choose from a wide range of available database solutions to suit your needs.

Graph compute engines

A graph compute engine is a technology that enables global graph computational algorithms to be run against large datasets. Graph compute engines are designed for tasks such as identifying clusters in data, or applying computations to related data to answer questions such as "How many relationships, on average, does everyone on Facebook have?" or "Who has second-degree connections with you on LinkedIn?"

Because of their emphasis on global queries, graph compute engines are generally optimized to scan and process large amounts of information in batches, and in this respect, they are similar to other batch analysis technologies, such as data mining and OLAP, that are familiar in the relational world. Whereas some graph compute engines include a graph storage layer, others (and arguably most of them) concern themselves strictly with processing data that is fed in from an external source and returning the results.
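A global question such as the average number of relationships per person requires a pass over the entire dataset. The sketch below shows that batch-style scan in Python, assuming the graph is fed in from an external source as a plain list of edges; the edges data and both helper functions are hypothetical and do not represent any particular engine.

```python
from collections import defaultdict

# Assumption: the graph arrives as an undirected edge list from outside.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def degree_counts(edge_list):
    """Scan every edge once and count relationships per node."""
    degree = defaultdict(int)
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def average_degree(edge_list):
    """Average number of relationships per node seen in the edge list."""
    degree = degree_counts(edge_list)
    if not degree:
        return 0.0  # empty input: no nodes, so no relationships
    return sum(degree.values()) / len(degree)


avg = average_degree(edges)
```

Unlike a local traversal from one starting node, this computation has to read every edge, which is why such engines favour batch processing.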

Figure: A high-level overview of a graph computation engine setup