Neo4j High Performance

By: Sonal Raj

The Neo4j setup and configurations


Neo4j is versatile in terms of usability. You can include and package Neo4j libraries in your application. This is referred to as the embedded mode of operation. For a server setup, you install Neo4j on the machine and configure it as an operating system service. The latest releases of Neo4j come with simple installer scripts for different operating systems. Let's take a look at how to configure Neo4j in the different modes of operation.

Modes of setup – the embedded mode

Neo4j in the embedded mode is used to include a graph database in your application. In this section, we will see how to configure embedded Neo4j for your application in the Eclipse IDE. Ensure that you have a suitable version of the Eclipse IDE from https://www.eclipse.org/downloads/ and the Neo4j Enterprise edition TAR archive from the other downloads section at http://www.neo4j.org/download.

Within Eclipse, navigate to File | New | Java Project, give your project a preferred name, and then click on Finish.

Under the Project Properties page, select the option for Java Build Path (1) on the sidebar, proceed to the Libraries tab (2), and then click on the button for Add External JARs (3). You can now locate the external JAR files of the libraries you want to add from here.

Navigate to the directory where you extracted Neo4j and look under the lib directory. Select all the *.jar files and click on Add. Click on Finish to complete the package addition process.

In the Eclipse navigation sidebar, right-click on the src folder of the newly created project and navigate to New | Package. In the dialog that appears, add a new package name. In the example, we have added com.neo4j.chapter1. Click on the Finish button.

Right-click on the newly created package and create a new Java class by navigating to New | Class. Name it accordingly (use HelloNeo to run the following example) and click on Finish. Add the following code to the class. This is a sample program to test whether our embedded setup works:

package com.neo4j.chapter1;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class HelloNeo {
   //change the path according to your system and OS; it should point to
   //the directory that holds (or will hold) the database files
   private static final String PATH_TO_DB = "path_to_your neo4j_installation";

   String response;
   GraphDatabaseService graphDBase;
   Node node_one;
   Node node_two;
   Relationship relation;
   private static enum RelationTypes implements RelationshipType { HATES }

   public static void main( final String[] args )
   {
       HelloNeo neoObject = new HelloNeo();
       neoObject.createGraphDb();
       neoObject.removeGraph();
       neoObject.shutDownDbServer();   
   }

   void createGraphDb()
   {
       graphDBase = new GraphDatabaseFactory().newEmbeddedDatabase( PATH_TO_DB );

       // In Neo4j 2.0, Transaction is AutoCloseable, so try-with-resources
       // replaces the deprecated tx.finish() call
       try ( Transaction tx = graphDBase.beginTx() )
       {
           node_one = graphDBase.createNode();
           node_one.setProperty( "name", "Bill Gates, Microsoft" );
           node_two = graphDBase.createNode();
           node_two.setProperty( "name", "Larry Page, Google" );

           relation = node_one.createRelationshipTo( node_two, RelationTypes.HATES );
           relation.setProperty( "relationship-type", "hates" );

           response = ( node_one.getProperty( "name" ).toString() )
                      + " " + ( relation.getProperty( "relationship-type" ).toString() )
                      + " " + ( node_two.getProperty( "name" ).toString() );
           System.out.println(response);

           // mark the transaction as successful so that it commits on close
           tx.success();
       }
   }

   void removeGraph()
   {
       try ( Transaction tx = graphDBase.beginTx() )
       {
           // a node can only be deleted once it has no relationships left
           node_one.getSingleRelationship( RelationTypes.HATES, Direction.OUTGOING ).delete();
           System.out.println("Nodes are being removed . . .");
           node_one.delete();
           node_two.delete();
           tx.success();
       }
   }

   void shutDownDbServer()
   {
       graphDBase.shutdown();
       System.out.println("graphdb is shutting down."); 
   }   
}

On running the program, you will see the different stages of operation if your configuration is correct. In fact, there is an easier way to set up this configuration if you are familiar with Maven.

Note

Apache Maven is a software project management and comprehension tool. Based on the concept of Project Object Model (POM), Maven can manage a project's build, reporting, and documentation from a central piece of information. You can learn more about Maven from the official website at http://maven.apache.org/.

Start a new Maven project on Eclipse and edit pom.xml to have the following lines for the dependencies:

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>neo4j</artifactId>
        <version>2.0.1</version>
    </dependency>
</dependencies>

When you save the pom.xml file, the Neo4j dependencies are downloaded and added to the project. You can now run the preceding program and test the configuration.

Modes of setup – the server mode

To develop applications on single machines locally, the embedded database is efficient and serves the purpose. Most of the examples in this book can be tested with the embedded setup. However, for larger applications that deal with rapidly scaling data, the server mode of Neo4j provides the necessary functionality.

Setting up a Neo4j server is relatively easy. You can include Neo4j startup and shutdown as a normal operating system process. For most Linux distributions, the following procedure would suffice:

  1. The latest release of Neo4j can be downloaded from http://www.neo4j.org/download. Select the compressed archive (tar.gz) distribution for your operating system.

  2. The archive contents can be extracted using tar -xf <filename>.

    The master directory housing Neo4j can be referred to as NEO4J_HOME.

  3. Move into the $NEO4J_HOME directory using cd $NEO4J_HOME and run the installer script using the following command:

    sudo ./bin/neo4j-installer install
    
  4. If prompted, enter your user password to grant super-user access to restricted directories. Once the installation completes, you can check the state of the service with the following command:

    sudo service neo4j-service status
    

    This indicates the state of the server, which in this case is not running.

  5. The following command starts the Neo4j server:

    sudo service neo4j-service start
    
  6. If you need to stop the server, you can run this from the terminal:

    sudo service neo4j-service stop
    

During installation, you will be asked to select the user under which Neo4j will run. You can specify a username (the default is neo4j). If that user does not exist on the system, a system account with that name will be created, and the ownership of the $NEO4J_HOME/data directory will be assigned (chown) to that user. It is good practice to create a dedicated user to run this service, so it is suggested that the downloaded archive be extracted under /opt or the directory for optional packages on your system.
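
Once the service has been started, you may want to confirm from application code that the server is actually answering. The following is a minimal sketch using only the Java standard library. It assumes that the server listens on localhost at port 7474 (Neo4j's default HTTP port, which is not stated in the steps above); the class name ServerCheck and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class ServerCheck {
    // Assumptions: the server runs on this machine and uses the default HTTP port
    static final String DEFAULT_HOST = "localhost";
    static final int DEFAULT_PORT = 7474;

    // Builds the root URL of the server's web interface
    static String baseUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    // Returns true if an HTTP GET on the URL yields a 2xx or 3xx status,
    // and false if the URL is invalid or the server cannot be reached
    static boolean isUp(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            int code = conn.getResponseCode();
            conn.disconnect();
            return code >= 200 && code < 400;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String url = baseUrl(DEFAULT_HOST, DEFAULT_PORT);
        System.out.println(url + (isUp(url, 2000) ? " is answering" : " is not reachable"));
    }
}
```

If you have changed the port or bound the web server to a different address, pass those values to baseUrl instead of the defaults.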

If you want the Neo4j server to no longer be a part of the system startup service, the following commands can be used to remove it:

cd $NEO4J_HOME
sudo ./bin/neo4j-installer remove

If the server is running, it is stopped and removed.

Neo4j high availability

In this section, we will learn how to set up Neo4j HA on a production cluster. Let's assume that our cluster has three machines to be set up with Neo4j HA.

Download Neo4j Enterprise from http://neo4j.org/download, extract the archive on each machine of the production cluster, and make the following changes to the local property files of the HA servers:

Machine #1 – neo4j-01.local

File: conf/neo4j.properties:

# A unique Id for this machine, must be non-negative
ha.server_id = 1

# Specify other hosts that make up this database cluster.
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001

# You can also specify the hosts using their IP addresses
# ha.initial_hosts = 192.168.0.61:5001, 192.168.0.62:5001, 192.168.0.63:5001

File: conf/neo4j-server.properties:

# Mention the IP address to which this database server will listen 
# to. 0.0.0.0 means it will listen to all incoming connections.
org.neo4j.server.webserver.address = 0.0.0.0

# Specify the mode of operation as HA if the mode is High 
# Availability or set to SINGLE if using a cluster of 1 Node
# (This is default setting)
org.neo4j.server.database.mode=HA

Machine #2 – neo4j-02.local

File: conf/neo4j.properties:

# A unique Id for this machine, must be non-negative
ha.server_id = 2

# Specify other hosts that make up this database cluster.
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001

# You can also specify the hosts using their IP addresses
# ha.initial_hosts = 192.168.0.61:5001, 192.168.0.62:5001, 192.168.0.63:5001

File: conf/neo4j-server.properties:

# Mention the IP address to which this database server will listen 
# to. 0.0.0.0 means it will listen to all incoming connections.
org.neo4j.server.webserver.address = 0.0.0.0

# Specify the mode of operation as HA if the mode is High 
# Availability or set to SINGLE if using a cluster of 1 Node
# (This is default setting)
org.neo4j.server.database.mode=HA

Machine #3 – neo4j-03.local

File: conf/neo4j.properties:

# A unique Id for this machine, must be non-negative
ha.server_id = 3

# Specify other hosts that make up this database cluster.
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001

# You can also specify the hosts using their IP addresses
# ha.initial_hosts = 192.168.0.61:5001, 192.168.0.62:5001, 192.168.0.63:5001

File: conf/neo4j-server.properties:

# Mention the IP address to which this database server will listen 
# to. 0.0.0.0 means it will listen to all incoming connections.
org.neo4j.server.webserver.address = 0.0.0.0

# Specify the mode of operation as HA if the mode is High 
# Availability or set to SINGLE if using a cluster of 1 Node
# (This is default setting)
org.neo4j.server.database.mode = HA
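
The three conf/neo4j.properties files above differ only in the value of ha.server_id, while the list of initial hosts is identical on every machine. As an illustration of that pattern, the following sketch generates the two settings for each machine from a single host list. The class HaConfigSketch and the method neo4jProperties are hypothetical helpers and not part of Neo4j; the host names and port 5001 are taken from the preceding configuration.

```java
import java.util.Arrays;
import java.util.List;

public class HaConfigSketch {
    // Renders the two HA settings for one machine. The host list is the
    // same for every member; only the server ID changes.
    static String neo4jProperties(int serverId, List<String> hosts, int clusterPort) {
        if (serverId < 0) {
            throw new IllegalArgumentException("ha.server_id must be non-negative");
        }
        StringBuilder initialHosts = new StringBuilder();
        for (String host : hosts) {
            if (initialHosts.length() > 0) {
                initialHosts.append(',');
            }
            initialHosts.append(host).append(':').append(clusterPort);
        }
        return "ha.server_id = " + serverId + "\n"
             + "ha.initial_hosts = " + initialHosts + "\n";
    }

    public static void main(String[] args) {
        List<String> hosts =
            Arrays.asList("neo4j-01.local", "neo4j-02.local", "neo4j-03.local");
        // Machine n gets server ID n, matching the configuration above
        for (int id = 1; id <= hosts.size(); id++) {
            System.out.println("# conf/neo4j.properties for " + hosts.get(id - 1));
            System.out.println(neo4jProperties(id, hosts, 5001));
        }
    }
}
```

Generating the files from one list is a convenient way to avoid a mistyped host or a duplicated server ID when the cluster grows beyond a few machines.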

Run the neo4j script on each server with the following commands to start the servers. The order in which the servers are started is not important:

neo4j-01$ ./bin/neo4j start   # start the first server
neo4j-02$ ./bin/neo4j start   # start the second server
neo4j-03$ ./bin/neo4j start   # start the third server

If the database mode has been set to HA, the startup script does not wait for the server to become available but returns immediately. This is because a machine does not accept requests until the cluster has been formed. In the preceding configuration, for example, this happens when the second machine starts up. To monitor the startup process, you can follow the messages in the console.log file created during the setup. The location of the log file is printed before the startup script terminates.
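
The point at which the cluster becomes available can be made concrete with a small calculation. The following sketch assumes that a cluster is formed once a simple majority of the configured hosts is running. The text above does not state this rule explicitly, so treat it as an assumption; it is consistent with the three-machine example, where the second machine completes the cluster. The class and method names are hypothetical.

```java
public class QuorumSketch {
    // Smallest number of machines that forms a simple majority
    // (assumed rule, see the note above)
    static int majority(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("cluster size must be positive");
        }
        return clusterSize / 2 + 1;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 3, 5}) {
            System.out.println(size + " machines configured, "
                + majority(size) + " must be running");
        }
    }
}
```

For the three machines configured earlier, majority(3) evaluates to 2, which matches the observation that requests are accepted once the second machine has started.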