In this section, the Cassandra NoSQL database will be used as the storage mechanism for Titan. Although it does not use Hadoop, Cassandra is a large-scale, cluster-based database in its own right, and it can scale to very large cluster sizes. This section will follow the same process as was used for HBase: a graph will be created and stored in Cassandra using the Titan Gremlin shell. The graph will then be checked using Gremlin, and the stored data will be checked in Cassandra. Finally, the raw Titan graph data stored in Cassandra will be accessed from Spark. The first step, then, is to install Cassandra on each node in the cluster.
Create a repo file that will allow the community version of DataStax Cassandra to be installed using the Linux yum command. Root access is required for this, so the su command has been used to switch to the root user. Install Cassandra on all the nodes:
[hadoop@hc2nn lib]$ su -
[root@hc2nn ~]# vi /etc/yum.repos.d/datastax.repo
[datastax]
name...
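Because the repo file must exist on every node before yum can install Cassandra there, it may help to confirm that the file was saved correctly on each one. The following is a minimal sketch and not part of the original walkthrough: check_repo_file is a hypothetical helper, and treating name and baseurl as the keys to check is an assumption based on the standard yum repo file format. It confirms that the file is readable, contains a [datastax] section header, and sets both keys.

```shell
#!/usr/bin/env bash
# check_repo_file: hypothetical helper (not from the original text).
# Usage: check_repo_file /etc/yum.repos.d/datastax.repo
# Returns 0 and prints "ok: <file>" if the file looks usable, non-zero otherwise.
# Note: this is a simple grep-based check; it does not parse INI sections,
# so a key defined under a different section would also satisfy it.
check_repo_file() {
  local file="$1"

  # The file must exist and be readable.
  if [ ! -r "$file" ]; then
    echo "missing or unreadable: $file" >&2
    return 1
  fi

  # The repo section header used in this walkthrough.
  if ! grep -q '^\[datastax\]' "$file"; then
    echo "no [datastax] section in $file" >&2
    return 1
  fi

  # Keys assumed to be required; "key=value" and "key = value" both match.
  local key
  for key in name baseurl; do
    if ! grep -q "^${key}[[:space:]]*=" "$file"; then
      echo "missing key '${key}' in $file" >&2
      return 1
    fi
  done

  echo "ok: $file"
}
```

Run it as root on each node before invoking yum, for example check_repo_file /etc/yum.repos.d/datastax.repo; a non-zero exit status indicates the file should be edited again.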