Apache Cassandra Essentials

By: Nitin Padalia

Overview of this book

Apache Cassandra Essentials takes you step by step from the basics of installation to advanced installation options and database design techniques. It gives you all the information you need to effectively design a well-distributed, high-performance database. You’ll get to know the steps that a Cassandra node performs when you execute a read/write query, which is essential for properly maintaining a Cassandra cluster and debugging any issues. Next, you’ll discover how to integrate a Cassandra driver in your applications and perform read/write operations. Finally, you’ll learn about the various tools provided by Cassandra for serviceability aspects such as logging, metrics, backup, and recovery.

Configuration files


Now, let's look at some key configuration files and the options that we can configure in them:

cassandra.yaml

The key options in cassandra.yaml are as follows:

  • Cluster configurations

    cluster_name: This is the identification string for a logical cluster. All nodes in a cluster must have the same value for this configuration.

    Default value: The default value is Test Cluster.

    listen_address: The Cassandra node binds to this address, and the other nodes in the cluster use it to communicate with this node. The default value is the loopback address localhost, so leaving it unchanged prevents this node from communicating with nodes running on other machines.

    Default value: The default value is localhost.

    seed_provider: Seed nodes help Cassandra nodes learn about the other nodes in the cluster and the ring topology using the Gossip protocol. We'll learn more about the Gossip protocol in later chapters. It has two suboptions: class_name and the list of seeds. The default seeding class takes a comma-delimited list of node addresses. In a multinode cluster, the seed list should have at least one node, and the list should be the same on all nodes.

    Default value: The default value is class_name: org.apache.cassandra.locator.SimpleSeedProvider with seeds: "127.0.0.1".

    Tip

    The seed list should have more than one node for fault tolerance of the bootstrapping process.

    In a multi-data center cluster, at least one node from each data center should participate as a seed node.

    Note

    A node cannot be a seed node if it is a bootstrapping node. So, during the bootstrapping process, the node shouldn't be in the seeds list.
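
The cluster-level rules above can be checked mechanically. The following is a minimal Python sketch, assuming each node's settings have already been loaded into a plain dictionary. The function name check_cluster_settings and the dictionary layout are hypothetical and are not part of Cassandra.

```python
LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def check_cluster_settings(nodes):
    """Return a list of problems found in per-node settings.

    `nodes` maps a node label to a dict with the keys cluster_name,
    listen_address and seeds (a list of addresses). This layout is an
    assumption made for the example, not a Cassandra format.
    """
    problems = []
    multi_node = len(nodes) > 1

    # All nodes of one logical cluster must share the same cluster_name.
    names = {cfg["cluster_name"] for cfg in nodes.values()}
    if len(names) > 1:
        problems.append("cluster_name differs between nodes")

    # The seed list should be common to all nodes.
    seed_lists = {tuple(sorted(cfg["seeds"])) for cfg in nodes.values()}
    if len(seed_lists) > 1:
        problems.append("seed list differs between nodes")

    for label, cfg in sorted(nodes.items()):
        # A loopback listen_address cannot be reached from other machines.
        if multi_node and cfg["listen_address"] in LOOPBACK:
            problems.append(f"{label}: listen_address is a loopback address")
        # More than one seed is recommended for fault tolerance.
        if multi_node and len(cfg["seeds"]) < 2:
            problems.append(f"{label}: fewer than two seeds configured")
    return problems


if __name__ == "__main__":
    example = {
        "node1": {"cluster_name": "Test Cluster",
                  "listen_address": "10.0.0.1",
                  "seeds": ["10.0.0.1", "10.0.0.2"]},
        "node2": {"cluster_name": "Test Cluster",
                  "listen_address": "localhost",
                  "seeds": ["10.0.0.1", "10.0.0.2"]},
    }
    for problem in check_cluster_settings(example):
        print(problem)
```

With the made-up addresses in the example, only node2 is reported, because its listen_address was left at the default.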

  • Data partitioning

    num_tokens: This configuration defines the number of random tokens this node will hold, and therefore the partition ranges that it is responsible for. It is a relative setting: if one node has num_tokens set to 128 and another has 256, the second node handles twice as many partition ranges as the first.

    Default value: The default value is 256.

    Tip

    All nodes with the same hardware capability should have the same number of tokens configured.

    partitioner: This defines the data partitioning algorithm used in the Cassandra cluster. The current default algorithm, Murmur3, is very fast and is considered a better partitioning algorithm than its predecessors. So, when forming a new cluster, you should go with the default value, which is org.apache.cassandra.dht.Murmur3Partitioner.

    Note

    This setting shouldn't be changed once data is loaded, as changing the partitioner requires wiping all data directories, which deletes the existing data.
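
To make the relative nature of num_tokens concrete, here is a small Python sketch that estimates each node's share of the ring as proportional to its token count. The function name expected_ring_share is hypothetical, and because tokens are assigned randomly, real ownership will only approximate these figures.

```python
def expected_ring_share(num_tokens_by_node):
    """Estimate each node's fraction of the ring from its num_tokens value."""
    total = sum(num_tokens_by_node.values())
    return {node: tokens / total for node, tokens in num_tokens_by_node.items()}


if __name__ == "__main__":
    # The same 128 versus 256 example used in the text.
    shares = expected_ring_share({"node1": 128, "node2": 256})
    for node, share in shares.items():
        print(f"{node}: {share:.1%}")
```

For the two-node example, the node with 256 tokens is expected to own about two thirds of the ring and the node with 128 tokens about one third.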

  • Storage configurations

    data_file_directories: Using this configuration option, we can set the data storage location.

    Default value: The default value is $CASSANDRA_HOME/data/data (/var/lib/cassandra/data in older versions).

    commitlog_directory: This is the location on disk where Cassandra will store the commitlog.

    Default value: The default value is $CASSANDRA_HOME/data/commitlog (/var/lib/cassandra/commitlog in older versions).

    Tip

    If using non-SSDs, you should have a separate disk for storing the commitlog. Commit logs are append-only, whereas data files involve random seeks, so sharing one disk degrades commit log write performance. Commit log disks can also be smaller, because commitlog space is reusable once the corresponding Memtable has been flushed to disk.

    saved_caches_directory: This is the location where cached rows, partition keys, or counters will be saved to disk after a certain duration of time.

    Default value: The default value is $CASSANDRA_HOME/data/saved_caches (/var/lib/cassandra/saved_caches in older versions).

    Note

    Row caching is disabled by default in cassandra.yaml due to its limited use.
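
One way to verify the separate-disk advice is to compare the device IDs of the two directories. The sketch below uses Python's os.stat for this; the helper name on_same_device is hypothetical, and the paths are only placeholders for whatever cassandra.yaml actually contains. Note that st_dev identifies a filesystem rather than a physical disk, so two partitions on one disk would still be reported as separate.

```python
import os


def on_same_device(path_a, path_b):
    """Return True if both existing paths live on the same filesystem device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Placeholder locations; substitute data_file_directories and
    # commitlog_directory from your own cassandra.yaml.
    data_dir = "/var/lib/cassandra/data"
    commitlog_dir = "/var/lib/cassandra/commitlog"
    if os.path.isdir(data_dir) and os.path.isdir(commitlog_dir):
        if on_same_device(data_dir, commitlog_dir):
            print("data files and commitlog share a device")
        else:
            print("data files and commitlog are on separate devices")
    else:
        print("directories not found; adjust the paths")
```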

  • Client configurations

    rpc_address: This is the interface that the Thrift RPC service binds to. You should set it appropriately; the default does not allow connections from outside the node.

    Default value: The default value is localhost.

    rpc_port: This is the Thrift service port.

    Default value: The default value is 9160.

    native_transport_port: This is the port on which the CQL native transport will listen for clients; for example, cqlsh or Java Driver. This will use rpc_address as the connection interface.

    Default value: The default value is 9042.
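
After changing the client settings, a quick connectivity check can confirm that the ports accept connections on the intended interface. This is a minimal sketch using Python's standard socket module; the helper name port_is_open is hypothetical, and the host used below is an assumption that should be replaced with the configured rpc_address. A successful TCP connection only shows that something is listening, not that it is Cassandra.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9042 is the default native_transport_port; 9160 is the default rpc_port.
    host = "127.0.0.1"  # assumed; use the rpc_address value of the node
    for name, port in (("native transport", 9042), ("Thrift RPC", 9160)):
        state = "reachable" if port_is_open(host, port) else "not reachable"
        print(f"{name} port {port}: {state}")
```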

  • Security configurations

    authenticator: This configuration specifies whether to use password-based authentication or none. For password-based authentication, authenticator should be set to PasswordAuthenticator. If PasswordAuthenticator is used, a username and hashed password are saved in the system_auth.credentials table.

    Default value: The default value is AllowAllAuthenticator, which means no authentication.

    authorizer: This configuration is used if you want to limit permissions to Cassandra objects, for example, tables. To enable authorization, set its value to CassandraAuthorizer. If enabled, it stores authorization information in the system_auth.permissions table.

    Default value: The default value is AllowAllAuthorizer, which means authorization is disabled.

    Tip

    If you enable authentication or authorization, increase the replication factor of the system_auth keyspace so that login and permission data stay available when a node is down.
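
The replication change for system_auth is made with a CQL ALTER KEYSPACE statement. The Python sketch below only builds that statement as a string; the helper name alter_system_auth_cql is hypothetical, and the choice of NetworkTopologyStrategy with a replication factor of 3 in a data center called DC1 is an assumption for illustration.

```python
def alter_system_auth_cql(replication_by_dc):
    """Build a CQL statement setting per-data-center replication for system_auth."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for data_center, factor in sorted(replication_by_dc.items()):
        parts.append(f"'{data_center}': {int(factor)}")
    return ("ALTER KEYSPACE system_auth WITH replication = {"
            + ", ".join(parts) + "};")


if __name__ == "__main__":
    # Assumed topology: one data center named DC1 with three replicas.
    print(alter_system_auth_cql({"DC1": 3}))
```

The resulting statement can be run from cqlsh. Existing rows are not copied to the new replicas automatically, so a repair of the system_auth keyspace is normally run afterwards.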

  • cassandra-env.sh

    This file can be used to fine-tune Cassandra. Here, you can set or tune Java environment variables such as MAX_HEAP_SIZE, HEAP_NEWSIZE, and JVM_OPTS.

  • cassandra.in.sh

    Here, you can alter the default values for environment variables such as JAVA_HOME, CASSANDRA_HOME, and CLASSPATH. It is located in $CASSANDRA_HOME/bin/ in binary tarball installations. Package-based installations put this file inside the /usr/share/cassandra directory.

  • cassandra-rackdc.properties

    The rack and data center configurations for a node are defined here. The default datacenter is DC1 and the default rack is RAC1.
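
The file uses plain key=value lines with the keys dc and rack. As a minimal sketch, the Python snippet below parses such content; the helper name parse_properties is hypothetical, and the sample text simply mirrors the defaults mentioned above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


SAMPLE_RACKDC = """
# Illustrative contents of cassandra-rackdc.properties
dc=DC1
rack=RAC1
"""

if __name__ == "__main__":
    settings = parse_properties(SAMPLE_RACKDC)
    print(settings["dc"], settings["rack"])
```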

  • cassandra-topology.properties

    This file contains a mapping of Cassandra node IPs to data centers and racks.
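
Each entry in this file has the form address=data_center:rack, and a default entry covers nodes that are not listed. The following Python sketch parses that format for IPv4 addresses; the function names parse_topology and locate are hypothetical, and the addresses in the sample are made up for the example.

```python
def parse_topology(text):
    """Map each address (or 'default') to a (data_center, rack) tuple."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        address, _, location = line.partition("=")
        data_center, _, rack = location.partition(":")
        topology[address.strip()] = (data_center.strip(), rack.strip())
    return topology


def locate(topology, address):
    """Look up a node, falling back to the 'default' entry when present."""
    return topology.get(address, topology.get("default"))


SAMPLE_TOPOLOGY = """
# Illustrative entries; the addresses are made up for this example.
192.168.1.100=DC1:RAC1
192.168.2.200=DC2:RAC1
default=DC1:RAC1
"""

if __name__ == "__main__":
    topology = parse_topology(SAMPLE_TOPOLOGY)
    print(locate(topology, "192.168.2.200"))
    print(locate(topology, "10.0.0.9"))
```

The second lookup is for an address that is not listed, so it falls back to the default entry.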

  • logback.xml

    This file lets you configure the logging properties of Cassandra's system.log. It is not available in older versions of Cassandra.