Cassandra High Availability

In a cluster of any significant size, nodes are bound to become unresponsive for a variety of reasons. Fortunately, Cassandra has a sophisticated mechanism called the failure detector that is designed to determine when this has occurred, then mark the node as down.

Most node failures result from temporary conditions, such as network issues. Therefore, Cassandra assumes the node will eventually come back online, and that permanent cluster changes will be executed explicitly using nodetool.

Marking a downed node

Each node keeps track of the state of other nodes in the cluster by means of an accrual failure detector (or phi failure detector). This detector evaluates the health of other nodes based on a sliding window of gossip message arrival times. It computes the statistical distribution of those arrival times per node, thus taking into account the current state of the network rather than using naïve thresholds or timeouts.
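To make the idea of an accrual detector concrete, here is a minimal sketch in Python. It is not Cassandra's implementation. The class name `PhiAccrualDetector`, its methods, and the window size are hypothetical, and the sketch assumes that heartbeat inter-arrival times follow an exponential distribution, which is a simplification chosen for illustration. Under that assumption, the probability that a healthy node stays silent for `elapsed` seconds is `exp(-elapsed / mean)`, and phi is the negative base-10 logarithm of that probability.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy accrual failure detector for a single remote node (illustrative only)."""

    def __init__(self, window_size=1000):
        # Sliding window of the most recent heartbeat inter-arrival times.
        # The window size is an assumed value for this sketch.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat (for example, a gossip message) received at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Return the suspicion level at time `now`; higher means more likely down."""
        if not self.intervals:
            return 0.0  # not enough history to form an opinion
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_arrival
        # Exponential assumption: P(silence >= elapsed) = exp(-elapsed / mean),
        # so phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


# Usage: heartbeats arrive once per second, then stop after t = 5.
detector = PhiAccrualDetector()
for t in range(6):
    detector.heartbeat(float(t))

phi_soon = detector.phi(6.0)   # one second of silence: low suspicion
phi_late = detector.phi(30.0)  # 25 seconds of silence: high suspicion
```

Because phi is computed from the observed mean interval rather than a fixed timeout, the same period of silence yields a lower suspicion level on a slow or congested network than on a fast one. A caller would compare phi against a configurable threshold to decide when to treat the node as down.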

The ultimate result of the failure detection algorithm...

By: Robbie Strickland
