A balanced Cassandra cluster is one where each node owns an equal number of keys. This means that when you run nodetool ring, a balanced cluster shows the same percentage for every node under the Owns or Effective Ownership column. If the data is not uniformly distributed across the keys, some nodes will hold more data than others even with equal ownership. We use RandomPartitioner or Murmur3Partitioner to avoid this sort of lopsided cluster.
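To see why a hashing partitioner evens things out, here is a minimal Python sketch. It is an approximation, not Cassandra's own code: it assumes a RandomPartitioner-style token is the absolute value of the key's MD5 digest read as a signed integer, and it assumes a hypothetical four-node ring with evenly spaced tokens. The key names (user000000 and so on) and the helper names are made up for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2 ** 127  # assumed RandomPartitioner token range: 0 .. 2**127


def md5_token(key):
    """Approximate a RandomPartitioner-style token for a row key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Read the 16-byte digest as a signed big-endian integer, then take abs().
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def owner(token, node_tokens):
    """Return the index of the first node whose token is >= the key's token.

    Tokens larger than the last node's token wrap around to node 0.
    """
    return bisect_left(node_tokens, token) % len(node_tokens)


# Hypothetical four-node ring with evenly spaced tokens.
node_tokens = [i * RING_SIZE // 4 for i in range(4)]

# Sequential keys would pile up on one node under an order-preserving scheme;
# hashing scatters them around the ring.
counts = Counter(
    owner(md5_token("user%06d" % n), node_tokens) for n in range(10000)
)

for node in range(4):
    print("node %d: %d keys" % (node, counts[node]))
```

Each node should end up with roughly a quarter of the keys, even though the keys themselves are sequential.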
Whenever a node is added or decommissioned, the token distribution becomes skewed. You normally want Cassandra to be fairly load balanced to avoid hotspots. Fortunately, load balancing is easy. Here is the two-step load-balancing process.
Calculate the initial tokens based on the partitioner that you are using. You can generate them manually by dividing the partitioner's token range equally among the nodes. Or, you can use tools/bin/token-generator to generate...
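The manual calculation can be sketched in a few lines of Python. This is an illustration rather than the token-generator tool itself: the function name initial_tokens is hypothetical, and the sketch assumes the usual token ranges of 0 to 2**127 for RandomPartitioner and -2**63 to 2**63 - 1 for Murmur3Partitioner.

```python
def initial_tokens(node_count, partitioner="Murmur3Partitioner"):
    """Return evenly spaced initial tokens, one per node.

    Assumed ranges: RandomPartitioner spans 0 .. 2**127,
    Murmur3Partitioner spans -2**63 .. 2**63 - 1.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")

    if partitioner == "RandomPartitioner":
        start, size = 0, 2 ** 127
    elif partitioner == "Murmur3Partitioner":
        start, size = -(2 ** 63), 2 ** 64
    else:
        raise ValueError("unsupported partitioner: %s" % partitioner)

    # Divide the token range into node_count equal slices.
    return [start + i * size // node_count for i in range(node_count)]


for token in initial_tokens(4):
    print(token)
# prints:
# -9223372036854775808
# -4611686018427387904
# 0
# 4611686018427387904
```

Python integers have arbitrary precision, so the 2**127 arithmetic for RandomPartitioner needs no special handling.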