When we create a table in HBase, the table starts with a single region, and all data inserted into the table goes to that region. As the data keeps growing, the region eventually reaches a size threshold and region splitting happens: the single region is split into two halves so that the table can handle more data.
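The splitting behavior described above can be sketched with a small toy model. This is not HBase code: the class RegionSplitSketch, its methods, and the row-count threshold MAX_ROWS_PER_REGION are hypothetical names introduced for illustration, and counting rows is a simplification (HBase decides based on store file size, controlled by the hbase.hregion.max.filesize setting).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model of region splitting; an illustration only, not HBase code.
final class RegionSplitSketch {
    // Hypothetical threshold: split once a region holds this many rows.
    // (HBase itself decides based on store file size, not row count.)
    static final int MAX_ROWS_PER_REGION = 4;

    // Regions are kept in row-key order. Return the index of the region
    // whose key range contains rowKey (region 0 covers everything below
    // the first split key).
    static int locate(List<TreeSet<String>> regions, String rowKey) {
        for (int i = regions.size() - 1; i > 0; i--) {
            if (regions.get(i).first().compareTo(rowKey) <= 0) {
                return i;
            }
        }
        return 0;
    }

    // Insert the row keys one by one and return the resulting regions.
    static List<TreeSet<String>> load(List<String> rowKeys) {
        List<TreeSet<String>> regions = new ArrayList<>();
        regions.add(new TreeSet<>()); // a new table starts with a single region
        for (String rowKey : rowKeys) {
            int idx = locate(regions, rowKey);
            TreeSet<String> region = regions.get(idx);
            region.add(rowKey);
            if (region.size() >= MAX_ROWS_PER_REGION) {
                // Split at the middle row key into two halves.
                String mid = new ArrayList<>(region).get(region.size() / 2);
                regions.set(idx, new TreeSet<>(region.headSet(mid)));
                regions.add(idx + 1, new TreeSet<>(region.tailSet(mid)));
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        // Print the regions that result from loading seven row keys.
        System.out.println(load(Arrays.asList("a", "b", "c", "d", "e", "f", "g")));
    }
}
```

In this sketch, every row lands in the one initial region until the threshold is reached; only after the first split does a second region exist to take part of the key range.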
In a write-heavy HBase cluster, this approach has two issues that need to be addressed:
The split/compaction storm issue.
Because data grows uniformly, most of the regions reach the split threshold and are split at around the same time, which causes heavy disk I/O and network traffic.
Load is not well balanced until enough regions have been split.
Especially right after the table is created, all requests go to the same region server where the first region is deployed.
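Both issues can be illustrated with a toy simulation. The sketch below is not HBase code and makes strong simplifying assumptions: writes are spread perfectly evenly across the existing regions, and the units, the class SplitStormSketch, and the constants SPLIT_THRESHOLD and INGEST_PER_STEP are hypothetical values chosen so the arithmetic stays exact.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of uniform data growth; an illustration only, not HBase code.
final class SplitStormSketch {
    // Hypothetical values in arbitrary size units.
    static final long SPLIT_THRESHOLD = 1024; // region size that triggers a split
    static final long INGEST_PER_STEP = 128;  // total data written per time step

    // Returns an array where element t is the number of regions split at step t.
    static int[] splitsPerStep(int steps) {
        int[] splits = new int[steps + 1];
        List<Long> regions = new ArrayList<>();
        regions.add(0L); // the table starts with one region taking all writes
        for (int step = 1; step <= steps; step++) {
            // Uniform growth: every region receives an equal share of the writes.
            long share = INGEST_PER_STEP / regions.size();
            List<Long> next = new ArrayList<>();
            for (long size : regions) {
                long grown = size + share;
                if (grown >= SPLIT_THRESHOLD) {
                    // Split the region into two halves.
                    next.add(grown / 2);
                    next.add(grown - grown / 2);
                    splits[step]++;
                } else {
                    next.add(grown);
                }
            }
            regions = next;
        }
        return splits;
    }

    public static void main(String[] args) {
        int[] splits = splitsPerStep(64);
        for (int step = 1; step < splits.length; step++) {
            if (splits[step] > 0) {
                System.out.println("step " + step + ": " + splits[step] + " region(s) split");
            }
        }
    }
}
```

Under these assumptions, a single region absorbs every write for the first 8 steps, and afterwards all regions hit the threshold in the same step: 1 split at step 8, then 2 at step 16, 4 at step 32, and 8 at step 64. Real clusters are less regular, but the pattern of splits clustering together is the storm described above.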
The split/compaction storm issue was discussed in the Managing region split recipe in Chapter 8, Basic Performance Tuning, which addresses it by splitting regions manually. For the second issue, we introduced how to avoid it...