This is another chapter about performance tuning. In Chapter 8, Basic Performance Tuning, we described recipes for tuning Hadoop, OS settings, Java, and HBase itself to improve the overall performance of an HBase cluster. Those improvements are general and apply to many use cases. In this chapter, we describe more specific recipes: some target write-heavy clusters, while others aim to improve the cluster's read performance.
Before tuning an HBase cluster, you need to know how well it currently performs. Therefore, we will start by introducing how to use the Yahoo! Cloud Serving Benchmark (YCSB) to measure (benchmark) the performance of an HBase cluster.
In the recipe Precreating regions before moving data into HBase in Chapter 2, we showed how to use HBase's RegionSplitter utility to create a table with precreated regions to improve data loading speed. While RegionSplitter by default precreates regions with MD5 number boundaries, for situations where row keys cannot be represented...