One of the most important features of HBase is the use of data compression. It's important because:
It reduces the number of bytes written to and read from HDFS
It saves disk space
It uses network bandwidth more efficiently when data is fetched from a remote server
HBase supports the GZip and LZO codecs. We suggest the LZO compression algorithm because of its fast decompression and low CPU usage. If a better compression ratio matters more for your system, consider GZip instead.
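To make the byte savings concrete, here is a minimal sketch that compresses some repetitive, row-like data with GZip and compares the sizes. It uses the JDK's own `java.util.zip.GZIPOutputStream` rather than HBase's codec path, and the class name `CompressionSketch`, the helper names, and the sample row layout are all assumptions made for illustration. Actual ratios depend heavily on your data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustrative only: this is plain JDK GZip, not HBase's internal compression code.
public class CompressionSketch {

    // Compress a byte array with GZip and return the compressed bytes.
    public static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream flushes the GZip trailer into the buffer.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        }
        return buffer.toByteArray();
    }

    // Build hypothetical, repetitive row-like data (an assumed layout, not a real HFile).
    public static byte[] sampleRows(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("row-").append(i).append(",cf1:status,ACTIVE\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = sampleRows(10000);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + packed.length);
    }
}
```

Running `main` prints both sizes. Data with many repeated column names and values, as is common in HBase tables, tends to compress well, which is why fewer bytes need to reach HDFS or cross the network.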
Unfortunately, HBase cannot ship with LZO because of a license incompatibility: HBase is Apache-licensed, whereas LZO is GPL-licensed. Therefore, we need to install LZO ourselves. We will use the hadoop-lzo library, which brings splittable LZO compression to Hadoop.
In this recipe, we will describe how to install LZO and how to configure HBase to use LZO compression.