In addition to the tables in HBase, we should back up the region starting keys for each table. Region starting keys determine how data is distributed in a table, because each region's boundary is defined by its starting key. A region is the basic unit for load balancing and metrics gathering in HBase.
There is no need to back up the region starting keys if you are performing full shutdown backups using distcp, because distcp also copies region boundaries to the backup cluster.
For the live backup options, however, backing up region starting keys is as important as backing up the table data. This is especially true if your data distribution is difficult to calculate in advance or if your regions were split manually. The reason is that the live backup options, including the CopyTable and Export utilities, use the normal HBase client API to restore data in a MapReduce job. The restore can be sped up dramatically if we precreate well-split regions before running the restore MapReduce job.
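As a rough illustration of the idea, the following sketch saves a table's region starting keys to a text file and reads them back as an array of split keys. It is not taken from this recipe: the class name RegionStartKeyBackup, its two helper methods, the file layout (one Base64-encoded key per line), and the sample keys are all assumptions made for the example. It uses only the Java standard library, so fetching the keys from a live cluster and creating the presplit table are left as comments; the exact client calls for those steps depend on your HBase version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical helper, not part of HBase: persists region starting keys
// so that a table can be precreated with the same splits before a restore.
public class RegionStartKeyBackup {

    // Writes one Base64-encoded starting key per line.
    // Base64 is used because row keys are arbitrary bytes, not necessarily text.
    static void saveStartKeys(byte[][] startKeys, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (byte[] key : startKeys) {
            // The first region of a table has an empty starting key.
            // It is not a split point, so it is skipped here.
            if (key.length == 0) {
                continue;
            }
            lines.add(Base64.getEncoder().encodeToString(key));
        }
        Files.write(file, lines, StandardCharsets.US_ASCII);
    }

    // Reads the file back into split keys, in the order they were saved.
    static byte[][] loadSplitKeys(Path file) throws IOException {
        List<byte[]> keys = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.US_ASCII)) {
            if (line.isEmpty()) {
                continue;
            }
            keys.add(Base64.getDecoder().decode(line));
        }
        return keys.toArray(new byte[0][]);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample keys. On a real cluster they would come from the
        // HBase client (for example, a getStartKeys() call on the table or
        // its region locator, depending on the client version).
        byte[][] startKeys = {
            new byte[0],
            "row-1000".getBytes(StandardCharsets.UTF_8),
            "row-2000".getBytes(StandardCharsets.UTF_8)
        };

        Path file = Files.createTempFile("region-starts", ".txt");
        saveStartKeys(startKeys, file);

        // Before the restore job, these split keys would be passed to the
        // admin API's createTable call that accepts a byte[][] of split keys.
        byte[][] splitKeys = loadSplitKeys(file);
        System.out.println(splitKeys.length + " split keys restored");

        Files.delete(file);
    }
}
```

Keeping the keys in a small text file alongside the exported table data means the backup carries everything needed to recreate the original region layout, rather than relying on the restore job to trigger splits as the data arrives.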
We will...