In this chapter, you learned that HBase is a column-oriented NoSQL database with a flexible schema. Its main components are the MasterServer, the RegionServers, and the Regions they host, with ZooKeeper used to coordinate and monitor them. Writes pass through two structures: the write-ahead log (WAL), a durable log kept on each RegionServer for crash recovery, and the MemStore, an in-memory write buffer kept in each Region. We also saw how HBase manages data through region splitting and compaction. In terms of the CAP theorem, HBase favors consistency and partition tolerance over availability.
The HBase data model differs from that of a traditional RDBMS: data is stored in a column-oriented layout, as a sorted, multidimensional map of key-value pairs. Each row is identified by its rowkey, and rows are distributed across the Regions of a cluster by ranges of rowkey values. Rowkey design is therefore critical to HBase schema design, for both performance and data management.
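To make the "multidimensional map" idea concrete, here is a minimal sketch in plain Python. It is a toy model and does not use the HBase client API: the nested dictionary, the `put`, `get_latest`, and `region_for` helpers, and the region start keys are all hypothetical names chosen for illustration. It shows a value addressed by rowkey, column family, column qualifier, and timestamp, and a rowkey being mapped to a region by range.

```python
from bisect import bisect_right

# Toy model of one HBase table (an assumption for illustration, not HBase code):
# rowkey -> column family -> column qualifier -> timestamp -> value
table = {}

def put(rowkey, family, qualifier, timestamp, value):
    """Store one versioned cell in the nested map."""
    cell = table.setdefault(rowkey, {}).setdefault(family, {}).setdefault(qualifier, {})
    cell[timestamp] = value

def get_latest(rowkey, family, qualifier):
    """Return the value with the highest timestamp for the given cell."""
    versions = table[rowkey][family][qualifier]
    return versions[max(versions)]

# Hypothetical region boundaries: each region owns the rowkeys from its
# start key (inclusive) up to the next region's start key (exclusive).
REGION_START_KEYS = ["", "g", "p"]

def region_for(rowkey):
    """Return the index of the region whose key range contains rowkey."""
    return bisect_right(REGION_START_KEYS, rowkey) - 1

put("alice", "info", "email", 1, "a@old.example")
put("alice", "info", "email", 2, "a@new.example")
put("mike", "info", "email", 1, "m@example.com")

print(get_latest("alice", "info", "email"))  # a@new.example
print(region_for("alice"), region_for("mike"), region_for("zed"))  # 0 1 2
```

Because regions are defined by rowkey ranges, rowkeys that sort close together land in the same region. This is why rowkey design has such a direct effect on how evenly reads and writes are spread.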
Data management is a critical part of any Hadoop project. In the context of Big Data, Hadoop offers clear data management benefits. But managing it with...