HBase High Performance Cookbook

By: Ruchir Choudhry
Overview of this book

Apache HBase is a non-relational NoSQL database management system that runs on top of HDFS. It is an open source, distributed, versioned, column-oriented store written in Java that provides random, real-time access to big data. We'll start off by ensuring you have a solid understanding of the basics of HBase, followed by a thorough explanation of architecting an HBase cluster as per our project specifications. Next, we will explore the scalable structure of tables and learn to communicate with the HBase client. After this, we'll show you the intricacies of MapReduce and the art of performance tuning with HBase. Following this, we'll explain the concepts pertaining to scaling with HBase. Finally, you will get an understanding of how to integrate HBase with other tools such as ElasticSearch. By the end of this book, you will have learned enough to exploit HBase to boost system performance.

Replication


As a system grows and becomes more distributed, the need for data replication grows rapidly. HBase replication works on the core principle of moving transactional data from one cluster to another. Usually, the master cluster initiates the push to the slave cluster. The transfer is typically asynchronous, which minimizes the overhead on the master system. The data is generally shipped in batches, and the batch size can be controlled through configuration.
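To make the push model above concrete, here is a toy, in-memory sketch of asynchronous, batched shipping. It is not HBase code and does not reflect how HBase implements replication internally; the class `ToyReplicationSource`, its methods, and the `max_batch_size` parameter are all hypothetical names chosen for illustration.

```python
from collections import deque


class ToyReplicationSource:
    """Hypothetical toy model: queues edits on the master and ships them in batches."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size  # assumed knob, stands in for a config setting
        self.pending = deque()                # edits written on the master, not yet shipped

    def write(self, master_store, key, value):
        # The write completes on the master without waiting for the slave.
        master_store[key] = value
        self.pending.append((key, value))

    def ship_batch(self, slave_store):
        """Push at most max_batch_size pending edits to the slave, in write order."""
        shipped = 0
        while self.pending and shipped < self.max_batch_size:
            key, value = self.pending.popleft()
            slave_store[key] = value
            shipped += 1
        return shipped


master, slave = {}, {}
source = ToyReplicationSource(max_batch_size=2)
for i in range(5):
    source.write(master, f"row{i}", i)

# The slave lags behind until the batches are shipped.
print("slave rows before shipping:", len(slave))
while source.ship_batch(slave):
    pass
print("clusters in sync:", master == slave)
```

The point of the sketch is the trade-off: writes on the master stay cheap because shipping happens later, at the cost of the slave being temporarily behind.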

The benefits of HBase replication are as follows:

  • Data aggregation

  • Online data ingestion combined with offline data analysis

  • Geographic data distribution across multiple data centres

  • Backup and disaster recovery

How to do it…

  1. Let's edit ${HBASE_HOME}/conf/hbase-site.xml on both clusters and add the following:

    <property>
      <name>hbase.replication</name>
      <value>true</value>
    </property>
  2. Copy or SCP hbase-site.xml to all nodes.

  3. Recycle the HBase cluster if...
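If you maintain several clusters, the edit in step 1 can be scripted instead of done by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `set_property` is hypothetical, and the sketch assumes the file has the usual `<configuration>` root with `<property>` children. Note that parsing this way drops XML comments and the XML declaration, so review the result before copying it to the nodes.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml, name, value):
    """Return site_xml with the named property set to value, adding it if absent."""
    root = ET.fromstring(site_xml)  # assumes a <configuration> root element
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property found, so append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Stand-in for the contents of ${HBASE_HOME}/conf/hbase-site.xml
sample = "<configuration></configuration>"
updated = set_property(sample, "hbase.replication", "true")
print(updated)
```

Running the helper a second time on its own output leaves a single `hbase.replication` entry, so it is safe to apply repeatedly.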