HBase Administration Cookbook

By: Yifeng Jiang

Overview of this book

As an open source, distributed big data store, HBase scales to billions of rows with millions of columns and runs on top of clusters of commodity machines. If you are looking for a way to store and access a huge amount of data in real time, look no further than HBase.

HBase Administration Cookbook provides practical examples and simple step-by-step instructions for administering HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster on the cloud. Working with such a huge amount of data means that an organized and manageable process is key, and this book will help you achieve that.

The recipes in this practical cookbook start with setting up a fully distributed HBase cluster and moving data into it. You will learn how to use all of the tools for day-to-day administration tasks, as well as how to manage and monitor the cluster efficiently to achieve the best possible performance. Understanding the relationship between Hadoop and HBase will help you get the best out of HBase, so the book shows you how to set up Hadoop clusters, configure Hadoop to cooperate with HBase, and tune its performance.
Table of Contents (16 chapters)
HBase Administration Cookbook
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface

Introduction


There are several ways to move data into HBase:

  • Using the HBase Put API

  • Using the HBase bulk load tool

  • Using a customized MapReduce job

The HBase Put API is the most straightforward method, and its usage is not difficult to learn. In many situations, however, it is not the most efficient method, especially when a large amount of data needs to be transferred into HBase within a limited time period. The volume of data involved is usually huge, which is probably why you are using HBase rather than another database in the first place. You should think carefully about how to move all that data into HBase at the beginning of your HBase project; otherwise, you might run into serious performance problems.

HBase has a bulk load feature that supports loading huge volumes of data into HBase efficiently. The bulk load feature uses a MapReduce job to load data into a specific HBase table by generating files in HBase's internal HFile data format and then loading the data files...
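To make the idea of generating files ahead of time more concrete, here is a simplified, standard-library-only Java sketch of the preparation work a bulk load job performs before any file is written. Each row is routed to the region whose key range contains it, and the rows are sorted by key within each region, so each generated file covers exactly one region and can be handed to that region as a whole. This is an illustration under stated assumptions, not HBase code: the class and method names (`BulkLoadSketch`, `regionFor`, `partitionByRegion`), the region start keys, and the sample row keys are all hypothetical, and a real job works on HBase key-values and writes actual HFiles rather than building in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the partition-and-sort step behind a bulk load.
// It does not use the HBase API and writes no HFiles.
public class BulkLoadSketch {

    // Returns the index of the region whose range [start_i, start_{i+1}) contains rowKey.
    // regionStartKeys must be sorted, and the first entry is "" (the empty key),
    // just as the first region of an HBase table has an empty start key.
    // Note: Java compares Strings by UTF-16 code units while HBase compares raw bytes;
    // for plain ASCII keys, as in this sketch, the two orderings agree.
    static int regionFor(List<String> regionStartKeys, String rowKey) {
        int idx = Collections.binarySearch(regionStartKeys, rowKey);
        // An exact match means the row key is a region's start key. Otherwise,
        // binarySearch returns -(insertionPoint) - 1, and the owning region is the one
        // just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Groups row keys by owning region and sorts each group, so that every group
    // could be written out as one sorted file per region.
    static List<List<String>> partitionByRegion(List<String> regionStartKeys, List<String> rowKeys) {
        List<List<String>> perRegion = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            perRegion.add(new ArrayList<>());
        }
        for (String key : rowKeys) {
            perRegion.get(regionFor(regionStartKeys, key)).add(key);
        }
        for (List<String> group : perRegion) {
            Collections.sort(group); // rows within a file must be in key order
        }
        return perRegion;
    }

    public static void main(String[] args) {
        // Hypothetical table split into three regions: ["", "g"), ["g", "p"), ["p", end).
        List<String> starts = Arrays.asList("", "g", "p");
        List<String> rows = Arrays.asList("zebra", "apple", "mango", "grape", "kiwi", "banana", "pear");
        List<List<String>> groups = partitionByRegion(starts, rows);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("region " + i + " " + groups.get(i));
        }
    }
}
```

Running `main` prints `region 0 [apple, banana]`, `region 1 [grape, kiwi, mango]`, and `region 2 [pear, zebra]`. Because each group is already sorted and confined to a single region, the final step only has to hand each prepared file to its region. This avoids sending every row individually through the normal write path, which is the per-row cost that makes the Put API slow for very large loads.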