HBase Administration Cookbook

By: Yifeng Jiang

Overview of this book

As an open source distributed big data store, HBase scales to billions of rows with millions of columns, and sits on top of clusters of commodity machines. If you are looking for a way to store and access a huge amount of data in real time, then look no further than HBase.

HBase Administration Cookbook provides practical examples and simple step-by-step instructions for you to administer HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster on the cloud. Working with such a huge amount of data means that an organized and manageable process is key, and this book will help you to achieve that.

The recipes in this practical cookbook start from setting up a fully distributed HBase cluster and moving data into it. You will learn how to use all of the tools for day-to-day administration tasks as well as for efficiently managing and monitoring the cluster to achieve the best performance possible. Understanding the relationship between Hadoop and HBase will allow you to get the best out of HBase, so the book will show you how to set up Hadoop clusters, configure Hadoop to cooperate with HBase, and tune its performance.

Quick start


HBase has two run modes—standalone mode and distributed mode. Standalone mode is the default mode of HBase. In standalone mode, HBase uses a local filesystem instead of HDFS, and runs all HBase daemons and an HBase-managed ZooKeeper instance, all in the same JVM.

This recipe describes the setup of a standalone HBase. It leads you through installing HBase, starting it in standalone mode, creating a table via HBase Shell, inserting rows, and then cleaning up and shutting down the standalone HBase instance.

Getting ready

You are going to need a Linux machine to run the stack. Running HBase on top of Windows is not recommended. We will use Debian 6.0.1 (Debian Squeeze) in this book, because we have several Hadoop/HBase clusters running on top of Debian in production at my company, Rakuten Inc., and 6.0.1 is the latest Amazon Machine Image (AMI) we have, at http://wiki.debian.org/Cloud/AmazonEC2Image.

As HBase is written in Java, you will need to have Java installed first. HBase runs on Oracle's JDK only, so do not use OpenJDK for the setup. Although Java 7 is available, we do not recommend using it yet, because it needs more time to be tested. You can download the latest Java SE 6 from the following link: http://www.oracle.com/technetwork/java/javase/downloads/index.html.

Execute the downloaded bin file to install Java SE 6. We will use /usr/local/jdk1.6 as JAVA_HOME in this book:

root# ln -s /your/java/install/directory /usr/local/jdk1.6

We will add a user with the name hadoop, as the owner of all HBase/Hadoop daemons and files. We will have all HBase files and data stored under /usr/local/hbase:

root# useradd hadoop
root# mkdir /usr/local/hbase
root# chown hadoop:hadoop /usr/local/hbase
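
Before moving on, it can help to confirm that these prerequisites are in place. The following is a minimal sketch and not part of the original recipe: the function name check_java_home is hypothetical, and it only checks that an executable bin/java exists under the given directory, not which vendor or version of Java it is.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): succeeds when the given
# directory looks like a Java installation, that is, when it contains
# an executable bin/java.
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# /usr/local/jdk1.6 is the path used as JAVA_HOME in this book.
if check_java_home /usr/local/jdk1.6; then
    echo "JAVA_HOME candidate looks usable"
else
    echo "no executable java found under /usr/local/jdk1.6"
fi

# The hadoop user created above should also be resolvable.
if id hadoop >/dev/null 2>&1; then
    echo "user hadoop exists"
else
    echo "user hadoop is missing"
fi
```

If either check reports a problem, revisit the commands above before installing HBase.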

How to do it...

Get the latest stable HBase release from HBase's official site, http://www.apache.org/dyn/closer.cgi/hbase/. At the time of writing this book, the current stable release was 0.92.1.

You can set up a standalone HBase instance by following these instructions:

  1. Download the tarball and decompress it to our root directory for HBase. We will set an HBASE_HOME environment variable to make the setup easier, by using the following commands:

    root# su - hadoop
    hadoop$ cd /usr/local/hbase
    hadoop$ tar xfvz hbase-0.92.1.tar.gz
    hadoop$ ln -s hbase-0.92.1 current
    hadoop$ export HBASE_HOME=/usr/local/hbase/current
    
  2. Set JAVA_HOME in HBase's environment setting file, by using the following command:

    hadoop$ vi $HBASE_HOME/conf/hbase-env.sh
    # The java implementation to use. Java 1.6 required.
    export JAVA_HOME=/usr/local/jdk1.6
    
  3. Create a directory for HBase to store its data and set the path in the HBase configuration file (hbase-site.xml), between the <configuration> tags, by using the following commands:

    hadoop$ mkdir -p /usr/local/hbase/var/hbase
    hadoop$ vi /usr/local/hbase/current/conf/hbase-site.xml
    <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase/var/hbase</value>
    </property>
    
  4. Start HBase in standalone mode by using the following command:

    hadoop$ $HBASE_HOME/bin/start-hbase.sh
    starting master, logging to /usr/local/hbase/current/logs/hbase-hadoop-master-master1.out
    
  5. Connect to the running HBase via HBase Shell, using the following command:

    hadoop$ $HBASE_HOME/bin/hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 0.92.1, r1298924, Fri Mar 9 16:58:34 UTC 2012
    
  6. Verify HBase's installation by creating a table and then inserting some values. Create a table named test, with a single column family named cf1, as shown here:

    hbase(main):001:0> create 'test', 'cf1'
    0 row(s) in 0.7600 seconds
    

    i. In order to list the newly created table, use the following command:

    hbase(main):002:0> list
    TABLE
    test
    1 row(s) in 0.0440 seconds
    

    ii. In order to insert some values into the newly created table, use the following commands:

    hbase(main):003:0> put 'test', 'row1', 'cf1:a', 'value1'
    0 row(s) in 0.0840 seconds
    hbase(main):004:0> put 'test', 'row1', 'cf1:b', 'value2'
    0 row(s) in 0.0320 seconds
    
  7. Verify the data we inserted into HBase by using the scan command:

    hbase(main):005:0> scan 'test'
    ROW     COLUMN+CELL
     row1   column=cf1:a, timestamp=1320947312117, value=value1
     row1   column=cf1:b, timestamp=1320947363375, value=value2
    1 row(s) in 0.2530 seconds
    
  8. Now clean up all that was done, by using the disable and drop commands:

    i. In order to disable the table test, use the following command:

    hbase(main):006:0> disable 'test'
    0 row(s) in 7.0770 seconds
    

    ii. In order to drop the table test, use the following command:

    hbase(main):007:0> drop 'test'
    0 row(s) in 11.1290 seconds
    
  9. Exit from HBase Shell using the following command:

    hbase(main):010:0> exit
    
  10. Stop the HBase instance by executing the stop script:

    hadoop$ /usr/local/hbase/current/bin/stop-hbase.sh
    stopping hbase.......

How it works...

We installed HBase 0.92.1 on a single server. We have used a symbolic link named current for it, so that version upgrading in the future is easy to do.
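
To illustrate why the symbolic link makes upgrades easy, here is a minimal sketch that switches current from one release directory to another. It runs in a scratch directory rather than /usr/local/hbase, and hbase-x.y.z is a placeholder for whatever release you upgrade to later, not a real version number.

```shell
#!/bin/sh
# Scratch directory standing in for /usr/local/hbase.
base=$(mktemp -d)

# hbase-x.y.z is a placeholder for a future release directory.
mkdir "$base/hbase-0.92.1" "$base/hbase-x.y.z"

# Initial state, as in step 1 of the recipe.
ln -s hbase-0.92.1 "$base/current"
readlink "$base/current"    # prints hbase-0.92.1

# Upgrade: repoint the link. -f replaces the existing link, and -n
# stops ln from following it into the old release directory.
ln -sfn hbase-x.y.z "$base/current"
readlink "$base/current"    # prints hbase-x.y.z
```

Because HBASE_HOME points at the link rather than at a versioned directory, it does not need to change after the switch. In a real upgrade you would stop HBase before repointing the link.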

In order to inform HBase where Java is installed, we set JAVA_HOME in hbase-env.sh, which is the environment setting file of HBase. You will see some Java heap and HBase daemon settings in it too. We will discuss these settings in the last two chapters of this book.

In step 3, we created a directory on the local filesystem for HBase to store its data, and pointed hbase.rootdir at it. For a fully distributed installation, HBase needs to be configured to use HDFS instead of a local filesystem. The HBase master daemon (HMaster) is started on the server where start-hbase.sh is executed. As we did not configure the region server here, HBase will start a single slave daemon (HRegionServer) in the same JVM too.
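
For reference, the following sketch writes out what a complete, minimal hbase-site.xml for this standalone setup might look like, with the property wrapped in its configuration element. It writes to a scratch directory (conf_dir is an illustrative variable, not something HBase reads); in the recipe itself the file lives in $HBASE_HOME/conf.

```shell
#!/bin/sh
# Scratch directory standing in for $HBASE_HOME/conf.
conf_dir=$(mktemp -d)
site_file="$conf_dir/hbase-site.xml"

# A file:// URI tells HBase to keep its data on the local filesystem.
cat > "$site_file" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase/var/hbase</value>
  </property>
</configuration>
EOF

# Show the configured root directory.
grep '<value>' "$site_file"
```

Any stale or duplicated hbase.rootdir entry left in the real file is a common source of confusion, so it is worth checking that the property appears only once.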

As we mentioned in the Introduction section, HBase depends on ZooKeeper as its coordination service. You may have noticed that we didn't start ZooKeeper in the previous steps. This is because HBase will start and manage its own ZooKeeper ensemble, by default.
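
This behavior is controlled from hbase-env.sh. The fragment below is a sketch based on our understanding of the stock file, where the HBASE_MANAGES_ZK line is present but commented out; check the hbase-env.sh shipped with your release before relying on it.

```shell
# Fragment for $HBASE_HOME/conf/hbase-env.sh.
# true lets HBase start and stop its own ZooKeeper, as in this recipe;
# false would tell HBase to use a ZooKeeper that is managed separately.
export HBASE_MANAGES_ZK=true
```

For the standalone setup in this recipe there is no need to change the setting.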

Then we connected to HBase via HBase Shell. Using HBase Shell, you can manage your cluster, access data in HBase, and do many other jobs. Here, we created a table called test, inserted data into it, scanned it, and then disabled and dropped it before exiting the shell.
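
The same session can also be replayed without typing each command. The sketch below saves the commands from this recipe to a file and passes that file to the shell. It assumes that bin/hbase shell accepts a script path as its argument, which is how we understand it to behave; the variable names script_file and hbase_bin are illustrative only. The script falls back to a message when no HBase installation is found.

```shell
#!/bin/sh
# Save the shell commands used in this recipe to a temporary file.
script_file=$(mktemp)

cat > "$script_file" <<'EOF'
create 'test', 'cf1'
list
put 'test', 'row1', 'cf1:a', 'value1'
put 'test', 'row1', 'cf1:b', 'value2'
scan 'test'
disable 'test'
drop 'test'
exit
EOF

# Run the file only when an HBase installation is available.
# HBase itself must already be started with start-hbase.sh.
hbase_bin="${HBASE_HOME:-/usr/local/hbase/current}/bin/hbase"
if [ -x "$hbase_bin" ]; then
    "$hbase_bin" shell "$script_file"
else
    echo "hbase not found at $hbase_bin; commands saved in $script_file"
fi
```

The trailing exit line is there so that the shell ends after the last command instead of waiting for interactive input.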

HBase can be stopped using its stop-hbase.sh script. This script stops both HMaster and HRegionServer daemons.