HBase Administration Cookbook

By: Yifeng Jiang

Overview of this book

As an open source, distributed big data store, HBase scales to billions of rows and millions of columns, and runs on top of clusters of commodity machines. If you are looking for a way to store and access a huge amount of data in real time, look no further than HBase.

HBase Administration Cookbook provides practical examples and simple step-by-step instructions to help you administer HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster on the cloud. Working with such a huge amount of data means that an organized and manageable process is key, and this book will help you achieve that.

The recipes in this practical cookbook start from setting up a fully distributed HBase cluster and moving data into it. You will learn how to use all of the tools for day-to-day administration tasks, as well as how to efficiently manage and monitor the cluster to achieve the best performance possible. Understanding the relationship between Hadoop and HBase will allow you to get the best out of HBase, so the book also shows you how to set up Hadoop clusters, configure Hadoop to cooperate with HBase, and tune its performance.
Table of Contents (16 chapters)
HBase Administration Cookbook
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface

Setting up HBase


A fully distributed HBase instance has one or more master nodes (HMaster), and many slave nodes (RegionServer) running on HDFS. It uses a reliable ZooKeeper ensemble to coordinate all the components of the cluster, including masters, slaves, and clients.

It is not necessary to run HMaster on the same server as the HDFS NameNode, but for a small cluster it is typical to run them on the same server, for ease of management. RegionServers are usually configured to run on the same servers as the HDFS DataNodes. Running a RegionServer alongside a DataNode also has the advantage of data locality: eventually, the DataNode on that server will hold a local copy of all the data that the RegionServer serves.

This recipe describes the setup of a fully distributed HBase. We will set up one HMaster on master1, and three region servers (slave1 to slave3). We will also set up an HBase client on client1.

Getting ready

First, make sure Java is installed on all servers of the cluster.

We will use the hadoop user as the owner of all HBase daemons and files. All HBase files and data will be stored under /usr/local/hbase. Create this directory on all servers of your HBase cluster in advance.

We will set up one HBase client on client1. Therefore, the Java installation, hadoop user, and directory should be prepared on client1 too.

Make sure HDFS is running. You can ensure it started properly by accessing HDFS, using the following command:

hadoop@client1$ $HADOOP_HOME/bin/hadoop fs -ls /

MapReduce does not need to be started, as HBase does not normally use it.

We assume that you are managing your own ZooKeeper ensemble; in that case, start it and confirm that it is running properly by sending the ruok command to its client port:

hadoop@client1$ echo ruok | nc master1 2181
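A healthy server answers ruok with imok. If you want to script this check, here is a minimal sketch; the nc invocation in the comment mirrors the command above, with the host and port taken from this recipe:

```shell
# zk_ok tests a ZooKeeper four-letter-word reply: a healthy server
# answers "ruok" with exactly "imok".
zk_ok() {
  [ "$1" = "imok" ]
}

# Typical use against the quorum host from this recipe:
#   reply=$(echo ruok | nc -w 2 master1 2181)
#   zk_ok "$reply" && echo "ZooKeeper is healthy"
```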

How to do it...

To set up our fully distributed HBase cluster, we will download and configure HBase on the master node first, and then sync to all slave nodes and clients.

Get the latest stable HBase release from HBase's official site, http://www.apache.org/dyn/closer.cgi/hbase/.

At the time of writing this book, the current stable release was 0.92.1.

  1. Download the tarball and decompress it to our root directory for HBase. Also, set an HBASE_HOME environment variable to make the setup easier:

    hadoop@master1$ cd /usr/local/hbase
    hadoop@master1$ tar xfvz hbase-0.92.1.tar.gz
    hadoop@master1$ ln -s hbase-0.92.1 current
    hadoop@master1$ export HBASE_HOME=/usr/local/hbase/current
    
  2. We will use /usr/local/hbase/var as a temporary directory for HBase on the local filesystem. Remove it first if you created it for a standalone HBase installation, then recreate it:

    hadoop@master1$ rm -rf /usr/local/hbase/var
    hadoop@master1$ mkdir -p /usr/local/hbase/var
    
  3. To tell HBase where the Java installation is, set JAVA_HOME in the HBase environment setting file (hbase-env.sh):

    hadoop@master1$ vi $HBASE_HOME/conf/hbase-env.sh
    # The java implementation to use. Java 1.6 required.
    export JAVA_HOME=/usr/local/jdk1.6
    
  4. Set up HBase to use the independent ZooKeeper ensemble:

    hadoop@master1$ vi $HBASE_HOME/conf/hbase-env.sh
    # Tell HBase whether it should manage it's own instance of ZooKeeper or not.
    export HBASE_MANAGES_ZK=false
    
  5. Add these settings to HBase's configuration file (hbase-site.xml):

    hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master1:8020/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.tmp.dir</name>
        <value>/usr/local/hbase/var</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master1</value>
      </property>
    </configuration>
    
  6. Configure the slave nodes of the cluster:

    hadoop@master1$ vi $HBASE_HOME/conf/regionservers
    slave1
    slave2
    slave3
    
  7. Link the HDFS configuration file (hdfs-site.xml) to HBase's configuration folder (conf), so that HBase can see the HDFS client configuration of your Hadoop cluster:

    hadoop@master1$ ln -s $HADOOP_HOME/conf/hdfs-site.xml $HBASE_HOME/conf/hdfs-site.xml
    
  8. Copy the hadoop-core and ZooKeeper JAR files, and their dependencies, from your Hadoop and ZooKeeper installations:

    hadoop@master1$ rm -i $HBASE_HOME/lib/hadoop-core-*.jar
    hadoop@master1$ rm -i $HBASE_HOME/lib/zookeeper-*.jar
    hadoop@master1$ cp -i $HADOOP_HOME/hadoop-core-*.jar $HBASE_HOME/lib/
    hadoop@master1$ cp -i $HADOOP_HOME/lib/commons-configuration-1.6.jar $HBASE_HOME/lib/
    hadoop@master1$ cp -i $ZK_HOME/zookeeper-*.jar $HBASE_HOME/lib/
    
  9. Sync all the HBase files under /usr/local/hbase from the master to the same directory on the client and slave nodes.

  10. Start the HBase cluster from the master node:

    hadoop@master1$ $HBASE_HOME/bin/start-hbase.sh
    
  11. Connect to your HBase cluster from the client node:

    hadoop@client1$ $HBASE_HOME/bin/hbase shell

    You can also access the HBase web UI from your browser. Make sure your master server's port 60010 is open. The URL is http://master1:60010/master.jsp.

  12. Stop the HBase cluster from the master node:

    hadoop@master1$ $HBASE_HOME/bin/stop-hbase.sh
    
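The sync in step 9 can be sketched as a loop over the client and every host listed in conf/regionservers. This sketch assumes rsync and passwordless SSH for the hadoop user; set DRY_RUN=1 to preview the commands before running them for real:

```shell
# Sketch of step 9: push /usr/local/hbase from the master to each node.
# Assumes rsync and passwordless SSH; any copy tool would work.
sync_hbase() {
  # $1 = path to the regionservers file; remaining args = extra hosts
  regionservers_file=$1
  shift
  for host in "$@" $(cat "$regionservers_file"); do
    cmd="rsync -az --delete /usr/local/hbase/ hadoop@$host:/usr/local/hbase/"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$cmd"    # preview only
    else
      $cmd
    fi
  done
}

# Example: DRY_RUN=1 sync_hbase $HBASE_HOME/conf/regionservers client1
```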

How it works...

Our HBase cluster is configured to use /hbase as its root directory on HDFS, by specifying the hbase.rootdir property. Because this is the first time HBase is started, it creates the directory automatically. You can see the files HBase created on HDFS from the client:

hadoop@client1$ $HADOOP_HOME/bin/hadoop fs -ls /hbase

We want our HBase to run on distributed mode, so we set hbase.cluster.distributed to true in hbase-site.xml.

We also set up the cluster to use an independent ZooKeeper ensemble by specifying HBASE_MANAGES_ZK=false in hbase-env.sh. The ZooKeeper ensemble is specified by the hbase.zookeeper.quorum property. You can use a clustered ZooKeeper ensemble by listing all the servers of the ensemble, such as zoo1,zoo2,zoo3.
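For example, a three-node ensemble would be configured in hbase-site.xml like this (the hostnames zoo1 to zoo3 are placeholders for your own ZooKeeper servers):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zoo1,zoo2,zoo3</value>
</property>
```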

All region servers are configured in the $HBASE_HOME/conf/regionservers file. You should use one line per region server. When starting the cluster, HBase will SSH into each region server configured here, and start the HRegionServer daemon on that server.
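The host-list parsing can be sketched roughly as follows: one hostname per line, with blank lines ignored. Skipping comment lines is an assumption of this sketch; to be safe, keep the file to plain hostnames only:

```shell
# Sketch: read a regionservers-style file, one host per line,
# skipping blank lines and comment lines.
list_regionservers() {
  grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$'
}

# Example: list_regionservers $HBASE_HOME/conf/regionservers
```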

By linking hdfs-site.xml under the $HBASE_HOME/conf directory, HBase will use all the client configurations you made for your HDFS in hdfs-site.xml, such as the dfs.replication setting.

HBase ships with prebuilt hadoop-core and ZooKeeper JAR files. They may be out of date compared to the ones used in your Hadoop and ZooKeeper installations. Make sure HBase uses the same versions of these JAR files as your Hadoop and ZooKeeper, to avoid unexpected problems.
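One quick way to compare versions is to strip them out of the JAR file names on both sides. A small sketch, assuming the usual hadoop-core-&lt;version&gt;.jar naming layout:

```shell
# jar_version extracts "1.0.3" from a path like .../hadoop-core-1.0.3.jar.
jar_version() {
  basename "$1" .jar | sed 's/^.*-\([0-9][0-9.]*\)$/\1/'
}

# Example comparison between the cluster's jar and the one under HBase's lib:
#   [ "$(jar_version $HADOOP_HOME/hadoop-core-*.jar)" = \
#     "$(jar_version $HBASE_HOME/lib/hadoop-core-*.jar)" ] || echo "version mismatch"
```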