HBase Administration Cookbook

By: Yifeng Jiang

Overview of this book

As an open source distributed big data store, HBase scales to billions of rows and millions of columns, and sits on top of a cluster of commodity machines. If you are looking for a way to store and access a huge amount of data in real time, look no further than HBase.

HBase Administration Cookbook provides practical examples and simple, step-by-step instructions for administering HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster on the cloud. Working with such a huge amount of data means that an organized and manageable process is key, and this book will help you achieve that.

The recipes in this practical cookbook start with setting up a fully distributed HBase cluster and moving data into it. You will learn how to use all of the tools for day-to-day administration tasks, as well as how to manage and monitor the cluster efficiently to achieve the best possible performance. Understanding the relationship between Hadoop and HBase will allow you to get the best out of HBase, so the book also shows you how to set up Hadoop clusters, configure Hadoop to cooperate with HBase, and tune its performance.

Setting up multiple High Availability (HA) masters


Hadoop and HBase are designed to handle the failover of their slave nodes automatically. Because a large cluster may contain many nodes, the hardware failure or shutdown of a slave node is considered normal in the cluster.

For the master nodes, HBase itself has no SPOF. HBase uses ZooKeeper as its central coordination service. A ZooKeeper ensemble is typically clustered with three or more servers; as long as more than half of the servers in the cluster are online, ZooKeeper can provide its service normally.
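
If you want to confirm that the ensemble still has a quorum, ZooKeeper's four-letter commands can be used from any machine. The following is only a quick sketch; the hostnames zk1, zk2, and zk3 are placeholders for your own ZooKeeper servers:

    hadoop$ for zk in zk1 zk2 zk3
    do
    echo -n "$zk: "
    echo ruok | nc $zk 2181
    echo
    done

Each healthy server replies with imok; as long as a majority of the servers respond, ZooKeeper can serve requests.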

HBase saves its active master node, root region server location, and other important running data in ZooKeeper. Therefore, we can just start two or more HMaster daemons on separate servers and the one started first will be the active master server of the HBase cluster.
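
To see which HMaster is currently the active one, you can read the master znode directly from ZooKeeper. This is only a sketch; it assumes the default zookeeper.znode.parent of /hbase, ZK_HOME is a placeholder for your ZooKeeper installation directory, and master1:2181 should be replaced with one of your ZooKeeper servers:

    hadoop$ $ZK_HOME/bin/zkCli.sh -server master1:2181 get /hbase/master

The znode is ephemeral, so when the active master dies it disappears, and the next master to win the election recreates it.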

However, the NameNode of HDFS is the SPOF of the cluster. NameNode keeps the entire HDFS filesystem image in its local memory. If NameNode is down, HDFS cannot function anymore, and neither can HBase. As you may have noticed, HDFS has a Secondary NameNode. Note that the Secondary NameNode is not a standby for NameNode; it just provides a checkpoint function for NameNode. So, the challenge of a highly available cluster is to make NameNode highly available.

In this recipe, we will describe the setup of two highly available master nodes, which will use Heartbeat to monitor each other. Heartbeat is a widely used HA solution that provides communication and membership services for a Linux cluster. Heartbeat needs to be combined with a Cluster Resource Manager (CRM) to start/stop services for that cluster. Pacemaker is the preferred cluster resource manager for Heartbeat. We will set up a Virtual IP (VIP) address using Heartbeat and Pacemaker, and then associate it with the active master node. Because EC2 does not support static IP addresses, we cannot demonstrate this on EC2, but we will discuss an alternative way of using an Elastic IP (EIP) address to achieve our purpose.

We will focus on setting up NameNode and HBase; you can simply use a similar method to set up two JobTracker nodes as well.

Getting ready

You should already have HDFS and HBase installed. As we will set up a standby master node (master2), you need another server ready to use. Make sure all the dependencies have been configured properly. Sync your Hadoop and HBase root directories from the active master (master1) to the standby master.
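
One simple way to sync the installation directories is rsync. The following is only a sketch; /usr/local/hadoop and /usr/local/hbase are assumed installation paths, so replace them with the parent directories of your own HADOOP_HOME and HBASE_HOME:

    hadoop@master1$ rsync -az /usr/local/hadoop/ master2:/usr/local/hadoop/
    hadoop@master1$ rsync -az /usr/local/hbase/ master2:/usr/local/hbase/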

We will need NFS in this recipe as well. Set up your NFS server, and mount the same NFS directory from both master1 and master2. Make sure the hadoop user has write permission to the NFS directory. Create a directory on NFS to store Hadoop's metadata. We assume the directory is /mnt/nfs/hadoop/dfs/name.
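
A minimal sketch of the NFS setup is shown below; the NFS server name (nfs1) and its export path (/export/hadoop) are assumptions, so substitute your own:

    # on both master1 and master2
    root# mkdir -p /mnt/nfs/hadoop
    root# mount -t nfs nfs1:/export/hadoop /mnt/nfs/hadoop
    # only needed once, because the directory is shared
    root# mkdir -p /mnt/nfs/hadoop/dfs/name
    root# chown -R hadoop /mnt/nfs/hadoop/dfs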

We will set up a VIP for the two masters, and assume you have the following IP addresses and DNS mapping (a sample /etc/hosts sketch follows this list):

  • master1: its IP address is 10.174.14.11.

  • master2: its IP address is 10.174.14.12.

  • master: its IP address is 10.174.14.10. This is the VIP that will be set up later.
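
If you maintain the mapping with /etc/hosts instead of DNS, the entries would look like the following sketch; apply the same entries on all masters, clients, and slaves:

    root# vi /etc/hosts
    10.174.14.10 master
    10.174.14.11 master1
    10.174.14.12 master2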

How to do it...

The following instructions describe how to set up two highly available master nodes.

Install and configure Heartbeat and Pacemaker

First, we will install Heartbeat and Pacemaker, and make some basic configurations:

  1. Install Heartbeat and Pacemaker on master1 and master2:

    root# apt-get install heartbeat cluster-glue cluster-agents pacemaker
    
  2. To configure Heartbeat, make the following changes to both master1 and master2:

    root# vi /etc/ha.d/ha.cf
    # enable pacemaker, without stonith
    crm yes
    # log where ?
    logfacility local0
    # warning of soon be dead
    warntime 10
    # declare a host (the other node) dead after:
    deadtime 20
    # dead time on boot (could take some time until net is up)
    initdead 120
    # time between heartbeats
    keepalive 2
    # the nodes
    node master1
    node master2
    # heartbeats, over dedicated replication interface!
    ucast eth0 master1 # ignored by master1 (owner of ip)
    ucast eth0 master2 # ignored by master2 (owner of ip)
    # ping the name server to assure we are online
    ping ns
    
  3. Create an authkeys file. Execute the following script as the root user on master1 and master2:

    root# ( echo -ne "auth 1\n1 sha1 "; \
    dd if=/dev/urandom bs=512 count=1 | openssl md5 ) \
    > /etc/ha.d/authkeys
    root# chmod 0600 /etc/ha.d/authkeys
    

Create and install a NameNode resource agent

Pacemaker depends on resource agents to manage the cluster. A resource agent is an executable that manages a cluster resource. In our case, the VIP address and the HDFS NameNode service are the cluster resources we want to manage using Pacemaker. Pacemaker ships with an IPaddr resource agent to manage the VIP, so we only need to create our own namenode resource agent:

  1. Add environment variables to the .bashrc file of the root user on master1 and master2. Don't forget to apply the changes:

    root# vi /root/.bashrc
    export JAVA_HOME=/usr/local/jdk1.6
    export HADOOP_HOME=/usr/local/hadoop/current
    export OCF_ROOT=/usr/lib/ocf
    

    Invoke the following command to apply the previous changes:

    root# source /root/.bashrc
    
  2. Create a standard Open Clustering Framework (OCF) resource agent file called namenode, with the following content.

    The namenode resource agent starts with including standard OCF functions such as the following:

    root# vi namenode
    #!/bin/sh
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
    usage() {
    echo "Usage: $0 {start|stop|status|monitor|meta-data|validate-all}"
    }
    
  3. Add a meta_data() function as shown in the following code. The meta_data() function dumps the resource agent metadata to standard output. Every resource agent must have a set of XML metadata describing its own purpose and supported parameters:

    root# vi namenode
    meta_data() {
    cat <<END
    <?xml version="1.0"?>
    <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
    <resource-agent name="namenode">
    <version>0.1</version>
    <longdesc lang="en">
    This is a resource agent for NameNode. It manages HDFS namenode daemon.
    </longdesc>
    <shortdesc lang="en">Manage namenode daemon.</shortdesc>
    <parameters></parameters>
    <actions>
    <action name="start" timeout="120" />
    <action name="stop" timeout="120" />
    <action name="status" depth="0" timeout="120" interval="120" />
    <action name="monitor" depth="0" timeout="120" interval="120" />
    <action name="meta-data" timeout="10" />
    <action name="validate-all" timeout="5" />
    </actions>
    </resource-agent>
    END
    }
    
  4. Add a namenode_start() function. This function is used by Pacemaker to actually start the NameNode daemon on the server. In the namenode_start() function, we first check whether NameNode is already started on the server; if it is not started, we invoke hadoop-daemon.sh as the hadoop user to start it:

    root# vi namenode
    namenode_start() {
    # if namenode is already started on this server, bail out early
    namenode_status
    if [ $? -eq 0 ]; then
    ocf_log info "namenode is already running on this server, skip"
    return $OCF_SUCCESS
    fi
    # start namenode on this server
    ocf_log info "Starting namenode daemon..."
    su - hadoop -c "${HADOOP_HOME}/bin/hadoop-daemon.sh start namenode"
    if [ $? -ne 0 ]; then
    ocf_log err "Can not start namenode daemon."
    return $OCF_ERR_GENERIC;
    fi
    sleep 1
    return $OCF_SUCCESS
    }
    
  5. Add a namenode_stop() function. This function is used by Pacemaker to actually stop the NameNode daemon on the server. In the namenode_stop() function, we first check whether NameNode is already stopped on the server; if it is running, we invoke hadoop-daemon.sh as the hadoop user to stop it:

    root# vi namenode
    
    namenode_stop () {
    # if namenode is not started on this server, bail out early
    namenode_status
    if [ $? -ne 0 ]; then
    ocf_log info "namenode is not running on this server, skip"
    return $OCF_SUCCESS
    fi
    # stop namenode on this server
    ocf_log info "Stopping namenode daemon..."
    su - hadoop -c "${HADOOP_HOME}/bin/hadoop-daemon.sh stop namenode"
    if [ $? -ne 0 ]; then
    ocf_log err "Can not stop namenode daemon."
    return $OCF_ERR_GENERIC;
    fi
    sleep 1
    return $OCF_SUCCESS
    }
    
  6. Add a namenode_status() function. This function is used by Pacemaker to monitor the status of the NameNode daemon on the server. In the namenode_status() function, we use the jps command to show all running Java processes owned by the hadoop user, and grep the name of the NameNode daemon to see whether it has started:

    root# vi namenode
    namenode_status () {
    ocf_log info "monitor namenode"
    su - hadoop -c "${JAVA_HOME}/bin/jps" | egrep -q "NameNode"
    rc=$?
    # grep will return true if namenode is running on this machine
    if [ $rc -eq 0 ]; then
    ocf_log info "Namenode is running"
    return $OCF_SUCCESS
    else
    ocf_log info "Namenode is not running"
    return $OCF_NOT_RUNNING
    fi
    }
    
  7. Add a namenode_validateAll() function to make sure the environment variables are set properly before we run other functions:

    root# vi namenode
    namenode_validateAll () {
    if [ -z "$JAVA_HOME" ]; then
    ocf_log err "JAVA_HOME not set."
    exit $OCF_ERR_INSTALLED
    fi
    if [ -z "$HADOOP_HOME" ]; then
    ocf_log err "HADOOP_HOME not set."
    exit $OCF_ERR_INSTALLED
    fi
    # Any subject is OK
    return $OCF_SUCCESS
    }
    
  8. Add the following main routine. Here, we will simply call the previous functions to implement the required standard OCF resource agent actions:

    root# vi namenode
    # See how we were called.
    if [ $# -ne 1 ]; then
    usage
    exit $OCF_ERR_GENERIC
    fi
    namenode_validateAll
    case $1 in
    meta-data) meta_data
    exit $OCF_SUCCESS;;
    usage) usage
    exit $OCF_SUCCESS;;
    *);;
    esac
    case $1 in
    status|monitor) namenode_status;;
    start) namenode_start;;
    stop) namenode_stop;;
    validate-all);;
    *)usage
    exit $OCF_ERR_UNIMPLEMENTED;;
    esac
    exit $?
    
  9. Change the namenode file permission and test it on master1 and master2:

    root# chmod 0755 namenode
    root# ocf-tester -v -n namenode-test /full/path/of/namenode
    
  10. Make sure all the tests pass before proceeding to the next step, or the HA cluster will behave unexpectedly.

  11. Install the namenode resource agent under the hac provider on master1 and master2:

    root# mkdir ${OCF_ROOT}/resource.d/hac
    root# cp namenode ${OCF_ROOT}/resource.d/hac
    root# chmod 0755 ${OCF_ROOT}/resource.d/hac/namenode
    

Configure highly available NameNode

We are ready to configure a highly available NameNode using Heartbeat and Pacemaker. We will set up a VIP address and configure Hadoop and HBase to use this VIP address as their master node. NameNode will be started on the active master where the VIP is assigned. If the active master crashes, Heartbeat and Pacemaker will detect it, assign the VIP address to the standby master node, and then start NameNode there.

  1. Start Heartbeat on master1 and master2:

    root# /etc/init.d/heartbeat start
    
  2. Change the default crm configuration. Note that all resource-related commands need to be executed only once, from either master1 or master2:

    root# crm configure property stonith-enabled=false
    root# crm configure property default-resource-stickiness=1
    
  3. Add a VIP resource using our VIP address:

    root# crm configure primitive VIP ocf:heartbeat:IPaddr params ip="10.174.14.10" op monitor interval="10s"
    
  4. Make the following changes to configure Hadoop to use our VIP address. Sync to all masters, clients, and slaves after you've made the changes:

    hadoop$ vi $HADOOP_HOME/conf/core-site.xml
    <property>
    <name>fs.default.name</name>
    <value>hdfs://master:8020</value>
    </property>
    
  5. Make the following changes to configure HBase to use our VIP address. Sync to all masters, clients, and slaves after you've made the changes:

    hadoop$ vi $HBASE_HOME/conf/hbase-site.xml
    <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:8020/hbase</value>
    </property>
    
  6. To configure Hadoop to write its metadata to a local disk and NFS, make the following changes and sync to all masters, clients, and slaves:

    hadoop$ vi $HADOOP_HOME/conf/hdfs-site.xml
    <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/var/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
    </property>
    
  7. Add the namenode resource agent we created in the previous section to Pacemaker. We will use NAMENODE as its resource name:

    root# crm configure primitive NAMENODE ocf:hac:namenode op monitor interval="120s" timeout="120s" op start timeout="120s" op stop timeout="120s" meta resource-stickiness="1"
    
  8. Configure the VIP resource and the NAMENODE resource as a resource group:

    root# crm configure group VIP-AND-NAMENODE VIP NAMENODE
    
  9. Configure the colocation of the VIP resource and the NAMENODE resource:

    root# crm configure colocation VIP-WITH-NAMENODE inf: VIP NAMENODE
    
  10. Configure the resource order of the VIP resource and the NAMENODE resource:

    root# crm configure order IP-BEFORE-NAMENODE inf: VIP NAMENODE
    
  11. Verify the previous Heartbeat and resource configurations by using the crm_mon command. If everything is configured properly, you should see output like the following:

    root@master1 hac$ crm_mon -1r
    ============
    Last updated: Tue Nov 22 22:39:11 2011
    Stack: Heartbeat
    Current DC: master2 (7fd92a93-e071-4fcb-993f-9a84e6c7846f) - partition with quorum
    Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
    2 Nodes configured, 1 expected votes
    1 Resources configured.
    ============
    Online: [ master1 master2 ]
    Full list of resources:
    Resource Group: VIP-AND-NAMENODE
    VIP (ocf::heartbeat:IPaddr): Started master1
    NAMENODE (ocf::hac:namenode): Started master1
    
  12. Make sure that the VIP and NAMENODE resources are started on the same server.

  13. Now stop Heartbeat on master1; VIP-AND-NAMENODE should be started on master2 after several seconds (see the test sketch after this list).

  14. Restart Heartbeat on master1; VIP-AND-NAMENODE should remain started on master2. Resources should NOT fail back to master1.
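
The failover test in steps 13 and 14 is just the standard Heartbeat stop/start sequence; a minimal sketch of the commands, watching the result with crm_mon from the other master, is shown below:

    root@master1# /etc/init.d/heartbeat stop
    root@master2# crm_mon -1r   # VIP-AND-NAMENODE should now be running on master2
    root@master1# /etc/init.d/heartbeat start
    root@master2# crm_mon -1r   # resources should stay on master2 (no failback)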

Start DataNode, HBase cluster, and backup HBase master

We have confirmed that our HA configuration works as expected, so we can start HDFS and HBase now. Note that NameNode has already been started by Pacemaker, so we need only start DataNode here:

  1. If everything works well, we can start DataNode now:

    hadoop@master$ for i in 1 2 3
    do
    ssh slave$i "$HADOOP_HOME/bin/hadoop-daemon.sh start datanode"
    sleep 1
    done
    
  2. Start your HBase cluster from master, which is the active master server that the VIP address is associated with:

    hadoop@master$ $HBASE_HOME/bin/start-hbase.sh
    
  3. Start the standby HMaster from the standby master server, master2 in this case:

    hadoop@master2$ $HBASE_HOME/bin/hbase-daemon.sh start master
    

How it works...

The previous steps finally leave us with the following cluster structure: two master nodes monitored by Heartbeat and Pacemaker, with the VIP, NameNode, and the active HMaster running together on the active master, and a standby HMaster waiting on the other master.

At first, we installed Heartbeat and Pacemaker on the two masters and then configured Heartbeat to enable Pacemaker.

In step 2 of the Create and install a NameNode resource agent section, we created the namenode script, which is implemented as a standard OCF resource agent. The most important function of the namenode script is namenode_status, which monitors the status of the NameNode daemon. Here we use the jps command to show all running Java processes owned by the hadoop user, and grep the name of the NameNode daemon to see if it has started. The namenode resource agent is used by Pacemaker to start/stop/monitor the NameNode daemon. In the namenode script, as you can see in the namenode_start and namenode_stop methods, we actually start/stop NameNode by using hadoop-daemon.sh, which is used to start/stop the Hadoop daemon on a single server. You can find the full listing of the code in the source shipped with this book.

We started Heartbeat after our namenode resource agent was tested and installed. Then, we made some changes to the default crm configurations. The default-resource-stickiness=1 setting is very important as it turns off the automatic failback of a resource.
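
If you want to confirm that these properties actually took effect, crm can print the running configuration; the following is just a quick verification sketch:

    root# crm configure show | grep -E "stonith-enabled|default-resource-stickiness"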

We added a VIP resource to Pacemaker and configured Hadoop and HBase to use it in steps 3 to 5 of the Configure highly available NameNode section. By using VIP in their configuration, Hadoop and HBase can switch to communicate with the standby master if the active one is down.

In step 6 of the same section, we configured Hadoop (HDFS NameNode) to write its metadata to both the local disk and NFS. If the active master goes down, NameNode will be started on the standby master. Because both masters mount the same NFS directory, the NameNode started on the standby master can apply the latest metadata from NFS and restore HDFS to the state it was in before the original active master went down.
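
As a quick sanity check, you can confirm that NameNode is writing the same metadata to both locations; this is only a sketch, using the local and NFS paths assumed in this recipe:

    hadoop$ ls /usr/local/hadoop/var/dfs/name/current
    hadoop$ ls /mnt/nfs/hadoop/dfs/name/current

Both directories should contain the same fsimage and edits files.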

In steps 7 to 10, we added the NAMENODE resource using the namenode resource agent we created in step 2 of the Create and install a NameNode resource agent section, then we set up VIP and NAMENODE resources as a group (step 8), and made sure they always run on the same server (step 9), with the right start-up order (step 10). We did this because we didn't want VIP running on master1, while NameNode was running on master2.

Because Pacemaker will start NameNode for us via the namenode resource agent, we need to start DataNode separately, which is what we did in step 1 of the Start DataNode, HBase cluster, and backup HBase master section.

After starting HBase normally, we started our standby HBase master (HMaster) on the standby master server. If you check your HBase master log, you will find output like the following, which shows that it is running as a standby HMaster:

2011-11-21 23:38:55,168 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Another master is the active master, ip-10-174-14-15.us-west-1.compute.internal:60000; waiting to become the next active master

Finally, we got NameNode and HMaster running on two servers with an active-standby configuration. The single point of failure of the cluster was avoided.

However, this still leaves us with a lot of work to do in production. You need to test your HA cluster in all the rare cases you can think of, such as a server power-off, unplugging a network cable, or shutting down a network switch.

On the other hand, the SPOF of the cluster may not be as critical as you think. Based on our experience, almost all of the downtime of a cluster is due to operational mistakes or software upgrades. It is better to keep your cluster simple.

There's more...

It is more complex to set up a highly available HBase cluster on Amazon EC2, because EC2 does not support static IP addresses, and so we cannot use a VIP on EC2. An alternative is to use an Elastic IP (EIP) address. An Elastic IP address plays the role of a static IP address on EC2, but it is associated with your account, not with a particular instance. We can use Heartbeat to associate the EIP with the standby master automatically if the active one goes down. Then, we configure Hadoop and HBase to use the instance's public DNS name associated with the EIP to find the active master. Also, in the namenode resource agent, we have to start/stop not only NameNode, but also all DataNodes. This is because the IP address of the active master has changed, and a DataNode cannot find the new active master unless it is restarted.
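
For reference, moving an EIP between instances boils down to a single API call. The following is only a sketch using the AWS CLI, which is not the tooling used by the book's elastic-ip agent; the instance ID and address are placeholders:

    # reassociate the EIP with this instance
    # (in a VPC, use --allocation-id instead of --public-ip)
    root# aws ec2 associate-address --instance-id i-0123456789abcdef0 --public-ip 203.0.113.10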

We will skip the details, as they are beyond the scope of this book. We have created an elastic-ip resource agent for this purpose; you can find it in the source shipped with this book.