Hadoop and HBase are designed to handle the failover of their slave nodes automatically. Because a large cluster may contain many nodes, the hardware failure of a server or the shutdown of a slave node is considered normal in the cluster.
For the master nodes, HBase itself has no SPOF. HBase uses ZooKeeper as its central coordination service. A ZooKeeper ensemble is typically clustered with three or more servers; as long as more than half of the servers in the cluster are online, ZooKeeper can provide its service normally.
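The majority requirement can be made concrete with a little arithmetic. The following sketch (plain shell, no ZooKeeper needed) prints the quorum size and the number of tolerable failures for a few common ensemble sizes:

```shell
# Quorum arithmetic for a ZooKeeper ensemble: the service stays available
# while a strict majority (floor(n/2) + 1) of its n servers are online,
# so an ensemble of n servers tolerates floor((n-1)/2) failures.
for n in 3 5 7; do
  echo "$n servers: quorum=$(( n / 2 + 1 )), tolerated failures=$(( (n - 1) / 2 ))"
done
```

This is also why ensembles are usually deployed with an odd number of servers: adding a fourth server to a three-server ensemble raises the quorum size without increasing the number of failures it can tolerate.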
HBase saves its active master node, root region server location, and other important runtime data in ZooKeeper. Therefore, we can simply start two or more HMaster daemons on separate servers; the one started first becomes the active master server of the HBase cluster.
However, the NameNode of HDFS is the SPOF of the cluster. The NameNode keeps the entire HDFS filesystem image in its local memory. If the NameNode is down, HDFS can no longer function, and consequently HBase is down too. As you may have noticed, HDFS has a Secondary NameNode. Note that the Secondary NameNode is not a standby for the NameNode; it only provides a checkpoint function for the NameNode. So, the challenge of building a highly available cluster is to make the NameNode highly available.
In this recipe, we will describe the setup of two highly available master nodes, which will use Heartbeat to monitor each other. Heartbeat is a widely used HA solution that provides communication and membership services for a Linux cluster. Heartbeat needs to be combined with a Cluster Resource Manager (CRM) to start and stop services in the cluster. Pacemaker is the preferred cluster resource manager for Heartbeat. We will set up a Virtual IP (VIP) address using Heartbeat and Pacemaker, and then associate it with the active master node. Because EC2 does not support static IP addresses, we cannot demonstrate this on EC2, but we will discuss an alternative way of using an Elastic IP (EIP) address to achieve the same purpose.
We will focus on setting up NameNode and HBase; you can simply use a similar method to set up two JobTracker nodes as well.
You should already have HDFS and HBase installed. We will set up a standby master node (master2), so you need another server ready to use. Make sure all the dependencies have been configured properly. Sync your Hadoop and HBase root directories from the active master (master1) to the standby master.
We will need NFS in this recipe as well. Set up your NFS server, and mount the same NFS directory from both master1 and master2. Make sure the hadoop user has write permission to the NFS directory. Create a directory on NFS to store Hadoop's metadata. We assume the directory is /mnt/nfs/hadoop/dfs/name.
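As a concrete sketch of the NFS preparation, the commands below create the metadata directory and confirm write access. The default path here is a hypothetical stand-in so the snippet can be tried anywhere; on a real cluster it would be the NFS mount point /mnt/nfs/hadoop/dfs/name, with the check run as the hadoop user.

```shell
# Create the shared metadata directory and verify it is writable.
# NFS_DIR is a stand-in path for illustration; on a real cluster this
# would be /mnt/nfs/hadoop/dfs/name on the NFS mount.
NFS_DIR=${NFS_DIR:-/tmp/nfs-demo/hadoop/dfs/name}
mkdir -p "$NFS_DIR"
chmod 755 "$NFS_DIR"
if touch "$NFS_DIR/.write-test" 2>/dev/null; then
  rm -f "$NFS_DIR/.write-test"
  echo "$NFS_DIR is writable"
else
  echo "$NFS_DIR is NOT writable" >&2
fi
```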
We will set up a VIP for the two masters, and assume you have the following IP addresses and DNS mappings:
master1: its IP address is 10.174.14.11
master2: its IP address is 10.174.14.12
master: its IP address is 10.174.14.10; this is the VIP that will be set up later
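On a small cluster, one simple way to provide this DNS mapping is an /etc/hosts entry on every node. The following fragment mirrors the addresses above; it is shown only for illustration, and a real DNS server is preferable if you have one:

```
10.174.14.10    master
10.174.14.11    master1
10.174.14.12    master2
```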
The following instructions describe how to set up two highly available master nodes.
First, we will install Heartbeat and Pacemaker, and make some basic configurations:
1. Install Heartbeat and Pacemaker on master1 and master2:
root# apt-get install heartbeat cluster-glue cluster-agents pacemaker
2. To configure Heartbeat, make the following changes on both master1 and master2:
root# vi /etc/ha.d/ha.cf
# enable pacemaker, without stonith
crm yes
# log where ?
logfacility local0
# warning of soon be dead
warntime 10
# declare a host (the other node) dead after:
deadtime 20
# dead time on boot (could take some time until net is up)
initdead 120
# time between heartbeats
keepalive 2
# the nodes
node master1
node master2
# heartbeats, over dedicated replication interface!
ucast eth0 master1 # ignored by master1 (owner of ip)
ucast eth0 master2 # ignored by master2 (owner of ip)
# ping the name server to assure we are online
ping ns
3. Create an authkeys file. Execute the following script as the root user on master1 and master2:
root# ( echo -ne "auth 1\n1 sha1 "; \
dd if=/dev/urandom bs=512 count=1 | openssl md5 ) \
> /etc/ha.d/authkeys
root# chmod 0600 /etc/ha.d/authkeys
Pacemaker depends on resource agents to manage the cluster. A resource agent is an executable that manages a cluster resource. In our case, the VIP address and the HDFS NameNode service are the cluster resources we want to manage using Pacemaker. Pacemaker ships with an IPaddr resource agent to manage the VIP, so we only need to create our own namenode resource agent:
1. Add environment variables to the .bashrc file of the root user on master1 and master2. Don't forget to apply the changes:
root# vi /root/.bashrc
export JAVA_HOME=/usr/local/jdk1.6
export HADOOP_HOME=/usr/local/hadoop/current
export OCF_ROOT=/usr/lib/ocf
Invoke the following command to apply the previous changes:
root# source /root/.bashrc
2. Create a standard Open Clustering Framework (OCF) resource agent file called namenode, with the following content. The namenode resource agent starts by including the standard OCF functions, as follows:
root# vi namenode
#!/bin/sh
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

usage() {
  echo "Usage: $0 {start|stop|status|monitor|meta-data|validate-all}"
}
3. Add a meta_data() function, as shown in the following code. The meta_data() function dumps the resource agent metadata to standard output. Every resource agent must have a set of XML metadata describing its own purpose and supported parameters:
root# vi namenode
meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="namenode">
<version>0.1</version>
<longdesc lang="en">
This is a resource agent for NameNode. It manages HDFS namenode daemon.
</longdesc>
<shortdesc lang="en">Manage namenode daemon.</shortdesc>
<parameters></parameters>
<actions>
<action name="start" timeout="120" />
<action name="stop" timeout="120" />
<action name="status" depth="0" timeout="120" interval="120" />
<action name="monitor" depth="0" timeout="120" interval="120" />
<action name="meta-data" timeout="10" />
<action name="validate-all" timeout="5" />
</actions>
</resource-agent>
END
}
4. Add a namenode_start() function. This function is used by Pacemaker to actually start the NameNode daemon on the server. In the namenode_start() function, we first check whether NameNode is already started on the server; if it is not started, we invoke hadoop-daemon.sh as the hadoop user to start it:
root# vi namenode
namenode_start() {
  # if namenode is already started on this server, bail out early
  namenode_status
  if [ $? -eq 0 ]; then
    ocf_log info "namenode is already running on this server, skip"
    return $OCF_SUCCESS
  fi
  # start namenode on this server
  ocf_log info "Starting namenode daemon..."
  su - hadoop -c "${HADOOP_HOME}/bin/hadoop-daemon.sh start namenode"
  if [ $? -ne 0 ]; then
    ocf_log err "Can not start namenode daemon."
    return $OCF_ERR_GENERIC
  fi
  sleep 1
  return $OCF_SUCCESS
}
5. Add a namenode_stop() function. This function is used by Pacemaker to actually stop the NameNode daemon on the server. In the namenode_stop() function, we first check whether NameNode is already stopped on the server; if it is running, we invoke hadoop-daemon.sh as the hadoop user to stop it:
root# vi namenode
namenode_stop() {
  # if namenode is not started on this server, bail out early
  namenode_status
  if [ $? -ne 0 ]; then
    ocf_log info "namenode is not running on this server, skip"
    return $OCF_SUCCESS
  fi
  # stop namenode on this server
  ocf_log info "Stopping namenode daemon..."
  su - hadoop -c "${HADOOP_HOME}/bin/hadoop-daemon.sh stop namenode"
  if [ $? -ne 0 ]; then
    ocf_log err "Can not stop namenode daemon."
    return $OCF_ERR_GENERIC
  fi
  sleep 1
  return $OCF_SUCCESS
}
6. Add a namenode_status() function. This function is used by Pacemaker to monitor the status of the NameNode daemon on the server. In the namenode_status() function, we use the jps command to show all running Java processes owned by the hadoop user, and grep for the name of the NameNode daemon to see whether it has started:
root# vi namenode
namenode_status() {
  ocf_log info "monitor namenode"
  su - hadoop -c "${JAVA_HOME}/bin/jps" | egrep -q "NameNode"
  rc=$?
  # grep will return true if namenode is running on this machine
  if [ $rc -eq 0 ]; then
    ocf_log info "Namenode is running"
    return $OCF_SUCCESS
  else
    ocf_log info "Namenode is not running"
    return $OCF_NOT_RUNNING
  fi
}
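The status-check pattern above can be tried in isolation. In this self-contained sketch, fake_jps is a hypothetical stand-in for the real su - hadoop -c "${JAVA_HOME}/bin/jps" call, so the logic runs without a Hadoop installation:

```shell
# fake_jps simulates the jps output the resource agent inspects; on a
# real master this would be: su - hadoop -c "${JAVA_HOME}/bin/jps"
fake_jps() {
  printf '12345 NameNode\n23456 Jps\n'
}

# Same check the agent performs: grep exits 0 when NameNode is listed.
if fake_jps | egrep -q "NameNode"; then
  echo "Namenode is running"
else
  echo "Namenode is not running"
fi
```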
7. Add a namenode_validateAll() function to make sure the environment variables are set properly before we run other functions:
root# vi namenode
namenode_validateAll() {
  if [ -z "$JAVA_HOME" ]; then
    ocf_log err "JAVA_HOME not set."
    exit $OCF_ERR_INSTALLED
  fi
  if [ -z "$HADOOP_HOME" ]; then
    ocf_log err "HADOOP_HOME not set."
    exit $OCF_ERR_INSTALLED
  fi
  # Any subject is OK
  return $OCF_SUCCESS
}
8. Add the following main routine. Here, we will simply call the previous functions to implement the required standard OCF resource agent actions:
root# vi namenode
# See how we were called.
if [ $# -ne 1 ]; then
  usage
  exit $OCF_ERR_GENERIC
fi

namenode_validateAll

case $1 in
  meta-data)
    meta_data
    exit $OCF_SUCCESS;;
  usage)
    usage
    exit $OCF_SUCCESS;;
  *);;
esac

case $1 in
  status|monitor) namenode_status;;
  start) namenode_start;;
  stop) namenode_stop;;
  validate-all);;
  *)
    usage
    exit $OCF_ERR_UNIMPLEMENTED;;
esac

exit $?
9. Change the namenode file permission and test it on master1 and master2:
root# chmod 0755 namenode
root# ocf-tester -v -n namenode-test /full/path/of/namenode
10. Make sure all the tests pass before proceeding to the next step, or the HA cluster will behave unexpectedly.
11. Install the namenode resource agent under the hac provider on master1 and master2:
root# mkdir ${OCF_ROOT}/resource.d/hac
root# cp namenode ${OCF_ROOT}/resource.d/hac
root# chmod 0755 ${OCF_ROOT}/resource.d/hac/namenode
We are now ready to configure a highly available NameNode using Heartbeat and Pacemaker. We will set up a VIP address and configure Hadoop and HBase to use this VIP address as their master node. NameNode will be started on the active master, where the VIP is assigned. If the active master crashes, Heartbeat and Pacemaker will detect it, assign the VIP address to the standby master node, and then start NameNode there.
1. Start Heartbeat on master1 and master2:
root# /etc/init.d/heartbeat start
2. Change the default crm configuration. All resource-related commands need to be executed only once, from either master1 or master2:
root# crm configure property stonith-enabled=false
root# crm configure property default-resource-stickiness=1
3. Add a VIP resource using our VIP address:
root# crm configure primitive VIP ocf:heartbeat:IPaddr params ip="10.174.14.10" op monitor interval="10s"
4. Make the following changes to configure Hadoop to use our VIP address. Sync to all masters, clients, and slaves after you've made the changes:
hadoop$ vi $HADOOP_HOME/conf/core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://master:8020</value>
</property>
5. Make the following changes to configure HBase to use our VIP address. Sync to all masters, clients, and slaves after you've made the changes:
hadoop$ vi $HBASE_HOME/conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:8020/hbase</value>
</property>
6. To configure Hadoop to write its metadata to a local disk and NFS, make the following changes and sync to all masters, clients, and slaves:
hadoop$ vi $HADOOP_HOME/conf/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/var/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
</property>
7. Add the namenode resource agent we created earlier to Pacemaker. We will use NAMENODE as its resource name:
root# crm configure primitive NAMENODE ocf:hac:namenode \
op monitor interval="120s" timeout="120s" \
op start timeout="120s" \
op stop timeout="120s" \
meta resource-stickiness="1"
8. Configure the VIP resource and the NAMENODE resource as a resource group:
root# crm configure group VIP-AND-NAMENODE VIP NAMENODE
9. Configure the colocation of the VIP resource and the NAMENODE resource:
root# crm configure colocation VIP-WITH-NAMENODE inf: VIP NAMENODE
10. Configure the resource order of the VIP resource and the NAMENODE resource:
root# crm configure order IP-BEFORE-NAMENODE inf: VIP NAMENODE
11. Verify the previous Heartbeat and resource configurations using the crm_mon command. If everything is configured properly, you should see output like the following:
root@master1 hac$ crm_mon -1r
============
Last updated: Tue Nov 22 22:39:11 2011
Stack: Heartbeat
Current DC: master2 (7fd92a93-e071-4fcb-993f-9a84e6c7846f) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 1 expected votes
1 Resources configured.
============

Online: [ master1 master2 ]

Full list of resources:

Resource Group: VIP-AND-NAMENODE
    VIP (ocf::heartbeat:IPaddr): Started master1
    NAMENODE (ocf::hac:namenode): Started master1
12. Make sure that the VIP and NAMENODE resources are started on the same server.
13. Now stop Heartbeat on master1; VIP-AND-NAMENODE should be started on master2 after several seconds.
14. Restart Heartbeat on master1; VIP-AND-NAMENODE should remain started on master2. Resources should NOT fail back to master1.
We have confirmed that our HA configuration works as expected, so we can start HDFS and HBase now. Note that NameNode has already been started by Pacemaker, so we need only start DataNode here:
1. If everything works well, we can start DataNode now:
hadoop@master$ for i in 1 2 3
do
  ssh slave$i "$HADOOP_HOME/bin/hadoop-daemon.sh start datanode"
  sleep 1
done
2. Start your HBase cluster from master, which is the active master server where the VIP address is associated:
hadoop@master$ $HBASE_HOME/bin/start-hbase.sh
3. Start a standby HMaster from the standby master server, master2 in this case:
hadoop@master2$ $HBASE_HOME/bin/hbase-daemon.sh start master
The previous steps finally leave us with a cluster structure like the following diagram:
At first, we installed Heartbeat and Pacemaker on the two masters and then configured Heartbeat to enable Pacemaker.
In step 2 of the Create and install a NameNode resource agent section, we created the namenode script, which is implemented as a standard OCF resource agent. The most important function of the namenode script is namenode_status, which monitors the status of the NameNode daemon. Here, we use the jps command to show all running Java processes owned by the hadoop user, and grep for the name of the NameNode daemon to see if it has started. The namenode resource agent is used by Pacemaker to start, stop, and monitor the NameNode daemon. In the namenode script, as you can see in the namenode_start and namenode_stop functions, we actually start/stop NameNode by using hadoop-daemon.sh, which is used to start/stop a Hadoop daemon on a single server. You can find the full code in the source shipped with this book.
We started Heartbeat after our namenode resource agent was tested and installed. Then, we made some changes to the default crm configuration. The default-resource-stickiness=1 setting is very important, as it turns off the automatic failback of a resource.
We added a VIP resource to Pacemaker and configured Hadoop and HBase to use it in steps 3 to 5 of the Configure highly available NameNode section. By using the VIP in their configuration, Hadoop and HBase can switch to communicating with the standby master if the active one goes down.
In step 6 of the same section, we configured Hadoop (HDFS NameNode) to write its metadata to both the local disk and NFS. If the active master goes down, NameNode will be started on the standby master. Because both masters mount the same NFS directory, NameNode started on the standby master can apply the latest metadata from NFS, and restore HDFS to the state it was in before the original active master went down.
In steps 7 to 10, we added the NAMENODE resource using the namenode resource agent we created in step 2 of the Create and install a NameNode resource agent section; we then set up the VIP and NAMENODE resources as a group (step 8), made sure they always run on the same server (step 9), and enforced the right start-up order (step 10). We did this because we don't want VIP running on master1 while NameNode is running on master2.
Because Pacemaker starts NameNode for us via the namenode resource agent, we need to start the DataNodes separately, which is what we did in step 1 of the Start DataNode, HBase cluster, and backup HBase master section.
After starting HBase normally, we started our standby HBase master (HMaster) on the standby master server. If you check your HBase master log, you will find output like the following, which shows that it is running as a standby HMaster:
2011-11-21 23:38:55,168 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Another master is the active master, ip-10-174-14-15.us-west-1.compute.internal:60000; waiting to become the next active master
Finally, we got NameNode and HMaster running on two servers with an active-standby configuration. The single point of failure of the cluster was avoided.
However, this leaves us with a lot of work to do in production. You need to test your HA cluster in all the rare failure cases, such as a server power outage, an unplugged network cable, a shutdown of the network switch, or anything else you can think of.
On the other hand, the SPOF of the cluster may not be as critical as you think. In our experience, almost all cluster downtime is caused by operator mistakes or software upgrades. It is better to keep your cluster simple.
It is more complex to set up a highly available HBase cluster on Amazon EC2, because EC2 does not support static IP addresses, and so we can't use a VIP on EC2. An alternative is to use an Elastic IP (EIP) address. An Elastic IP address plays the role of a static IP address on EC2, but it is associated with your account rather than with a particular instance. We can use Heartbeat to associate the EIP with the standby master automatically if the active one goes down. Then, we configure Hadoop and HBase to use the instance's public DNS name associated with the EIP to find the active master. Also, in the namenode resource agent, we have to start/stop not only NameNode, but also all the DataNodes. This is because the IP address of the active master has changed, but a DataNode cannot find the new active master unless it is restarted.
We will skip the details because they are beyond the scope of this book. We created an elastic-ip resource agent to achieve this purpose; you can find it in the source shipped with this book.