HBase Administration Cookbook

By: Yifeng Jiang

Overview of this book

As an open source distributed big data store, HBase scales to billions of rows with millions of columns, and sits on top of clusters of commodity machines. If you are looking for a way to store and access a huge amount of data in real time, then look no further than HBase.

HBase Administration Cookbook provides practical examples and simple step-by-step instructions for you to administer HBase with ease. The recipes cover a wide range of processes for managing a fully distributed, highly available HBase cluster on the cloud. Working with such a huge amount of data means that an organized and manageable process is key, and this book will help you to achieve that.

The recipes in this practical cookbook start from setting up a fully distributed HBase cluster and moving data into it. You will learn how to use all of the tools for day-to-day administration tasks, as well as for efficiently managing and monitoring the cluster to achieve the best performance possible. Understanding the relationship between Hadoop and HBase will allow you to get the best out of HBase, so the book will show you how to set up Hadoop clusters, configure Hadoop to cooperate with HBase, and tune its performance.

Setting up ZooKeeper


A distributed HBase depends on a running ZooKeeper cluster. All HBase cluster nodes and clients need to be able to access the ZooKeeper ensemble.

This recipe describes how to set up ZooKeeper. We will set up only a standalone ZooKeeper node for our HBase cluster here, but in production it is recommended that you run a ZooKeeper ensemble of at least three nodes. Also, make sure to run an odd number of nodes.

We will cover setting up a clustered ZooKeeper in the There's more... section of this recipe.

Getting ready

First, make sure Java is installed on your ZooKeeper server.

We will use the hadoop user as the owner of all ZooKeeper daemons and files. All the ZooKeeper files and data will be stored under /usr/local/ZooKeeper; you need to create this directory in advance. Our ZooKeeper will be set up on master1 too.
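
A minimal sketch of preparing this directory (shown here for master1; repeat it on client1), assuming the hadoop user has sudo rights and a hadoop group exists, could be:

hadoop@master1$ sudo mkdir -p /usr/local/ZooKeeper
hadoop@master1$ sudo chown -R hadoop:hadoop /usr/local/ZooKeeper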

We will set up one ZooKeeper client on client1. So, the Java installation, hadoop user, and directory should be prepared on client1 as well.

How to do it...

To set up a standalone ZooKeeper installation, follow these instructions:

  1. Get the latest stable ZooKeeper release from ZooKeeper's official site, http://zookeeper.apache.org/releases.html#download.

  2. Download the tarball and decompress it to our root directory for ZooKeeper. We will set a ZK_HOME environment variable to make the setup easier. As of this writing, ZooKeeper 3.4.3 is the latest stable version:

    hadoop@master1$ tar xzf zookeeper-3.4.3.tar.gz -C /usr/local/ZooKeeper
    hadoop@master1$ cd /usr/local/ZooKeeper
    hadoop@master1$ ln -s zookeeper-3.4.3 current
    hadoop@master1$ export ZK_HOME=/usr/local/ZooKeeper/current
    
  3. Create directories for ZooKeeper to store its snapshot and transaction log:

    hadoop@master1$ mkdir -p /usr/local/ZooKeeper/data
    hadoop@master1$ mkdir -p /usr/local/ZooKeeper/datalog
    
  4. Create the $ZK_HOME/conf/java.env file and put the Java settings there:

    hadoop@master1$ vi $ZK_HOME/conf/java.env
    JAVA_HOME=/usr/local/jdk1.6
    export PATH=$JAVA_HOME/bin:$PATH
    
  5. Copy the sample ZooKeeper setting file, and make the following changes to set where ZooKeeper should store its data:

    hadoop@master1$ cp $ZK_HOME/conf/zoo_sample.cfg $ZK_HOME/conf/zoo.cfg
    hadoop@master1$ vi $ZK_HOME/conf/zoo.cfg
    dataDir=/usr/local/ZooKeeper/data
    dataLogDir=/usr/local/ZooKeeper/datalog
    
  6. Sync all files under /usr/local/ZooKeeper from the master node to the client (see the rsync sketch after this list). Don't sync ${dataDir} and ${dataLogDir} after this initial installation.

  7. Start ZooKeeper from the master node by executing this command:

    hadoop@master1$ $ZK_HOME/bin/zkServer.sh start
    
  8. Connect to the running ZooKeeper, and execute some commands to verify the installation:

    hadoop@client1$ $ZK_HOME/bin/zkCli.sh -server master1:2181
    [zk: master1:2181(CONNECTED) 0] ls /
    [zookeeper]
    [zk: master1:2181(CONNECTED) 1] quit
    
  9. Stop ZooKeeper from the master node by executing the following command:

    hadoop@master1$ $ZK_HOME/bin/zkServer.sh stop
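
For step 6, a minimal sketch of the sync, assuming password-less SSH is set up for the hadoop user and rsync is installed on both nodes, could look like this:

hadoop@master1$ rsync -avz /usr/local/ZooKeeper/ client1:/usr/local/ZooKeeper/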
    

How it works...

In this recipe, we set up a basic standalone ZooKeeper instance. As you can see, the setup is very simple; all you need to do is tell ZooKeeper where to find Java and where to save its data.

In step 4, we created a file named java.env and placed the Java settings in this file. You must use this exact filename, as ZooKeeper, by default, gets its Java settings from this file.
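
Besides JAVA_HOME, java.env is also a convenient place for JVM options, which the stock zkServer.sh script picks up through the JVMFLAGS variable. For example, a heap size setting (the 1000m value here is only an illustration, not a recommendation) could be added like this:

hadoop@master1$ vi $ZK_HOME/conf/java.env
JAVA_HOME=/usr/local/jdk1.6
export JVMFLAGS="-Xmx1000m"
export PATH=$JAVA_HOME/bin:$PATH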

ZooKeeper's settings file is called zoo.cfg. You can copy the settings from the sample file shipped with ZooKeeper. The default settings are fine for a basic installation, but as ZooKeeper always plays a central role in a cluster system, it should be configured properly to get the best performance.
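
For reference, these are the settings you will most commonly revisit in zoo.cfg; the values shown are typical defaults, not tuning recommendations for your cluster:

# the basic time unit used by ZooKeeper, in milliseconds
tickTime=2000
# how many ticks a follower may take to connect and sync to the leader
initLimit=10
# how many ticks a follower may lag behind the leader before it is dropped
syncLimit=5
# the maximum number of concurrent client connections from a single IP address
maxClientCnxns=60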

To connect to a running ZooKeeper ensemble, use its command-line tool and specify the ZooKeeper server and port you want to connect to. The default client port is 2181; you don't need to specify it if you are using the default port setting.
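
For example, assuming the server listens on the default port, the following is equivalent to connecting to master1:2181:

hadoop@client1$ $ZK_HOME/bin/zkCli.sh -server master1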

ZooKeeper data is stored in nodes called Znodes, which are organized in a filesystem-like hierarchy. ZooKeeper provides commands to access or update Znodes from its command-line tool; type help for more information.
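
For example, a short session that creates, reads, updates, and then removes a Znode (the /mytest path is just a hypothetical Znode used for illustration; output is omitted) looks like this:

[zk: master1:2181(CONNECTED) 0] create /mytest hello
[zk: master1:2181(CONNECTED) 1] get /mytest
[zk: master1:2181(CONNECTED) 2] set /mytest world
[zk: master1:2181(CONNECTED) 3] delete /mytest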

There's more...

As HBase relies on ZooKeeper as its coordination service, the ZooKeeper service must be extremely reliable. In production, you must run a ZooKeeper cluster of at least three nodes. Also, make sure to run an odd number of nodes.

The procedure to set up a clustered ZooKeeper is basically the same as shown in this recipe. You can follow the previous steps to set up each cluster node first. Then, add the following settings to each node's zoo.cfg, so that every node knows about every other node in the ensemble:

hadoop@node{1,2,3}$ vi $ZK_HOME/conf/zoo.cfg
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888

Also, you need to put a myid file under ${dataDir}. The myid file consists of a single line containing only the node ID. So myid of node1 would contain the text 1 and nothing else.
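
For example, assuming dataDir is set to /usr/local/ZooKeeper/data as configured earlier, the files could be created like this:

hadoop@node1$ echo 1 > /usr/local/ZooKeeper/data/myid
hadoop@node2$ echo 2 > /usr/local/ZooKeeper/data/myid
hadoop@node3$ echo 3 > /usr/local/ZooKeeper/data/myid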

Note

Note that clocks on all ZooKeeper nodes must be synchronized. You can use the Network Time Protocol (NTP) to keep the clocks synchronized.
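
For example, a one-off synchronization on each node (assuming the ntpdate utility is installed and the public pool.ntp.org servers are reachable; running an NTP daemon is the usual long-term setup) could look like this:

hadoop@node1$ sudo ntpdate -u pool.ntp.org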

Start ZooKeeper on each node of your cluster. Then, you can connect to the cluster from your client by using the following command:

$ zkCli.sh -server node1,node2,node3

ZooKeeper will function as long as more than half of the nodes in the ZooKeeper cluster are alive. This means that in a three-node cluster only one server can die, while a five-node cluster can tolerate two failures.