
Changing the kernel settings


HBase is a database running on Hadoop, and just like other databases, it keeps a lot of files open at the same time. Linux limits the number of file descriptors that any one process may open; the default limit is 1024 per process. To run HBase smoothly, you need to increase the maximum number of open file descriptors for the user who starts HBase. In our case, the user is called hadoop.
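
Before changing anything, it can be useful to see both the soft and hard per-process limits, as well as the kernel-wide ceiling on open files; the commands below are a quick sketch, and the values they report will differ between distributions:

    hadoop$ ulimit -Sn    # soft limit on open file descriptors for this shell
    hadoop$ ulimit -Hn    # hard limit on open file descriptors for this shell
    hadoop$ cat /proc/sys/fs/file-max    # system-wide maximum number of open file handles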

You should also increase the nproc setting for the hadoop user. The nproc setting specifies the maximum number of processes that can exist simultaneously for that user. If nproc is too low, an OutOfMemoryError may occur.
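
When this limit is hit, the JVM typically fails with an OutOfMemoryError complaining that it cannot create new native threads. To get a rough idea of how close the hadoop user already is to the nproc limit, you can count its current tasks; the command below is only a sketch and assumes Hadoop or HBase daemons are already running as the hadoop user (on Linux, nproc counts threads, not just processes):

    hadoop$ ps -u hadoop -L --no-headers | wc -l    # number of tasks (threads) owned by the hadoop user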

In this recipe, we will describe how to view and change these kernel settings.

Getting ready

Make sure you have root privileges on all of your servers.

How to do it...

You will need to make the following kernel setting changes on all servers of the cluster (a scripted sketch for applying them to every node is shown after the steps):

  1. To confirm the current open file limits, log in as the hadoop user and execute the following command:

    hadoop$ ulimit -n
    1024
    
  2. To show the setting for maximum processes, use the -u option of the ulimit command:

    hadoop$ ulimit -u
    unlimited
    
  3. Log in as the root user to increase the open file and nproc limits. Add the following settings to the limits.conf file:

    root# vi /etc/security/limits.conf
    hadoop soft nofile 65535
    hadoop hard nofile 65535
    hadoop soft nproc 32000
    hadoop hard nproc 32000
    
  4. To apply the changes, add the following line to the /etc/pam.d/common-session file:

    root# echo "session required pam_limits.so" >> /etc/pam.d/common-session
    
  5. Log out and log back in as the hadoop user, and confirm the setting values again; you should see that the above changes have been applied:

    hadoop$ ulimit -n
    65535
    hadoop$ ulimit -u
    32000
    
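Because every node needs the same limits, you may prefer to script steps 3 and 4 rather than editing each server by hand. The following is only a sketch: the hostnames master1, slave1, and slave2 are placeholders for your own servers, and it assumes root can reach each of them over SSH:

    #!/bin/bash
    # Sketch: append the same limit settings on every node over SSH.
    # Replace the placeholder hostnames with the servers in your cluster.
    for host in master1 slave1 slave2; do
      for limit in "soft nofile 65535" "hard nofile 65535" \
                   "soft nproc 32000" "hard nproc 32000"; do
        ssh root@$host "echo 'hadoop $limit' >> /etc/security/limits.conf"
      done
      ssh root@$host "echo 'session required pam_limits.so' >> /etc/pam.d/common-session"
    done

Note that rerunning the script appends the lines again, so check /etc/security/limits.conf on each node afterwards, and remember that the hadoop user still needs to log out and back in (step 5) before the new limits take effect.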

How it works...

The previous settings change the hadoop user's open file limit to 65535 and its maximum number of processes to 32000. With these kernel settings in place, HBase can keep enough files open at the same time and run smoothly.
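
If HBase daemons are already running under the hadoop user, a quick way to confirm that the new limits apply to them is to inspect a live process in /proc; the command below is only a sketch and assumes an HRegionServer process is running on the node:

    hadoop$ grep -E 'Max open files|Max processes' /proc/$(pgrep -f HRegionServer | head -1)/limits

If the changes took effect, the soft and hard values shown should be 65535 for open files and 32000 for processes; a daemon started before the limits were raised keeps the old values until it is restarted.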

See also