A fully distributed HBase instance has one or more master nodes (HMaster) and many slave nodes (RegionServer) running on top of HDFS. It uses a reliable ZooKeeper ensemble to coordinate all the components of the cluster, including masters, slaves, and clients.
It is not necessary to run HMaster on the same server as the HDFS NameNode, but for a small cluster it is typical to have them on the same server, for ease of management. RegionServers are usually configured to run on the HDFS DataNode servers. Running a RegionServer on a DataNode server also has the advantage of data locality: eventually, the DataNode on the same server will hold a copy of all the data that the RegionServer needs.
This recipe describes the setup of a fully distributed HBase. We will set up one HMaster on master1, and three region servers (slave1 to slave3). We will also set up an HBase client on client1.
First, make sure Java is installed on all servers of the cluster.
We will use the hadoop user as the owner of all HBase daemons and files, too. All HBase files and data will be stored under /usr/local/hbase. Create this directory on all servers of your HBase cluster in advance.
We will set up one HBase client on client1. Therefore, the Java installation, the hadoop user, and the directory should be prepared on client1 too.
Make sure HDFS is running. You can ensure it started properly by accessing HDFS, using the following command:
hadoop@client1$ $HADOOP_HOME/bin/hadoop fs -ls /
MapReduce does not need to be started, as HBase does not normally use it.
We assume that you are managing your own ZooKeeper ensemble; in that case, start it and confirm that it is running properly by sending the ruok command to its client port:
hadoop@client1$ echo ruok | nc master1 2181
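If you want to script this check, the reply can be interpreted in a small helper. This is a minimal sketch, not part of the recipe; the hostname and port are the ones assumed above, and a healthy ZooKeeper server answers "imok":

```shell
# Interpret the reply from: echo ruok | nc <host> 2181
# A healthy ZooKeeper server answers the four-letter word "ruok" with "imok".
zk_status() {
  case "$1" in
    imok) echo "healthy" ;;
    "")   echo "no response" ;;
    *)    echo "unexpected reply: $1" ;;
  esac
}

# Typical use against the ensemble:
#   zk_status "$(echo ruok | nc -w 2 master1 2181)"
zk_status imok   # prints "healthy"
```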
To set up our fully distributed HBase cluster, we will download and configure HBase on the master node first, and then sync to all slave nodes and clients.
Get the latest stable HBase release from HBase's official site, http://www.apache.org/dyn/closer.cgi/hbase/.
At the time of writing this book, the current stable release was 0.92.1.
1. Download the tarball and decompress it to our root directory for HBase. Also, set an HBASE_HOME environment variable to make the setup easier:

hadoop@master1$ ln -s hbase-0.92.1 current
hadoop@master1$ export HBASE_HOME=/usr/local/hbase/current
2. We will use /usr/local/hbase/var as a temporary directory of HBase on the local filesystem. Remove it first if you have created it for your standalone HBase installation:

hadoop@master1$ mkdir -p /usr/local/hbase/var
3. To tell HBase where the Java installation is, set JAVA_HOME in the HBase environment setting file (hbase-env.sh):

hadoop@master1$ vi $HBASE_HOME/conf/hbase-env.sh
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/local/jdk1.6
4. Set up HBase to use the independent ZooKeeper ensemble:

hadoop@master1$ vi $HBASE_HOME/conf/hbase-env.sh
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=false
5. Add these settings to HBase's configuration file (hbase-site.xml):

hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master1:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/usr/local/hbase/var</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master1</value>
  </property>
</configuration>
6. Configure the slave nodes of the cluster:

hadoop@master1$ vi $HBASE_HOME/conf/regionservers
slave1
slave2
slave3
7. Link the HDFS configuration file (hdfs-site.xml) to HBase's configuration folder (conf), so that HBase can see the HDFS's client configuration on your Hadoop cluster:
hadoop@master1$ ln -s $HADOOP_HOME/conf/hdfs-site.xml $HBASE_HOME/conf/hdfs-site.xml
8. Copy the hadoop-core and zookeeper JAR files, and their dependencies, from your Hadoop and ZooKeeper installations:

hadoop@master1$ rm -i $HBASE_HOME/lib/hadoop-core-*.jar
hadoop@master1$ rm -i $HBASE_HOME/lib/zookeeper-*.jar
hadoop@master1$ cp -i $HADOOP_HOME/hadoop-core-*.jar $HBASE_HOME/lib/
hadoop@master1$ cp -i $HADOOP_HOME/lib/commons-configuration-1.6.jar $HBASE_HOME/lib/
hadoop@master1$ cp -i $ZK_HOME/zookeeper-*.jar $HBASE_HOME/lib/
9. Sync all the HBase files under /usr/local/hbase from master1 to the same directory on the client and slave nodes.
10. Start the HBase cluster from the master node:
hadoop@master1$ $HBASE_HOME/bin/start-hbase.sh
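The sync in step 9 can also be scripted. A minimal sketch, assuming the hostnames of this recipe and passwordless SSH for the hadoop user; it prints the rsync commands as a safe dry run (remove the echo to actually copy):

```shell
# Dry run: print the rsync command for each node that needs the HBase files.
# Hostnames are the ones used in this recipe; adjust for your cluster.
hosts="slave1 slave2 slave3 client1"
for host in $hosts; do
  echo rsync -az /usr/local/hbase/ "hadoop@${host}:/usr/local/hbase/"
done
```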
11. Connect to your HBase cluster from the client node:
hadoop@client1$ $HBASE_HOME/bin/hbase shell
You can also access the HBase web UI from your browser. Make sure your master server's port 60010 is open. The URL is http://master1:60010/master.jsp.
12. Stop the HBase cluster from the master node:
hadoop@master1$ $HBASE_HOME/bin/stop-hbase.sh
Our HBase cluster is configured to use /hbase as its root directory on HDFS, by specifying the hbase.rootdir property. Because this is the first time HBase is started, it will create the directory automatically. You can see the files HBase created on HDFS from the client:
hadoop@client1$ $HADOOP_HOME/bin/hadoop fs -ls /hbase
We want our HBase to run in distributed mode, so we set hbase.cluster.distributed to true in hbase-site.xml.
We also set up the cluster to use an independent ZooKeeper ensemble by specifying HBASE_MANAGES_ZK=false in hbase-env.sh. The ZooKeeper ensemble is specified by the hbase.zookeeper.quorum property. You can use a clustered ZooKeeper by listing all the servers of the ensemble, such as zoo1,zoo2,zoo3.
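For example, with a three-node ensemble the property would look like the following fragment (the hostnames zoo1 to zoo3 are placeholders for your own ZooKeeper servers):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zoo1,zoo2,zoo3</value>
</property>
```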
All region servers are configured in the $HBASE_HOME/conf/regionservers file. You should use one line per region server. When starting the cluster, HBase will SSH into each region server configured here, and start the HRegionServer daemon on that server.
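The start script's behavior can be sketched roughly as follows: read one hostname per line and launch the daemon over SSH (hbase-daemon.sh is HBase's own per-node start script). Printing the commands instead of invoking ssh keeps this a safe dry run:

```shell
# Rough sketch of what start-hbase.sh does with conf/regionservers:
# one hostname per line, one SSH daemon launch per host (dry run).
while read -r host; do
  echo ssh "$host" '$HBASE_HOME/bin/hbase-daemon.sh start regionserver'
done <<'EOF'
slave1
slave2
slave3
EOF
```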
By linking hdfs-site.xml under the $HBASE_HOME/conf directory, HBase will use all the client configurations you made for your HDFS in hdfs-site.xml, such as the dfs.replication setting.
HBase ships with its own prebuilt hadoop-core and ZooKeeper JAR files, which may be out of date compared to the ones in your Hadoop and ZooKeeper installations. Make sure HBase uses the same versions of these JAR files as Hadoop and ZooKeeper do, to avoid any unexpected problems.
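A quick way to compare versions is to extract them from the jar filenames. A minimal sketch; the paths and the version number shown are illustrative, not prescriptive:

```shell
# Extract the version from a jar filename such as hadoop-core-1.0.3.jar.
jar_version() {
  # strip the directory, the .jar suffix, and everything up to the last hyphen
  basename "$1" .jar | sed 's/.*-//'
}

jar_version /usr/local/hbase/current/lib/hadoop-core-1.0.3.jar   # prints 1.0.3
# Compare, for example, against: jar_version $HADOOP_HOME/hadoop-core-*.jar
```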