In the HBase ecosystem, it is essential to monitor the cluster in order to track and improve its performance and state as it grows. Because HBase sits on top of the Hadoop ecosystem and serves real-time user traffic, we need to see how the cluster is performing at any given point in time; this allows us to detect problems well in advance and take corrective action before they escalate.
It is important to know some of the details of Ganglia and its distributed components before we get into the details of managing clusters.
gmond is an acronym for the low-footprint Ganglia Monitoring Daemon. This service needs to be installed on each node from which we want to pull metrics. The daemon is the actual workhorse: it collects data about each host using a listen/announce protocol, and it gathers core metrics such as disk, active processes, network, memory, and CPU/VCPUs.
We will divide the how-to-do-it part into two sections. In the first section, we will talk about installing Ganglia on all the nodes.
Once that is done, we will integrate it with HBase so that the relevant metrics become available.
To install Ganglia, it is best to use the prebuilt binary packages available from the vendor distributions; this helps in dealing with the prerequisite libraries. Alternatively, it can be downloaded from the Ganglia website: http://sourceforge.net/projects/ganglia/files/latest/download?source=files.
If you are working from the command prompt rather than a browser, you can download it with the following command:
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.0.7%20%28Fossett%29/ganglia-3.0.7.tar.gz
When running wget, keep the command on a single line in your shell. Use sudo if you don't have write privileges for the current directory, or download it to /tmp and copy it to the target location later.
tar -xzvf ganglia-3.0.7.tar.gz -C /opt/HBaseB
rm -rf ganglia-3.0.7.tar.gz
This deletes the tar file, which is no longer needed. Now let's install the dependencies:
sudo apt-get -y install build-essential libapr1-dev libconfuse-dev libexpat1-dev python-dev
The -y option means that apt-get won't wait for the user's confirmation; it assumes yes whenever a confirmation prompt would appear. Next, build and install from the downloaded and extracted sources:
cd /opt/HBaseB/ganglia-3.0.7
./configure
make
sudo make install
Once the preceding step is completed, you can generate a default configuration file with:
gmond --default_config > /etc/gmond.conf
Use sudo su - in case there is a privilege issue; it makes you the root user and allows gmond.conf to be written to the system location.
vi /etc/gmond.conf
and change the following, setting the user to ganglia in place of the default:
globals {
  user = ganglia
}
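If you prefer scripting the change, the user line can be switched with sed. This is a minimal sketch run against a sample file; the real target is /etc/gmond.conf, and the default user value (nobody) is an assumption:

```shell
# Build a sample globals block like the one gmond generates (default user assumed to be "nobody")
cat > /tmp/gmond-sample.conf <<'EOF'
globals {
  daemonize = yes
  user = nobody
}
EOF

# Point the user at the ganglia account used for monitoring
sed -i 's/^\( *user *= *\).*/\1ganglia/' /tmp/gmond-sample.conf

grep 'user' /tmp/gmond-sample.conf
```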
The recommendation is to create this user with the following command:
sudo adduser --disabled-login --no-create-home ganglia

Then set up the cluster block:
cluster {
  name = "HBase B"                                      # the name of your cluster
  owner = "HBase B Company"
  url = "http://yourHBasebMaster.ganglia-monitor.com/"  # URL of the main monitor or the CNAME
}
The default UDP multicast setup is good for fewer than 120 nodes. For more than 120 nodes, we have to switch to unicast.
The setup is as follows:
Change in /etc/gmond.conf:
udp_send_channel {
  # mcast_join = <the IP address to join>
  host = yourHBasebMaster.ganglia-monitor.com
  port = 8649
  # ttl = 1
}
udp_recv_channel {
  # mcast_join = <the IP address to join>
  port = 8649
  # bind = <the IP address to join>
}
Start the monitoring daemon with:
sudo gmond
We can test it with:
nc <hostname> 8649
or:
telnet <hostname> 8649
Now we have to install the Ganglia meta daemon (gmetad). A single gmetad instance is sufficient for a cluster of fewer than 100 nodes. It requires a powerful machine with decent compute power, as it is responsible for aggregating the data and creating the graphs.
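The gmond reachability check above can also be wrapped in a small helper. This sketch uses bash's /dev/tcp virtual device instead of nc or telnet, so it works even when neither tool is installed; the hostname and port are the ones configured earlier:

```shell
# Print OPEN if something is listening on host:port, CLOSED otherwise (uses bash's /dev/tcp)
check_gmond() {
  if ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null; then
    echo OPEN
  else
    echo CLOSED
  fi
}

check_gmond localhost 8649   # prints OPEN once gmond is running on this host, CLOSED otherwise
```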
Let's move ahead:
cd /u/HBaseB/ganglia-3.0.7
./configure --with-gmetad
make
sudo make install
sudo cp /u/HBaseB/ganglia-3.0.7/gmetad/gmetad.conf /etc/gmetad.conf
Open it using
sudo vi /etc/gmetad.conf
and change the following:
setuid_username "ganglia"
data_source "HBase B" yourHBasebMaster.ganglia-monitor.com
gridname "<your grid name, say HBase B Grid>"
Now we need to create directories, which will store data in a round-robin database (rrds):
mkdir -p /var/lib/ganglia/rrds
Now let's change the ownership to the ganglia user, so that it can read and write as needed:
chown -R ganglia:ganglia /var/lib/ganglia/
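The same mkdir/chown pattern can be rehearsed in /tmp before touching the real path. The directory below is a stand-in for /var/lib/ganglia/rrds; the actual chown to ganglia:ganglia needs root and requires the ganglia account to exist:

```shell
# Create the nested directory in one shot, as -p does for /var/lib/ganglia/rrds
mkdir -p /tmp/ganglia-demo/rrds

# Inspect the owner, which must match the user gmetad writes RRD files as
stat -c '%U' /tmp/ganglia-demo/rrds
```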
Let's start the daemon:
gmetad
Now, let's focus on Ganglia web.
sudo apt-get -y install rrdtool apache2 php5-mysql libapache2-mod-php5 php5-gd
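Before copying the web files, it is worth confirming that the binaries from these packages landed on the PATH. A small sketch; the binary names are the Debian/Ubuntu ones and may differ on other distributions:

```shell
# Report which pieces of the web stack are installed
for bin in rrdtool apache2 php5; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: found"
  else
    echo "$bin: missing"
  fi
done
```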
Copy the PHP-based files to the following location:
cp -r /u/HBaseB/ganglia-3.0.7/web /var/www/ganglia
sudo /etc/init.d/apache2 restart
(Other arguments that can be used are status and stop.)
Point your browser at http://HBasebMaster.ganglia-monitor.com/ganglia; you should start seeing the basic graphs, though no HBase metrics yet, as the HBase setup is still not done. Now let's integrate HBase and Ganglia:
vi /u/HBaseB/hbase-0.98.5-hadoop2/conf/hadoop-metrics2-hbase.properties
Change the following parameters to get the different statuses into Ganglia:
hbase.extendedperiod = 3600
hbase.class = org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period = 5
hbase.servers = master2:8649

# The jvm context provides memory used, thread count in the JVM, and so on.
jvm.class = org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period = 5
jvm.servers = master2:8649

# Enable the rpc context to see metrics on each HBase RPC method invocation.
rpc.class = org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period = 5
rpc.servers = master2:8649
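Note that newer HBase releases use the metrics2 framework, where the Ganglia hookup is configured as a sink rather than a context. A hedged sketch of the equivalent metrics2-style entries (the sink name "ganglia" and the period value here are illustrative):

```properties
hbase.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
hbase.sink.ganglia.servers=master2:8649
hbase.sink.ganglia.period=10
```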
Copy /u/HBaseB/hbase-0.98.5-hadoop2/conf/hadoop-metrics2-hbase.properties to all the nodes, and restart the HMaster and all the region servers.
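Distributing the file can be scripted. Below is a dry run that only prints the commands; the node names are placeholders for your region server hosts, and dropping the leading echo performs the real copy via scp:

```shell
# Hypothetical list of region server hostnames
NODES="region1 region2 region3"
CONF=/u/HBaseB/hbase-0.98.5-hadoop2/conf/hadoop-metrics2-hbase.properties

# Echo the scp command for each node instead of executing it (dry run)
for node in $NODES; do
  echo scp "$CONF" "$node:$CONF"
done
```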
As the system grows from a few nodes to tens or hundreds, or becomes a very large cluster with more than a hundred nodes, it is pivotal to have a holistic view, a drill-down view, and a historical view of the logs at any given point in time, in a graphical representation. In a large or very large installation, administrators are more concerned about redundancy, which avoids a single point of failure. HBase and the underlying HDFS are designed to handle node failures gracefully, but it is equally important to monitor these failures, as they can bring down the cluster if corrective action is not taken in time. HBase exposes various metrics to JMX and Ganglia, such as HMaster and region server statistics, JVM (Java Virtual Machine) details, RPC (remote procedure call) details, and Hadoop/HDFS and MapReduce details. Taking into consideration all these points and various other salient and powerful features, we chose Ganglia.
Ganglia provides the following advantages:
It provides near-real-time monitoring for all the vital information of a very large cluster.
It runs on commodity hardware and is suited to most popular operating systems.
It's open source and relatively easy to install.
It integrates easily with traditional monitoring systems.
It provides an overall view of all nodes in a grid and all nodes in the cluster.
The monitored data is available in both textual and graphic format.
It works on a multicast listen/announce protocol.
It works with open standards, such as:
JSON
XML
XDR
RRDTool
APR (Apache Portable Runtime)
Apache HTTPD server
PHP-based web interface
HBase works only with Ganglia versions 3.0.x and higher, hence we used version 3.0.7.
In step 4, we installed the library dependencies required to compile Ganglia.
In step 5, we compiled and installed Ganglia by running the configure command, followed by make and then make install.
In step 6, we created the gmond.conf file, and later, in step 7, we changed its settings to point to the HBase master node. We also configured the port to 8649, with a ganglia user who can read from the cluster. By commenting out the multicast address and the TTL (time to live), we changed the default UDP-based multicasting to unicasting, which enables us to expand the cluster beyond 120 nodes. We also added a master gmond node in this config file.
In step 8, we started gmond and got some core monitoring data, such as the CPU, disk, network, memory, and load average of the nodes.
In step 9, we went back to /u/HBaseB/ganglia-3.0.7/ and reran the configuration, but this time we added the --with-gmetad option to configure so that it compiles gmetad as well.
In step 11, we copied gmetad.conf from /u/HBaseB/ganglia-3.0.7/gmetad/gmetad.conf to /etc/gmetad.conf.
In step 12, we added the ganglia user and the master details in the data_source line: data_source "HBase B" yourHBasebMaster.ganglia-monitor.com.
In steps 13 and 14, we created the rrds directory, which holds the data in round-robin databases; later on, we started the gmetad daemon on the master node.
In step 15, we installed all the dependencies required to run the web interface.
In step 16, we copied the PHP-based web files from the existing location (/u/HBaseB/ganglia-3.0.7/web) to /var/www/ganglia.
In step 17, we restarted the Apache instance and saw all the basic graphs, which provide details of the nodes and hosts but not of HBase. We also copied the configuration to all the nodes so that they are configured alike and the Ganglia master receives data from the child nodes.
In step 18, we changed the settings in hadoop-metrics2-hbase.properties so that HBase starts collecting metrics and sending them to the Ganglia servers on port 8649. The classes configured through the hbase.class, jvm.class, and rpc.class properties, together with their period and servers settings, are responsible for publishing these details.
Now we point at the URL of the master, and once the page is rendered, it starts showing the graphs, as described by the image HBase-Ganglia-MasterAndRegion01-01.png. It starts showing the following graphs:
Memory and CPU usage
JVM details (GC cycle, memory consumed by JVM, threads used, heap consumed, and so on)
HBase Master details
HBase Region compaction queue details
Region server flush queue utilizations
Region servers IO
Ganglia is used for monitoring very large clusters, and in the world of Hadoop/HBase it can be very useful, as it provides the following:
JVM
HDFS
MapReduce
Region compaction time
Region store files
Region block cache hit ratio
Master split size
Master split number of operations
Region block free
NameNode activities
Secondary NameNode details
Disk status