Monitoring HBase-related processes in the cluster is an important part of operating HBase. Basic monitoring is done by running health checks on the HBase processes and notifying the administrators if any process is down.
Nagios is popular open source monitoring software used to watch hosts, services, and resources, and to alert users when something goes wrong and again when it recovers. Nagios can be easily extended with custom modules, which are called plugins. The check_tcp plugin is shipped with the Nagios installation. We can use this plugin to open a TCP connection to a Hadoop/HBase daemon's RPC port, to check whether the daemon is alive.
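To make the idea behind this kind of check concrete, here is a minimal Python sketch of a TCP port probe. It is not the check_tcp plugin itself, only an illustration of the same technique: try to connect to a port and map the result to the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL). The function name check_tcp_port and the message wording are assumptions made for this example.

```python
import socket

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def check_tcp_port(host, port, timeout=5.0):
    """Try to open a TCP connection; return (status_code, message).

    Hypothetical helper for illustration only, not part of Nagios.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
```

A wrapper script would call, for example, check_tcp_port("master1", 60000), print the message, and exit with the returned code so that Nagios can interpret the result. The host name and port here are placeholders; use the RPC port your daemon is actually configured with. Note that a successful connection only shows that the port is accepting connections, not that the daemon is serving requests correctly.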
In this recipe, we will set up a monitoring server running Nagios to watch all the HBase-related processes in the entire cluster. We will configure Nagios to send us e-mail notifications if any Hadoop, HBase, or ZooKeeper process is down.