Monitoring Hadoop

By: Aman Singh

HDFS overview


HDFS is a distributed file system designed for robustness by keeping multiple copies of each block across the cluster. The file system metadata is stored on the NameNode, and the actual data blocks are stored on the DataNodes. For a healthy file system, the metadata must be consistent, the DataNode blocks must be clean, and replication must match the configured factor. Let's look at each of these one by one and learn how they can be monitored. Communication between the NameNode and DataNodes uses Hadoop RPC, while bulk block data moves over a separate streaming protocol; HTTP access to HDFS is available through WebHDFS.
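As a quick illustration, all three health properties can be inspected at once with the fsck tool. This is a minimal sketch: the path `/` and the `hdfs` daemon user are assumptions, so adjust them for your cluster.

```shell
# Run a read-only file system check over the whole namespace as the
# 'hdfs' user. fsck queries the NameNode's metadata and reports corrupt,
# missing, and under-replicated blocks without modifying anything.
sudo -u hdfs hadoop fsck / -files -blocks -locations

# A healthy cluster ends the report with:
#   The filesystem under path '/' is HEALTHY
```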

  • HDFS checks: Hadoop natively provides commands to verify the file system. These commands must be run as the user that the HDFS daemons run as. This is usually hdfs, but it can be any other user; just do not run them as root. To run these commands, the PATH variable must be set so that it includes the path to the Hadoop binaries.

    • hadoop dfsadmin -report: This command provides an extensive report of the HDFS...
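Putting the two points above together, a minimal sketch of the setup and the report command might look as follows; the Hadoop install path /usr/local/hadoop is an assumption, so substitute your own.

```shell
# Make the Hadoop binaries visible on the PATH (install path is illustrative).
export PATH=$PATH:/usr/local/hadoop/bin

# Run the report as the 'hdfs' user rather than root.
sudo -u hdfs hadoop dfsadmin -report

# The report lists cluster-wide figures such as Configured Capacity,
# DFS Used, and Under replicated blocks, followed by a per-DataNode
# breakdown showing each node's capacity, usage, and last contact time.
```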