Monitoring Hadoop

By: Aman Singh

Logging in Hadoop


In Hadoop, each daemon writes its own logs, and the severity of logging is configurable. Hadoop logs relate either to the daemons themselves or to the jobs submitted, and they are useful for troubleshooting slowness, issues with MapReduce tasks, connectivity problems, and platform bugs. The logs can be user level, such as the task logs written by the TaskTracker on each node, or they can belong to the master daemons, such as the NameNode and JobTracker.
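Hadoop daemons log through log4j, so each log line follows a predictable layout: a timestamp, a severity level, the emitting class, and a message. As a minimal sketch of sifting daemon logs for problems, the following parses lines in that layout and keeps only entries at WARN severity or above; the sample lines and the `filter_severity` helper are invented for illustration and are not part of Hadoop itself.

```python
import re

# Default log4j layout used by Hadoop daemon logs, e.g.:
# 2014-05-12 10:15:32,123 WARN org.apache.hadoop.hdfs.StateChange: message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL) "
    r"(?P<source>\S+): (?P<msg>.*)$"
)

def filter_severity(lines, levels=("WARN", "ERROR", "FATAL")):
    """Yield (timestamp, level, source, message) for lines at the given levels."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") in levels:
            yield m.group("ts"), m.group("level"), m.group("source"), m.group("msg")

# Invented sample lines, for illustration only.
sample = [
    "2014-05-12 10:15:32,123 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG",
    "2014-05-12 10:16:01,456 WARN org.apache.hadoop.hdfs.StateChange: Block replication lagging",
]
for ts, level, source, msg in filter_severity(sample):
    print(level, msg)
```

In practice you would point such a filter at the daemon log files under the cluster's configured log directory rather than at an in-memory list.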

The newer YARN platform adds log aggregation, a feature that moves logs to HDFS after the initial logging on the local node. In Hadoop 1.x, user log management is handled by UserLogManager, which cleans and truncates logs according to retention and size parameters such as mapred.userlog.retain.hours and mapreduce.cluster.map.userlog.retain-size, respectively. A task's standard out and standard error streams are piped to the Unix tail program, so only the tail of the required size is retained.
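As an illustration, the two retention parameters named above would be set in the cluster's mapred-site.xml; the values shown here are examples only, not recommendations:

```xml
<property>
  <name>mapred.userlog.retain.hours</name>
  <!-- Example: keep task user logs for 24 hours before cleanup -->
  <value>24</value>
</property>
<property>
  <name>mapreduce.cluster.map.userlog.retain-size</name>
  <!-- Example size in bytes; only the tail of this size is retained -->
  <value>10000000</value>
</property>
```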

These are some of the challenges of log management in Hadoop:

  • Excessive logging: The truncation of logs is not done till the tasks...