In Hadoop, each daemon writes its own logs, and the logging severity is configurable. Hadoop logs relate either to the daemons or to the jobs submitted; they are useful for troubleshooting slowness, MapReduce task failures, connectivity problems, and platform bugs. Logs can be user level, such as the TaskTracker logs on each node, or can relate to the master daemons, such as the NameNode and JobTracker.
In the newer YARN platform, there is a log-aggregation feature that moves logs to HDFS after the initial local logging. In Hadoop 1.x, user log management is handled by the UserLogManager, which cleans and truncates logs according to retention and size parameters such as mapred.userlog.retain.hours and mapreduce.cluster.map.userlog.retain-size, respectively. A task's standard output and standard error are piped through the Unix tail program, so only the required tail of the output is retained.
These are some of the challenges of log management in Hadoop: