Optimizing Hadoop for MapReduce

By: Khaled Tannir


Hadoop MapReduce metrics


Because of Hadoop's scale and distributed nature, diagnosing performance problems in Hadoop programs and monitoring a Hadoop system are inherently difficult. Although a Hadoop system exports many textual metrics and logs, this information can be difficult to interpret, and many application programmers do not fully understand it.

Currently, Hadoop reports coarse-grained metrics about the performance of the whole system through its logs and the metrics API. Unfortunately, it lacks important metrics at the per-job and per-task levels, such as disk and network I/O utilization. When multiple jobs run in a Hadoop system, it also lacks metrics that reflect the cluster resource utilization of each task. As a result, cluster administrators find it difficult to measure cluster utilization and to configure Hadoop systems correctly.
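To make the gap concrete, the following is a minimal sketch of the per-job view described above, assuming per-task disk and network byte counts had already been collected by some external means. The `TaskSample` record, the function names, and all sample values are hypothetical and are not part of any Hadoop API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSample:
    """One hypothetical per-task measurement; not a Hadoop data structure."""
    job_id: str
    task_id: str
    disk_bytes: int      # bytes read plus written on local disk
    network_bytes: int   # bytes sent plus received over the network


def per_job_io(samples):
    """Sum the disk and network bytes of all tasks belonging to each job."""
    totals = defaultdict(lambda: {"disk_bytes": 0, "network_bytes": 0})
    for s in samples:
        totals[s.job_id]["disk_bytes"] += s.disk_bytes
        totals[s.job_id]["network_bytes"] += s.network_bytes
    return dict(totals)


def job_share(totals, key):
    """Return each job's fraction of the cluster-wide total for `key`."""
    cluster_total = sum(t[key] for t in totals.values())
    if cluster_total == 0:
        return {job: 0.0 for job in totals}
    return {job: t[key] / cluster_total for job, t in totals.items()}


# Made-up sample values, for illustration only.
samples = [
    TaskSample("job_A", "map_0", 400, 100),
    TaskSample("job_A", "reduce_0", 200, 300),
    TaskSample("job_B", "map_0", 200, 100),
]
totals = per_job_io(samples)
disk_share = job_share(totals, "disk_bytes")
```

With a breakdown of this kind, an administrator could see which job accounts for most of the disk or network traffic when several jobs share the cluster, which is the information the coarse-grained system-wide metrics do not provide.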

Furthermore, the logs that Hadoop generates can grow excessively large, which makes them extremely difficult to handle manually, and they can hardly answer...