Book Image

Optimizing Hadoop for MapReduce

By : Khaled Tannir
Book Image

Optimizing Hadoop for MapReduce

By: Khaled Tannir

Overview of this book

Table of Contents (15 chapters)

Performance tuning


The fundamental goal of performance tuning is to ensure that all available resources (CPU, RAM, I/O, and network) in a given cluster configuration are available to a particular job and are used in a balanced way.

Hadoop MapReduce resources are classified into categories such as computation, memory, network bandwidth, and input/output storage. If any of these resources perform badly, this will impact Hadoop's performance, which may cause your jobs to run slowly. Therefore, tuning Hadoop MapReduce performance is getting balanced resources on your Hadoop cluster and not just tuning one or more variables.

In simple words, tuning a Hadoop MapReduce job process consists of multiple analyses that investigate the Hadoop metrics and indicators in order to learn about execution time, memory amount used, and the number of bytes to read or store in the local filesystem, and so on.

Hadoop performance tuning is an iterative process. You launch a job, then analyzes Hadoop counters, adjust...