Optimizing Hadoop for MapReduce

By: Khaled Tannir

Table of Contents (15 chapters)

Optimizing mappers and reducers code


A detailed treatment of code-level MapReduce optimization is beyond the scope of this book. In this section, we provide basic guidelines and a few rules that can help improve the performance of your MapReduce jobs.

One important feature of Hadoop is that all data is processed in units known as records. Because records are of roughly the same size, in theory the time needed to process each record should also be about the same. In practice, however, the processing time of records within a task varies significantly, and slowdowns can occur when reading a record from memory, processing it, or writing it back to memory. Two further factors can also affect mapper or reducer performance: I/O access time and spills, and the overhead of waiting caused by heavy I/O requests.
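To see where per-record time actually goes, a task's record loop can be instrumented stage by stage. The following is a minimal sketch written in Python (the kind of loop a Hadoop Streaming mapper might run); `profile_stages` and `summarize` are hypothetical helpers, not part of any Hadoop API, and the "read" stage here only times pulling the next record from an in-memory iterator, not real HDFS or disk I/O.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


def profile_stages(
    records: Iterable[str],
    process: Callable[[str], str],
    write: Callable[[str], None],
) -> Dict[str, List[float]]:
    """Record how long the read, process, and write stages take for each record."""
    timings: Dict[str, List[float]] = {"read": [], "process": [], "write": []}
    source = iter(records)
    while True:
        start = time.perf_counter()
        try:
            record = next(source)  # read stage: fetch the next input record
        except StopIteration:
            break
        timings["read"].append(time.perf_counter() - start)

        start = time.perf_counter()
        result = process(record)  # process stage: the user's map/reduce logic
        timings["process"].append(time.perf_counter() - start)

        start = time.perf_counter()
        write(result)  # write stage: emit the output record
        timings["write"].append(time.perf_counter() - start)
    return timings


def summarize(timings: Dict[str, List[float]]) -> Dict[str, Tuple[float, float, float]]:
    """Return (min, mean, max) seconds per stage, skipping stages with no samples."""
    return {
        stage: (min(values), mean(values), max(values))
        for stage, values in timings.items()
        if values
    }


if __name__ == "__main__":
    emitted: List[str] = []
    stage_times = profile_stages(["a b", "c d e", "f"], str.upper, emitted.append)
    for stage, (lo, avg, hi) in summarize(stage_times).items():
        print(f"{stage:>8}: min={lo:.2e}s mean={avg:.2e}s max={hi:.2e}s")
```

Comparing the minimum, mean, and maximum for each stage shows which stage the variance comes from; a large gap between mean and maximum in the write stage, for instance, could point toward I/O pressure rather than the processing logic itself.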

Note

Efficiency is measurable: it is quantified as the ratio of output to input.
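As a worked illustration with hypothetical numbers (not measurements from this book), take a map task's output to be the number of records it processes and its input to be the CPU time it consumes:

```latex
\text{Efficiency} = \frac{\text{Output}}{\text{Input}}
  = \frac{2\,000\,000\ \text{records}}{400\ \text{CPU-seconds}}
  = 5\,000\ \text{records per CPU-second}
```

Tracking the same ratio before and after a code change gives a simple, quantitative way to tell whether the change helped.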

MapReduce is easy to use in that a programmer defines a job with only...