Optimizing Hadoop for MapReduce

By: Khaled Tannir

Chapter 1. Understanding Hadoop MapReduce

MapReduce, a popular model for data-intensive distributed computing, is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation.

Hadoop is the most popular open source Java implementation of Google's MapReduce programming model. Many companies already use it for large-scale data analysis, and it is often used for jobs where low response time is critical.

Before going deeper into MapReduce programming and Hadoop performance tuning, we will review the basics of the MapReduce model and the factors that affect Hadoop's performance.

In this chapter, we will cover the following:

  • The MapReduce model

  • An overview of Hadoop MapReduce

  • How MapReduce works internally

  • Factors that affect MapReduce performance