Learning Hadoop 2

Chapter 3. Processing – MapReduce and Beyond

In Hadoop 1, the platform had two clear components: HDFS for data storage and MapReduce for data processing. The previous chapter described the evolution of HDFS in Hadoop 2; in this chapter, we'll discuss data processing.

Processing has changed more significantly in Hadoop 2 than storage has, and Hadoop now supports multiple processing models as first-class citizens. In this chapter, we'll explore both MapReduce and other computational models in Hadoop 2. In particular, we'll cover:

  • What MapReduce is and the Java API required to write applications for it

  • How MapReduce is implemented in practice

  • How Hadoop reads data into and out of its processing jobs

  • YARN, the Hadoop 2 component that allows processing beyond MapReduce on the platform

  • An introduction to several computational models implemented on YARN