In Hadoop 1, the platform had two clear components: HDFS for data storage and MapReduce for data processing. The previous chapter described the evolution of HDFS in Hadoop 2; this chapter turns to data processing.
Processing has changed more significantly in Hadoop 2 than storage has, and Hadoop now supports multiple processing models as first-class citizens. In this chapter we'll explore both MapReduce and other computational models in Hadoop 2. In particular, we'll cover:
What MapReduce is and the Java API required to write applications for it
How MapReduce is implemented in practice
How Hadoop moves data into and out of its processing jobs
YARN, the Hadoop 2 component that allows processing beyond MapReduce on the platform
An introduction to several computational models implemented on YARN