This chapter explored how to process the large volumes of data discussed in the previous chapter. In particular, we covered:
The MapReduce conceptual model, and how MapReduce was the only processing model available in Hadoop 1
The Java API to MapReduce, and how to use it to build examples ranging from a word count to sentiment analysis of Twitter hashtags
The details of how MapReduce is implemented in practice, walking through the execution of a MapReduce job
How Hadoop stores data, and the classes used to represent input and output formats, record readers, and record writers
The limitations of MapReduce that led to the development of YARN, opening the door to multiple computational models on the Hadoop platform
The YARN architecture and how applications are built atop it
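As a reminder of the MapReduce conceptual model recapped above, the word-count logic can be sketched without the Hadoop framework itself. This is an illustrative simulation only: a real job would implement Hadoop's `Mapper` and `Reducer` classes, whereas here the map, shuffle, and reduce phases are simulated with plain Java collections, and the class and method names are invented for this sketch.

```java
import java.util.*;
import java.util.stream.*;

// Framework-free sketch of word count: map emits (word, 1) pairs,
// shuffle groups pairs by key, reduce sums each group's values.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1))
                .collect(Collectors.toList());
    }

    // Shuffle and reduce phases: group pairs by key, then sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");
        Map<String, Integer> counts = reduce(map(input));
        System.out.println(counts.get("the")); // prints 2
    }
}
```

In the real framework, the shuffle step is performed by Hadoop between the map and reduce tasks; collapsing it into `Collectors.groupingBy` here is what makes the conceptual model visible in a few lines.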
In the next two chapters, we will move away from strictly batch processing and delve into the world of near real-time and iterative processing, using two of the YARN-hosted frameworks we introduced in this chapter...