Learning Hadoop 2

By: Gerald Turkington, Gabriele Modena

This chapter explored how to process the large volumes of data discussed in the previous chapter. In particular, we covered:

  • The conceptual model of MapReduce, which was the only processing model available in Hadoop 1

  • The Java API to MapReduce, and how to use it to build examples ranging from a word count to sentiment analysis of Twitter hashtags

  • How MapReduce is implemented in practice, including a walk-through of the execution of a MapReduce job

  • How Hadoop stores data, and the classes that represent input and output formats, record readers, and record writers

  • The limitations of MapReduce that led to the development of YARN, opening the door to multiple computational models on the Hadoop platform

  • The YARN architecture and how applications are built atop it
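
As a reminder of the conceptual model recapped above, the following is a minimal sketch of a word count expressed as map, shuffle, and reduce steps. It is plain Java that runs in a single JVM and deliberately does not use the Hadoop MapReduce API; the class name WordCountSketch and its methods map, reduce, and run are hypothetical names chosen for illustration, not code from the chapter.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of the MapReduce model,
// not the Hadoop API. All names here are hypothetical.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: combine all values seen for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map over every line, group values by key (the shuffle),
    // then call reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {hadoop=1, hello=2, yarn=1}
        System.out.println(run(Arrays.asList("hello hadoop", "hello yarn")));
    }
}
```

In a real Hadoop job the grouping step is performed by the framework across the cluster, and the map and reduce logic lives in Mapper and Reducer subclasses; the sketch only mirrors the flow of keys and values.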

In the next two chapters, we will move away from strictly batch processing and delve into the world of near real-time and iterative processing, using two of the YARN-hosted frameworks we introduced in this chapter...