This chapter explored Spark and showed how it brings iterative processing to Hadoop as a rich new framework on which applications can be built atop YARN. In particular, we highlighted:
Spark's processing model, which is built on a distributed data structure and allows very efficient in-memory data processing
The broader Spark ecosystem and how multiple additional projects are built atop it to specialize the computational model even further
In the next chapter we will explore Apache Pig and its programming language, Pig Latin. We will see how this tool can greatly simplify software development for Hadoop by abstracting away some of the MapReduce and Spark complexity.