Chapter 8. New Paradigm in Mahout

Mahout started out primarily as a Java MapReduce library for running distributed, scalable machine learning algorithms on top of Hadoop. As the project has matured, the Mahout community has decided to move away from MapReduce and embrace Apache Spark and other distributed processing frameworks, such as H2O, with a focus on writing an algorithm once and running it on multiple platforms; a short sketch of this new style follows the topic list below. In this chapter, we are going to discuss:

  • Limitations of MapReduce

  • Apache Spark

  • In-core binding

  • Out-of-core binding
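To give a flavor of what this new paradigm looks like, the following is a minimal sketch in Mahout's Scala DSL (the Samsara math environment). It assumes the math-scala and Spark bindings are on the classpath; the exact imports and context setup shown here are illustrative and may differ slightly between Mahout versions.

    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.sparkbindings._

    // Distributed context backed by Spark (assumed local setup for illustration)
    implicit val ctx = mahoutSparkContext(masterUrl = "local[2]", appName = "samsara-sketch")

    // In-core binding: a small dense matrix held entirely in memory
    val a = dense((1.0, 2.0), (3.0, 4.0))
    val ata = a.t %*% a               // ordinary in-memory matrix product

    // Out-of-core binding: the same algebra on a distributed row matrix (DRM);
    // the expression is optimized and evaluated lazily on the backend engine
    val drmA = drmParallelize(a, numPartitions = 2)
    val drmAtA = drmA.t %*% drmA
    val result = drmAtA.collect       // bring the (small) result back in-core

The point of the sketch is that the algebraic expression A' * A is written the same way whether the matrix lives in memory or is distributed across a cluster; only the type of the operand (in-core matrix versus DRM) and the execution engine change.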

MapReduce and HDFS were the two paradigms largely responsible for a quantum shift in data processing capability. With these increased capabilities, we learned to imagine larger problems, which kick-started a whole new industry of Big Data analytics. The last decade has been remarkable for solving data-related problems. In recent times, however, a lot of effort has gone into developing processing paradigms beyond MapReduce. These efforts aim either to replace MapReduce or to augment the processing framework...