Learning Apache Mahout

Table of Contents (17 chapters)
Learning Apache Mahout
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1 Introduction to Mahout
9 Case Study – Churn Analytics and Customer Segmentation
Index

Moving beyond MapReduce


Let's discuss why we need to move beyond MapReduce. Depending on the scenario and use case, MapReduce has both advantages and limitations. In this section, we will concern ourselves with the limitations that affect machine learning use cases.

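Firstly, MapReduce is a poor fit when intermediate processes need to communicate with each other: map tasks run independently of one another, and each reduce task sees only the values grouped under its own keys. Many machine learning algorithms rely on a shared global state, such as model parameters that are updated on every iteration, which is difficult to implement with MapReduce.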
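To make the shared-state problem concrete, here is a minimal pure-Python simulation (not Hadoop or Mahout code) of one-dimensional k-means written as a chain of MapReduce jobs. K-means is used only as a familiar iterative algorithm, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`, `kmeans_driver`) are hypothetical. The point it illustrates is that mappers cannot exchange updates mid-job; the global centroids change only when the driver launches the next job.

```python
from collections import defaultdict

# Hypothetical helpers simulating MapReduce phases in plain Python.

def map_phase(points, centroids):
    # Each mapper sees only its input records plus a read-only copy of the
    # global state (the centroids) that the driver shipped with the job.
    for x in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        yield nearest, x

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reducer sees only the values for its own key.
    return {key: sum(values) / len(values) for key, values in groups.items()}

def run_job(points, centroids):
    # One complete MapReduce job: map -> shuffle -> reduce.
    new_means = reduce_phase(shuffle(map_phase(points, centroids)))
    # Keep the old centroid if a cluster received no points.
    return [new_means.get(i, c) for i, c in enumerate(centroids)]

def kmeans_driver(points, centroids, max_iters=10):
    # The shared state can only be updated *between* jobs: the driver
    # launches a fresh job per iteration and feeds its output back in.
    jobs = 0
    for _ in range(max_iters):
        updated = run_job(points, centroids)
        jobs += 1
        if updated == centroids:
            break
        centroids = updated
    return centroids, jobs

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, jobs = kmeans_driver(points, [0.0, 5.0])
print(centroids, jobs)  # prints [2.0, 11.0] 3
```

In a real Hadoop deployment, each `run_job` call would be a separate job that rereads its input from HDFS, so an algorithm that needs many iterations pays that start-up and I/O cost every time.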

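Secondly, quite a few problems are difficult to break down into map and reduce phases. Mahout is being ported to Apache Spark, which can run on top of HDFS and provides a processing paradigm other than MapReduce: a general graph of transformations over distributed datasets that can be cached in memory across iterations.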
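As one illustration of a computation that does not split cleanly into map and reduce phases (our choice of example, not one taken from the text), compare the mean with the median. A mean decomposes: each mapper can emit a small `(sum, count)` summary and a reducer can merge those summaries exactly. A median does not: the median of per-partition medians is generally not the global median, so a correct job has to bring the relevant values together (for example, by sorting) instead of merging small summaries. The sketch below simulates partitions as Python lists; the helper names are hypothetical.

```python
import statistics

# Hypothetical helpers; each inner list stands in for one data partition.

def partial_mean(chunk):
    # Map side: summarize a partition as (sum, count); such pairs merge exactly.
    return (sum(chunk), len(chunk))

def combine_means(partials):
    # Reduce side: merge partial summaries without seeing the raw records.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

def naive_median_of_medians(chunks):
    # Tempting but wrong: the median of per-partition medians
    # is not, in general, the median of the whole dataset.
    return statistics.median(statistics.median(c) for c in chunks)

chunks = [[1, 2, 3], [4, 100], [5, 6, 7, 8]]
all_values = [x for c in chunks for x in c]

# The decomposed mean agrees with the mean over all values.
print(combine_means([partial_mean(c) for c in chunks]), statistics.mean(all_values))
# The "decomposed" median does not agree with the true median.
print(naive_median_of_medians(chunks), statistics.median(all_values))  # prints 6.5 5
```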