Chapter 8. New Paradigm in Mahout

Mahout started out primarily as a Java MapReduce library for running distributed, scalable machine learning algorithms on top of Hadoop. As the project has matured, the Mahout community has decided to move away from MapReduce and embrace Apache Spark and other distributed processing frameworks, such as H2O, with a focus on writing an algorithm once and running it on multiple platforms; a short sketch of this new style follows the topic list below. In this chapter, we are going to discuss:

  • Limitations of MapReduce

  • Apache Spark

  • In-core binding

  • Out-of-core binding
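To give a flavor of what this new paradigm looks like, the following is a minimal sketch in Mahout's Scala DSL (the Samsara math environment). It assumes the math-scala and Spark bindings are on the classpath; the exact imports and context setup shown here are illustrative and may differ slightly between Mahout versions.

    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.sparkbindings._

    // Distributed context backed by Spark (assumed local setup for illustration)
    implicit val ctx = mahoutSparkContext(masterUrl = "local[2]", appName = "samsara-sketch")

    // In-core binding: a small dense matrix held entirely in memory
    val a = dense((1.0, 2.0), (3.0, 4.0))
    val ata = a.t %*% a               // ordinary in-memory matrix product

    // Out-of-core binding: the same algebra on a distributed row matrix (DRM);
    // the expression is optimized and evaluated lazily on the backend engine
    val drmA = drmParallelize(a, numPartitions = 2)
    val drmAtA = drmA.t %*% drmA
    val result = drmAtA.collect       // bring the (small) result back in-core

The point of the sketch is that the algebraic expression A' * A is written the same way whether the matrix lives in memory or is distributed across a cluster; only the type of the operand (in-core matrix versus DRM) and the execution engine change.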

MapReduce and HDFS were the two paradigms largely responsible for a quantum shift in data processing capability. With these increased capabilities, we learned to imagine larger problems, which kick-started a whole new industry of Big Data analytics. The last decade has been remarkable for solving data-related problems. In recent times, however, a lot of effort has gone into developing processing paradigms beyond MapReduce. These efforts aim either to replace MapReduce or to augment the processing framework...