Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not running as fast as you expect? Are you looking to understand the benefits of upgrading Hadoop? If the answer to any of these is yes, this book is for you. It assumes novice-level familiarity with Hadoop.

Apache Mahout

Apache Mahout is a scalable, open source machine learning library maintained under the Apache Software Foundation. It supports algorithms for clustering, classification, and collaborative filtering on distributed platforms. The project welcomes contributions of any algorithm, so not every algorithm in the library is distributed; some run on a single machine only.

Tip

Because Apache Mahout accepts single-machine algorithms, study an algorithm's implementation before running it on Hadoop.

Some Mahout algorithms are implemented as MapReduce jobs. These can be run on Hadoop to exploit the parallelism of a distributed cluster. Again, study the implementation of an algorithm before using it in your Hadoop deployments: a non-MapReduce algorithm may not yield any speedup when run on a Hadoop cluster.
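To make the distinction concrete, here is a minimal sketch of why a MapReduce formulation can benefit from a cluster. It is plain Python, not Mahout code, and every name in it (`map_partition`, `reduce_partials`, `kmeans_step`) is hypothetical. It assumes one iteration of a k-means-style clustering update on one-dimensional points: each partition computes partial sums independently (the map step), and the partials are merged afterwards (the reduce step).

```python
from collections import defaultdict
from functools import reduce


def nearest(point, centroids):
    """Return the index of the centroid closest to a 1-D point."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))


def map_partition(points, centroids):
    """Map step: partial [sum, count] per centroid for one partition.

    Each partition is processed independently, so on a cluster these
    calls could run on different nodes at the same time.
    """
    partial = defaultdict(lambda: [0.0, 0])
    for p in points:
        c = nearest(p, centroids)
        partial[c][0] += p
        partial[c][1] += 1
    return dict(partial)


def reduce_partials(a, b):
    """Reduce step: merge two partial results by adding sums and counts."""
    merged = {k: list(v) for k, v in a.items()}
    for k, (s, n) in b.items():
        if k in merged:
            merged[k][0] += s
            merged[k][1] += n
        else:
            merged[k] = [s, n]
    return merged


def kmeans_step(partitions, centroids):
    """One centroid update computed from independently mapped partitions."""
    partials = [map_partition(part, centroids) for part in partitions]
    totals = reduce(reduce_partials, partials, {})
    # A centroid with no assigned points keeps its previous position.
    return [
        totals[i][0] / totals[i][1] if i in totals else centroids[i]
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    data = [[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]
    print(kmeans_step(data, [0.0, 10.0]))
```

The result does not depend on how the points are split across partitions, which is the property that lets the map step be spread over many machines. An algorithm written as a single sequential loop over all the data has no such independent pieces, so running it on Hadoop gives it nothing to parallelize.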

Tip

In a recent change, since April 2014, Mahout has stopped accepting...