Apache Mahout Essentials

By: Jayani Withanawasam

Overview of this book

If you are a Java developer or data scientist, haven't worked with Apache Mahout before, and want to get up to speed on implementing machine learning on big data, then this is the perfect guide for you.

Apache Mahout

In this section, we will have a quick look at Apache Mahout.

Do you know how Mahout got its name?

[Figure: the Apache Mahout logo]

As the logo shows, a mahout is a person who drives an elephant. Hadoop's logo is an elephant, so the name signals Mahout's goal: to use Hadoop in the right manner.

The following are the features of Mahout:

  • It is a project of the Apache Software Foundation
  • It is a scalable machine learning library
    • The MapReduce implementation scales linearly with the data
    • The sequential algorithms are fast (the runtime does not depend on the size of the dataset)
  • It mainly contains clustering, classification, and recommendation (collaborative filtering) algorithms
  • Its machine learning algorithms can be executed in sequential mode (in memory) or in distributed mode (with MapReduce enabled)
  • Most of the algorithms are implemented using the MapReduce paradigm
  • It runs on top of the Hadoop framework for scaling
  • Data is stored in HDFS or in memory
  • It is a Java library (no user interface!)
  • At the time of writing, the latest released version is 0.9, and 1.0 is coming soon
  • It is a general-purpose library, not a domain-specific one
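
To make the MapReduce paradigm mentioned in the list above a little more concrete, here is a minimal sketch in plain Java. It is not Mahout or Hadoop code: the class name MapReduceSketch, its methods, and the "user,item" record format are assumptions made up for this illustration. It counts how many users interacted with each item by mapping every record to a (key, value) pair, grouping the pairs by key, and reducing each group to a single number.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-process imitation of the map, shuffle, and reduce steps.
public class MapReduceSketch {

    // Map step: turn one "user,item" record into the pair (item, 1).
    static Map.Entry<String, Integer> map(String record) {
        String item = record.split(",")[1].trim();
        return new AbstractMap.SimpleEntry<>(item, 1);
    }

    // Reduce step: combine all values emitted for one key into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: map every record, group the pairs by key (the "shuffle"), then reduce each group.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            Map.Entry<String, Integer> pair = map(record);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("alice,book", "bob,book", "alice,film", "carol,book");
        System.out.println(run(records)); // prints {book=3, film=1}
    }
}
```

In a real Hadoop job, the map and reduce steps would run on different machines and the framework would perform the grouping; the sketch only shows the shape of the computation.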

Note

For those of you who are curious: what problems is Mahout trying to solve? They are as follows:

The amount of available data is growing drastically.

The computer hardware market is geared toward delivering better-performing computers, and machine learning algorithms are computationally expensive. However, no existing framework was sufficient to harness the power of that hardware (multicore computers) to gain better performance.

This created the need for a parallel programming framework to speed up machine learning algorithms.

Mahout offers a general way to parallelize machine learning algorithms (the parallelization method is not algorithm-specific).

No specialized optimizations are required to improve the performance of each algorithm; you just need to add some more cores.

The speedup is linear in the number of cores.

Each algorithm, such as Naïve Bayes, K-Means, and Expectation-Maximization, is expressed in summation form. (I will explain this in detail in future chapters.)

For more information, please read Map-Reduce for Machine Learning on Multicore, which can be found at http://www.cs.stanford.edu/people/ang/papers/nips06-mapreducemulticore.pdf.
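
As a rough illustration of the summation form mentioned in the note, the following sketch computes a mean from partial sums. It is plain Java, not Mahout code, and the class SummationFormSketch and its helper names are hypothetical. The idea is that each chunk of data (think of one chunk per core) is summarized independently as a sum and a count, the partial results are added together, and a single division finishes the job.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a statistic computed from independent partial sums.
public class SummationFormSketch {

    // Partial result computed independently on one chunk of the data.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Partials combine by addition, so chunks can be merged in any grouping
        // (up to floating-point rounding).
        Partial plus(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Per-chunk step: add up the values of one chunk.
    static Partial summarize(double[] chunk) {
        double sum = 0;
        for (double x : chunk) {
            sum += x;
        }
        return new Partial(sum, chunk.length);
    }

    // Combine step: add the partial results, then finish with one division.
    static double mean(List<double[]> chunks) {
        Partial total = chunks.parallelStream()
                .map(SummationFormSketch::summarize)
                .reduce(new Partial(0, 0), Partial::plus);
        return total.sum / total.count;
    }

    public static void main(String[] args) {
        List<double[]> chunks = Arrays.asList(
                new double[] {1.0, 2.0, 3.0},
                new double[] {4.0, 5.0},
                new double[] {6.0});
        System.out.println(mean(chunks)); // prints 3.5
    }
}
```

Because the per-chunk work never looks at another chunk, adding more cores only changes how many chunks are summarized at once, which is the intuition behind the near-linear speedup described in the note.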

Setting up Apache Mahout

Download the latest release of Mahout from https://mahout.apache.org/general/downloads.html.

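If you use Mahout from a Maven project, add the following dependency to the pom.xml file. Here, ${mahout.version} is a Maven property that you define in the properties section of the same file and set to the Mahout release you want to use (for example, 0.9):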

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>${mahout.version}</version>
</dependency>

If required, add the following Maven dependencies as well:

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>${mahout.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-integration</artifactId>
  <version>${mahout.version}</version>
</dependency>

More details on setting up a Maven project can be found at http://maven.apache.org/.

To build Mahout from source, follow the instructions given at https://mahout.apache.org/developers/buildingmahout.html.

The Mahout command-line launcher is located at bin/mahout.
