Rapid - Apache Mahout Clustering designs

Algorithm support in Mahout

The implementation of algorithms in Mahout can be categorized into two groups:

Sequential algorithms: These algorithms run sequentially, so they cannot take advantage of Hadoop's scalable processing. They are usually the ones derived from Taste, which was originally a separate, non-Hadoop-based recommendation engine project.
Examples include user-based collaborative filtering, logistic regression, Hidden Markov Models, the multi-layer perceptron, and singular value decomposition.
Parallel algorithms: These algorithms use Hadoop's MapReduce parallel processing and can therefore scale to petabytes of data.
Examples include Random Forest, Naïve Bayes, Canopy clustering, K-means clustering, and spectral clustering.
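To make the distinction concrete, here is a minimal sketch in plain Java of the same computation (the centroid of a set of points) written in a sequential style and in a map-reduce style. It does not use Mahout or Hadoop APIs. The class name SequentialVsParallel and its methods are hypothetical, and Java's parallel streams stand in for a real cluster only as an analogy.

```java
import java.util.Arrays;
import java.util.List;

public class SequentialVsParallel {

    // Sequential style: a single thread visits every point in order.
    static double[] centroidSequential(List<double[]> points) {
        int dim = points.get(0).length;
        double[] sum = new double[dim];
        for (double[] p : points) {
            for (int i = 0; i < dim; i++) {
                sum[i] += p[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            sum[i] /= points.size();
        }
        return sum;
    }

    // Map-reduce style: partial sums are computed independently on
    // chunks of the data and then merged, so the work can be split up.
    static double[] centroidParallel(List<double[]> points) {
        int dim = points.get(0).length;
        double[] sum = points.parallelStream()
                .reduce(new double[dim], SequentialVsParallel::add);
        for (int i = 0; i < dim; i++) {
            sum[i] /= points.size();
        }
        return sum;
    }

    // Element-wise addition that returns a new array and never mutates
    // its inputs, which keeps the reduction safe to run in parallel.
    static double[] add(double[] a, double[] b) {
        double[] result = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i] + b[i];
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[]{1, 2}, new double[]{3, 4}, new double[]{5, 6});
        System.out.println(Arrays.toString(centroidSequential(points)));
        System.out.println(Arrays.toString(centroidParallel(points)));
    }
}
```

The parallel version works because vector addition can be applied to chunks of the data in any grouping and then merged. Algorithms whose steps decompose this way are the ones that lend themselves to MapReduce.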