Rapid - Apache Mahout Clustering designs

Overview of this book

Apache Mahout Clustering Designs

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Understanding Clustering

The clustering concept

Understanding distance measures

Understanding different clustering techniques

Algorithm support in Mahout

Clustering algorithms in Mahout

Installing Mahout

Preparing data for use with clustering techniques

Understanding K-means Clustering

Learning K-means

Visualizing clusters

Understanding Canopy Clustering

Running Canopy clustering on Mahout

Visualizing clusters

Working with CSV files

Understanding the Fuzzy K-means Algorithm Using Mahout

Learning Fuzzy K-means clustering

Visualizing clusters

Understanding Model-based Clustering

Learning model-based clustering

Running LDA using Mahout

Understanding Streaming K-means

Learning Streaming K-means

Using Mahout for streaming K-means

Spectral Clustering

Understanding spectral clustering

Mahout implementation of spectral clustering

Improving Cluster Quality

Evaluating clusters

Using DistanceMeasure interface

Creating a Cluster Model for Production

Preparing the dataset

Launching the Mahout job on the cluster

Performance tuning for the job

Index

Summary

We discussed K-means clustering in this chapter: how the K-means algorithm works and how to run the Mahout implementation of K-means on a text dataset. We downloaded the data and converted it into a vector format that Mahout can read.
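To make the algorithm concrete, here is a minimal plain-Python sketch of the assign/update loop (Lloyd's algorithm) that K-means is built on. It is not Mahout's implementation or API: the function names `euclidean` and `kmeans`, the seeded random initialization, and the toy 2-D points are assumptions chosen for illustration. Mahout performs the same two steps as distributed jobs over the vectors produced from the text data.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd-style K-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The three points near (1, 1) end up sharing one label and the three near (8.5, 8.5) share the other. This sketch stops when no point changes cluster; Mahout's K-means job instead stops on a convergence threshold for centroid movement or a maximum iteration count.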

We discussed how to inspect the resulting clusters using the clusterdump utility, and we looked at the example class that Mahout provides for visualizing clusters.
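For text data, a large part of understanding a cluster is looking at which terms carry the most weight in its centroid. The sketch below illustrates that idea in plain Python. `top_terms`, the five-word vocabulary, and the centroid weights are hypothetical stand-ins for a real dictionary and for the centroids a Mahout K-means run would produce, and the printed format is not clusterdump's output format.

```python
def top_terms(centroid, vocabulary, n=3):
    """Return the n highest-weighted (term, weight) pairs of a centroid vector."""
    ranked = sorted(zip(vocabulary, centroid), key=lambda tw: tw[1], reverse=True)
    return ranked[:n]


# Hypothetical dictionary and centroid weights, one weight per vocabulary term.
vocabulary = ["goal", "match", "election", "vote", "team"]
centroids = {
    "cluster-0": [0.9, 0.7, 0.0, 0.1, 0.8],
    "cluster-1": [0.0, 0.1, 0.9, 0.8, 0.0],
}

for name, centroid in centroids.items():
    terms = ", ".join(f"{t}={w:.2f}" for t, w in top_terms(centroid, vocabulary))
    print(f"{name}: {terms}")
```

Reading the top terms side by side (sports words for one cluster, politics words for the other) is often the quickest way to judge whether a text clustering is meaningful.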

Now, we will move on to the next chapter, where we will discuss Canopy clustering. This technique is useful in its own right, and it can also be used to estimate K, the number of clusters, for K-means clustering.
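As a preview, here is a minimal plain-Python sketch of the canopy idea, which uses two distance thresholds with T1 > T2. Each canopy center collects every remaining point closer than T1, and points closer than T2 are removed so they cannot start or join later canopies. The number of canopies then gives a rough estimate of K. The function names, the threshold values, and the toy points are assumptions for illustration, not Mahout's API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Illustrative single-pass canopy clustering; returns [(center, members)]."""
    if t1 <= t2:
        raise ValueError("canopy clustering requires T1 > T2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # next unclaimed point starts a canopy
        members = [center]
        kept = []
        for p in remaining:
            d = euclidean(center, p)
            if d < t1:
                members.append(p)  # loosely close: belongs to this canopy
            if d >= t2:
                kept.append(p)  # not tightly bound: stays available
        remaining = kept
        canopies.append((center, members))
    return canopies


# Hypothetical toy data and thresholds.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
canopies = canopy(points, t1=3.0, t2=1.5)
print("estimated K =", len(canopies))
```

With these thresholds the six points fall into two canopies, suggesting K = 2. Different T1 and T2 values can give a different count, so the estimate depends on choosing thresholds that suit the distance scale of the data.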