Rapid - Apache Mahout Clustering designs

Summary

In this chapter, we discussed Canopy clustering and how it can be used to obtain an initial number of clusters. We covered how the Canopy clustering algorithm works and how it is implemented using MapReduce, and we ran the Mahout implementation of Canopy on a text dataset to generate canopies. We looked at an example class, taken from the Mahout examples, that visualizes the resulting clusters. We also discussed the code that converts a CSV file into the vector format used by Mahout.
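
As a quick recap of how the algorithm works, the following is a minimal sketch in plain Java. It is not Mahout's implementation and does not use the Mahout API or MapReduce; the class name `CanopySketch`, the method names, the Euclidean distance, and the sample points are all assumptions made for illustration. It relies only on the usual Canopy convention of a loose threshold `t1` and a tight threshold `t2`, with `t1` greater than `t2`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative single-machine Canopy sketch (hypothetical class, not part of Mahout).
public class CanopySketch {

    // Plain Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns one list of member points per canopy; the first member is the canopy center.
    static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next unclaimed point as the center of a new canopy.
            double[] center = remaining.remove(0);
            List<double[]> canopy = new ArrayList<>();
            canopy.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) {
                    canopy.add(p);   // within the loose threshold: joins this canopy
                }
                if (d < t2) {
                    it.remove();     // within the tight threshold: cannot start a new canopy
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        points.add(new double[] {1.0, 0.0});
        points.add(new double[] {2.0, 0.0});
        points.add(new double[] {10.0, 10.0});
        points.add(new double[] {10.5, 10.0});

        List<List<double[]>> result = canopies(points, 3.0, 1.5);
        System.out.println("Number of canopies: " + result.size());
        for (List<double[]> canopy : result) {
            System.out.println("Canopy with " + canopy.size() + " point(s)");
        }
    }
}
```

In this sketch, a point that lies between `t2` and `t1` of a center joins that canopy but stays in the list, so canopies may overlap. The number of canopies returned is the kind of value that can serve as the initial number of clusters.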

In the next chapter, we will discuss the Fuzzy K-means clustering algorithm, in which a point can belong to more than one cluster with varying degrees of membership.