Rapid - Apache Mahout Clustering designs

Mahout implementation of spectral clustering


The Mahout implementation of spectral clustering requires an affinity matrix as input from the user, and it uses the K-means algorithm for the final clustering. Usually, spectral clustering in Mahout consists of the following steps:

  1. The user starts with a k*n data matrix to cluster, that is, k data points with n dimensions each.

  2. The user creates a similarity matrix from the original data matrix. This is a k*k transformation of the original matrix, based on how the points are related to each other.

  3. From the similarity matrix, an affinity matrix needs to be created. Mahout takes a Hadoop-backed affinity matrix as input, in the form of a text file. The matrix describes a weighted, undirected graph. Each line of the text file represents a single directional edge between two nodes and consists of three comma-separated values: the first is the source node, the second is the destination node, and the third is the weight. As per the matrix, it will be represented...
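
The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Mahout code: the sample points are made up, the helper names `gaussian_similarity` and `affinity_lines` are hypothetical, and the Gaussian kernel with `sigma` is one common way to measure similarity that the text does not prescribe. The sketch writes one line for every ordered pair of nodes, including each node paired with itself.

```python
import math

def gaussian_similarity(points, sigma=1.0):
    """Step 2: build a k x k similarity matrix from k points.

    Assumption: similarity is a Gaussian kernel on Euclidean distance;
    any other symmetric similarity measure could be swapped in.
    """
    k = len(points)
    sim = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            sim[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return sim

def affinity_lines(sim):
    """Step 3: emit one 'source,destination,weight' line per directed edge."""
    lines = []
    for i, row in enumerate(sim):
        for j, weight in enumerate(row):
            lines.append(f"{i},{j},{weight:.6f}")
    return lines

# Step 1: k = 3 points with n = 2 dimensions each (made-up sample data).
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

sim = gaussian_similarity(points)
lines = affinity_lines(sim)

# Text that could be saved as the affinity input file.
print("\n".join(lines))
```

Because the similarity measure is symmetric, the edge from node i to node j carries the same weight as the edge from j to i, which is how a file of directional edges can describe an undirected graph.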