Learning Apache Mahout


A Mahout Java example


We will now look at how to use the clustering algorithms discussed so far from Java code. Open the MahoutClusteringExample.java file from the chapter7.src package.

k-means

Define the distance measure to be used by the k-means clustering algorithm:

DistanceMeasure measure = new EuclideanDistanceMeasure();
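
To make concrete what a Euclidean distance measure computes, here is a minimal plain-Java sketch. It is not Mahout's EuclideanDistanceMeasure implementation; the class name EuclideanSketch and the use of double[] arrays in place of Mahout vectors are assumptions made purely for illustration.

```java
// Illustrative sketch only: plain Java, not Mahout's EuclideanDistanceMeasure.
// EuclideanSketch is a hypothetical name; double[] stands in for a Mahout vector.
class EuclideanSketch {

    /** Returns the straight-line distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff; // accumulate squared differences per dimension
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Classic 3-4-5 right triangle
        System.out.println(distance(new double[] {0, 0}, new double[] {3, 4})); // prints 5.0
    }
}
```

k-means assigns each point to the centroid for which this value is smallest, which is why the choice of distance measure shapes the resulting clusters.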

Create a Path object that points to the input sequence-file directory created in the preprocessing step:

Path inputSeq = new Path("clustering_seq");

The next step is to generate the random initial cluster seeds. We create the output directory path where the initial cluster points are saved. The two-argument Path constructor resolves the second argument as a child of the first, so the seeds go into a folder named random-seeds inside the input directory. You could also use a separate directory for the initial clusters:

Path clusters = new Path(inputSeq, "random-seeds");
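
The idea behind random seeding can be sketched in plain Java: pick k of the input points at random and treat them as the initial centroids. This is only a conceptual sketch under stated assumptions, not Mahout's RandomSeedGenerator; the class RandomSeedsSketch, the method pickSeeds, and the fixed random seed are all hypothetical, and points are held in memory as double[] arrays rather than read from sequence files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: plain Java, not Mahout's RandomSeedGenerator.
// RandomSeedsSketch and pickSeeds are hypothetical names.
class RandomSeedsSketch {

    /** Picks k distinct input points at random to serve as initial centroids. */
    static List<double[]> pickSeeds(List<double[]> points, int k, long randomSeed) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> shuffled = new ArrayList<>(points); // copy so the input list is untouched
        Collections.shuffle(shuffled, new Random(randomSeed));
        return new ArrayList<>(shuffled.subList(0, k));    // first k of the shuffled copy
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {1.0, 1.0});
        points.add(new double[] {1.5, 2.0});
        points.add(new double[] {8.0, 8.0});
        points.add(new double[] {9.0, 9.5});

        List<double[]> seeds = pickSeeds(points, 2, 42L);
        System.out.println(seeds.size() + " initial centroids chosen"); // prints 2 initial centroids chosen
    }
}
```

Because the starting centroids are random, different runs of k-means can converge to different clusterings; fixing the random seed, as in this sketch, makes a run repeatable.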

The RandomSeedGenerator class provides the buildRandom() function to generate these seeds. It takes as input the Configuration object, the input directory with the sequence...