Rapid - Apache Mahout Clustering designs

Chapter 8. Improving Cluster Quality

In the previous chapters, we discussed different clustering algorithms and the implementations of them available in Mahout. In this chapter, we will focus on how to evaluate whether our algorithm has performed well. First, we need to understand how our clusters are formed; then we can see where they can be improved. The output of a clustering run depends on the algorithm chosen, its input parameters, and other settings. The basic idea behind improving cluster quality is to vary these parameters, such as the distance measure or the input matrix, and to check the other parameters passed to the algorithm. So, when we evaluate clusters, we perform the following tasks:

  • Measuring cluster quality

  • Finding the number of clusters in the given dataset

  • Finding out how changing the distance measure affects cluster quality
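The first and third tasks can be illustrated with a small plain-Java sketch. It does not use Mahout's API. It assumes the points have already been assigned to clusters and measures quality in two ways. The intra-cluster measure is the mean distance from each point to its own centroid, where lower means tighter clusters. The inter-cluster measure is the mean distance between centroids, where higher means better separation. The sketch swaps between Euclidean and Manhattan distance to show that the same assignment scores differently under each. The class name `ClusterQualitySketch`, the `Metric` interface, and the toy data are hypothetical illustrations.

```java
// Hypothetical sketch: plain Java, not Mahout's ClusterEvaluator or DistanceMeasure API.
public class ClusterQualitySketch {

    /** Hypothetical pluggable distance function between two equal-length vectors. */
    interface Metric {
        double between(double[] a, double[] b);
    }

    /** Straight-line (L2) distance. */
    static final Metric EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /** City-block (L1) distance. */
    static final Metric MANHATTAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    };

    /** Component-wise mean of the points in one cluster. */
    static double[] centroid(double[][] points) {
        double[] c = new double[points[0].length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i] / points.length;
            }
        }
        return c;
    }

    /** Mean distance from every point to its own cluster's centroid (lower is tighter). */
    static double intraClusterDistance(double[][][] clusters, Metric m) {
        double total = 0.0;
        int count = 0;
        for (double[][] cluster : clusters) {
            double[] c = centroid(cluster);
            for (double[] p : cluster) {
                total += m.between(p, c);
                count++;
            }
        }
        return total / count;
    }

    /** Mean distance over all pairs of centroids (higher means better separated). */
    static double interClusterDistance(double[][][] clusters, Metric m) {
        double total = 0.0;
        int pairs = 0;
        for (int i = 0; i < clusters.length; i++) {
            for (int j = i + 1; j < clusters.length; j++) {
                total += m.between(centroid(clusters[i]), centroid(clusters[j]));
                pairs++;
            }
        }
        return total / pairs;
    }

    public static void main(String[] args) {
        // Made-up 2-D points, already assigned to two clusters.
        double[][][] clusters = {
            {{1.0, 1.0}, {1.0, 2.0}, {2.0, 1.0}, {2.0, 2.0}},
            {{8.0, 1.0}, {8.0, 2.0}, {9.0, 1.0}, {9.0, 2.0}}
        };
        String[] names = {"Euclidean", "Manhattan"};
        Metric[] metrics = {EUCLIDEAN, MANHATTAN};
        for (int k = 0; k < metrics.length; k++) {
            double intra = intraClusterDistance(clusters, metrics[k]);
            double inter = interClusterDistance(clusters, metrics[k]);
            // A scale-free ratio; lower suggests tighter, better-separated clusters.
            System.out.printf("%s: intra=%.3f inter=%.3f ratio=%.3f%n",
                    names[k], intra, inter, intra / inter);
        }
    }
}
```

On this toy data the Euclidean run reports a ratio of about 0.101 and the Manhattan run about 0.143. The same cluster assignment therefore looks better or worse depending on the distance measure. Raw distances from different measures are not directly comparable, which is why the sketch reports a ratio rather than the absolute values alone.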

We will discuss the following topics in this chapter:

  • Evaluating clusters

  • Improving cluster quality