In this chapter, we discussed how to improve cluster quality. We looked at measures that help us assess cluster quality, and at intrinsic and extrinsic methods of cluster evaluation. We then saw how to use an inter-cluster distance measure to calculate the Dunn index. We also discussed custom distance measures in Mahout, since a poorly chosen distance measure can badly degrade the quality of the clusters. In the next and final chapter of this book, we will use Hadoop to run our clustering job and see how to approach clustering in production.
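As a quick reminder of how the Dunn index fits together, here is a minimal plain-Java sketch. It does not use the Mahout API. The class name `DunnIndexSketch`, its helper methods, and the sample points are all hypothetical, chosen only for illustration. It assumes one common formulation of the index: the smallest distance between points in different clusters, divided by the largest cluster diameter, with Euclidean distance as the measure. Other formulations (for example, centroid-based distances) give different values.

```java
public class DunnIndexSketch {

    // Euclidean distance between two points of equal dimension.
    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Diameter of a cluster: the largest distance between any two of its points.
    public static double diameter(double[][] cluster) {
        double max = 0.0;
        for (int i = 0; i < cluster.length; i++) {
            for (int j = i + 1; j < cluster.length; j++) {
                max = Math.max(max, euclidean(cluster[i], cluster[j]));
            }
        }
        return max;
    }

    // Inter-cluster distance: the smallest distance between a point of a and a point of b.
    public static double interClusterDistance(double[][] a, double[][] b) {
        double min = Double.POSITIVE_INFINITY;
        for (double[] p : a) {
            for (double[] q : b) {
                min = Math.min(min, euclidean(p, q));
            }
        }
        return min;
    }

    // Dunn index: minimum inter-cluster distance divided by maximum cluster diameter.
    // If every cluster holds a single point, the diameter is 0 and the result is Infinity.
    public static double dunnIndex(double[][][] clusters) {
        double minInter = Double.POSITIVE_INFINITY;
        double maxDiameter = 0.0;
        for (int i = 0; i < clusters.length; i++) {
            maxDiameter = Math.max(maxDiameter, diameter(clusters[i]));
            for (int j = i + 1; j < clusters.length; j++) {
                minInter = Math.min(minInter, interClusterDistance(clusters[i], clusters[j]));
            }
        }
        return minInter / maxDiameter;
    }

    public static void main(String[] args) {
        // Two small, well-separated clusters (made-up sample data).
        double[][][] clusters = {
            {{0, 0}, {0, 1}},
            {{5, 0}, {5, 1}}
        };
        System.out.println(dunnIndex(clusters));
    }
}
```

For the sample data, each cluster has a diameter of 1 and the closest points of the two clusters are 5 apart, so the index evaluates to 5.0. A higher value indicates compact clusters that are well separated under the chosen distance measure.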
Rapid - Apache Mahout Clustering designs