Rapid - Apache Mahout Clustering designs

Cluster evaluation involves cluster validation. We can apply multiple algorithms to the same data to get clustering results, and we want to know whether one result is better than another.

Two types of methods are available to evaluate clusters:

Extrinsic methods
Intrinsic methods

Let's take a look at each of these types.

Extrinsic methods

Extrinsic methods evaluate a clustering using data that was not used to produce it. This data consists of known class labels and external benchmarks. These benchmarks are treated as gold standards and are often created by experts. A measure of clustering quality is effective if it satisfies the following four criteria (A Comparison of Extrinsic Clustering Evaluation Metrics Based on Formal Constraints, Enrique Amigó, Julio Gonzalo, Javier Artiles, and Felisa Verdejo):

Cluster Homogeneity: Clusters should not mix items belonging to different categories. Look at the following diagram:
Cluster 1 has all six data points in one cluster, while...
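
The homogeneity criterion can be illustrated with a small sketch. It is not Mahout code and not the formal metric from the cited paper. It assumes each item has a cluster ID and a gold-standard category label, and it uses purity (the fraction of items that carry the majority label of their cluster) as a simple stand-in measure. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def label_counts_per_cluster(cluster_ids, gold_labels):
    """Count how many items of each gold category fall into each cluster."""
    counts = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, gold_labels):
        counts[cluster_id][label] += 1
    return counts


def is_homogeneous(cluster_ids, gold_labels):
    """Return True when no cluster mixes items from different gold categories."""
    counts = label_counts_per_cluster(cluster_ids, gold_labels)
    return all(len(per_label) == 1 for per_label in counts.values())


def purity(cluster_ids, gold_labels):
    """Fraction of items that carry the majority gold label of their cluster."""
    counts = label_counts_per_cluster(cluster_ids, gold_labels)
    majority_total = sum(max(per_label.values()) for per_label in counts.values())
    return majority_total / len(gold_labels)


# Hypothetical toy data: six items drawn from two gold categories.
gold = ["a", "a", "a", "b", "b", "b"]
mixed = [1, 1, 1, 1, 1, 1]  # every item placed in a single cluster
split = [1, 1, 1, 2, 2, 2]  # one cluster per gold category

print("mixed:", is_homogeneous(mixed, gold), purity(mixed, gold))
print("split:", is_homogeneous(split, gold), purity(split, gold))
```

In this toy setup the single mixed cluster violates homogeneity, while the split clustering satisfies it. Purity alone rewards many tiny clusters, so it only speaks to this one criterion and says nothing about the other three.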
