Unsupervised cluster analysis refers to algorithms that aim to produce homogeneous groups of cases from unlabeled data. The algorithm doesn't know beforehand which group each case belongs to; its goal is to uncover the structure of the data from similarities (or differences) between the cases. A cluster is a group of cases, observations, individuals, or other units that are similar to each other on the considered characteristics. These characteristics can be anything measurable or observable. The choice of characteristics, or attributes, is important, as different attributes will lead to different clusters.
In this chapter, we will discuss the following topics:
Distance measures
Partition clustering with k-means, including the steps in computing the clusters and selecting the best number of clusters
Applications of k-means clustering
Clustering algorithms use distance measures between cases to create these homogeneous groups...
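To make the idea of a distance between cases concrete, here is a minimal sketch in Python. It assumes each case is described by numeric attributes, and it uses two common measures, Euclidean and Manhattan distance, purely as illustrations; the cases `A`, `B`, and `C` and their values are hypothetical and not taken from any dataset in this chapter.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two cases with numeric attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute attribute differences between two cases."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical cases, each measured on two attributes (illustrative values only)
cases = {
    "A": (1.0, 2.0),
    "B": (4.0, 6.0),
    "C": (1.5, 2.5),
}

# Compare case A with the two other cases under both measures
d_ab_euclid = euclidean(cases["A"], cases["B"])
d_ac_euclid = euclidean(cases["A"], cases["C"])
d_ab_manhattan = manhattan(cases["A"], cases["B"])
d_ac_manhattan = manhattan(cases["A"], cases["C"])

print("Euclidean A-B:", d_ab_euclid)
print("Euclidean A-C:", d_ac_euclid)
print("Manhattan A-B:", d_ab_manhattan)
print("Manhattan A-C:", d_ac_manhattan)
```

Under both measures, case A is closer to case C than to case B, so a clustering algorithm relying on either distance would be more inclined to place A and C in the same group.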