Cluster analysis, or clustering, is the process of grouping data into multiple groups so that the data in one group is similar to the other data in that group and dissimilar to the data in other groups.
The following are a few examples where clustering is used:
Market segmentation: Dividing the target market into multiple segments so that the needs of each segment can be served better
Social network analysis: Finding a coherent group of people in the social network for ad targeting through a social networking site such as Facebook
Data center computing clusters: Putting a set of computers together to improve performance
Astronomical data analysis: Understanding astronomical data and events such as galaxy formations
Real estate: Identifying neighborhoods based on similar features
Text analysis: Dividing text documents, such as novels or essays, into genres
The k-means algorithm is best illustrated using imagery, so let's look at our sample figure again:
The first step in k-means is to randomly select two points called cluster...
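As a rough illustration of this random selection step, here is a minimal Python sketch. It assumes the two starting points are drawn from the data itself, which is one common variant; the text does not say whether this is the case. The sample points, the helper name `pick_initial_points`, and the `seed` parameter are hypothetical and exist only for this example.

```python
import random


def pick_initial_points(points, k=2, seed=None):
    """Return k distinct points chosen at random from the data.

    Assumption: the starting points come from the data set itself.
    """
    rng = random.Random(seed)  # a seeded generator keeps runs repeatable
    return rng.sample(points, k)  # sampling without replacement


# Hypothetical two-dimensional sample data.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]

initial_points = pick_initial_points(points, k=2, seed=42)
print(initial_points)
```

Because the selection is random, different seeds can produce different starting points, and that in turn can change the final grouping.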