Cluster analysis, or clustering, is the process of grouping data into multiple groups so that the data in one group is similar to the other data in that group and dissimilar to the data in other groups.
The following are a few examples where clustering is used:
- Market segmentation: Dividing the target market into multiple segments so that the needs of each segment can be served better
- Social network analysis: Finding coherent groups of people in a social network, for example, to target ads on a social networking site such as Facebook
- Data center computing clusters: Identifying sets of computers that work together so that they can be grouped to improve performance
- Astronomical data analysis: Understanding astronomical data and events, such as galaxy formations
- Real estate: Identifying neighborhoods based on similar features
- Text analysis: Dividing text documents, such as novels or essays, into genres
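To make the idea of "similar within a group, different across groups" concrete, here is a minimal sketch in Python. The points, the two hand-assigned groups, and the helper names `mean_within` and `mean_between` are all assumptions made up for illustration. It measures similarity with plain Euclidean distance, which is only one of several possible choices.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Hypothetical 2-D data: two groups assigned by hand (illustration only).
group_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def mean_within(group):
    """Average Euclidean distance over all pairs of points inside one group."""
    return mean(dist(p, q) for p, q in combinations(group, 2))


def mean_between(g1, g2):
    """Average Euclidean distance over all pairs with one point from each group."""
    return mean(dist(p, q) for p, q in product(g1, g2))


within_a = mean_within(group_a)
within_b = mean_within(group_b)
between = mean_between(group_a, group_b)

# A good grouping has small within-group distances and a large between-group distance.
print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

For this toy data, the within-group averages are much smaller than the between-group average, which is the property a clustering algorithm tries to find automatically when no groups are given in advance.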
The k-means algorithm is best illustrated using imagery, so let's look at our sample figure again:
The first step in k-means is to randomly select two points called...