# Introducing the K-means algorithm

The **K-means** algorithm is a popular unsupervised learning algorithm for clustering data, valued for its simplicity and efficiency. It aims to group similar items into *K* clusters. After selecting *K* random centroids, it alternates between assigning each sample to its most similar centroid and moving each centroid to the center of the samples assigned to it, repeating until the clusters settle. To measure how similar two samples are, we can use metrics such as the Euclidean distance, cosine similarity (check the *Calculating vector similarity* section in *Chapter 2*, *Detecting Spam Emails*), Pearson correlation coefficients (discussed in the *Understanding the Pearson correlation* section of *Chapter 5*, *Recommending Music Titles*), and so forth.

An example helps us understand the algorithm better. Suppose that you are given the dataset shown in the upper-left plot of *Figure 10.3*:

Figure 10.3 – K-means basic steps
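
Before walking through the figure, the loop described earlier (choose *K* random centroids, assign each sample to the most similar one, then move the centroids) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function name `kmeans`, its parameters, and the toy data are made up for this sketch, and it uses the Euclidean distance only.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-means sketch using the Euclidean distance (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: pick K distinct samples as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: assign every sample to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Toy data (hypothetical, not the dataset in the figure): three small groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
        (9.0, 1.0), (9.2, 0.9), (8.9, 1.1)]
centroids, labels = kmeans(data, k=3)
print(centroids)
print(labels)
```

Because the starting centroids are chosen at random, a different `seed` can produce a different grouping, so the sketch fixes the seed to keep the run repeatable.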

It’s easy to see that the data points in the figure can be grouped into three clusters. Unfortunately...