The aim of this chapter is to discover trends in news articles by clustering, or grouping, them together. To do that, we will use the k-means algorithm, a classic machine learning algorithm originally developed in 1957.
Clustering is an unsupervised learning technique, and clustering algorithms are often used for exploring data. Our dataset contains approximately 500 stories, and it would be quite arduous to examine each of them individually. Clustering lets us group similar stories together, so that we can explore the themes in each cluster independently.
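To make the idea of grouping similar items concrete, here is a minimal sketch of the k-means loop in plain Python. It is an illustration under stated assumptions, not the implementation used later in this chapter: the two-dimensional points are made up (real stories would first have to be turned into numeric features), and the function name `k_means` and its `initial_centroids` parameter are hypothetical. Practical implementations usually choose the starting centroids randomly instead of taking them as an argument.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and updating centroids.

    `initial_centroids` is a hypothetical parameter used here to keep
    the example deterministic.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we have converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # Leave a centroid in place if its cluster is empty.
                centroids[c] = mean(members)
    return labels, centroids


# Made-up toy data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]

# Start with one centroid in each group (an assumption for this example).
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The first three points end up in one cluster and the last three in the other. Note that the algorithm never sees a target class; it groups the points only by their distance to the centroids. In practice we would more likely rely on a library implementation, such as `KMeans` in scikit-learn, than on a hand-written loop like this one.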
Note
We use clustering techniques when we don't have a clear set of target classes for our data. In that sense, clustering algorithms have little direction in their learning. They optimize some objective function (for k-means, the distance between each data point and the center of its cluster), regardless of the underlying meaning of the data.
For this reason, it is critical to choose good features. In supervised learning, if you choose poor features, the learning algorithm can choose...