Chapter 11. Working with Unlabeled Data – Clustering Analysis
In the previous chapters, we used supervised learning techniques to build machine learning models from data where the answer was already known: the class labels were already available in our training data. In this chapter, we will switch gears and explore cluster analysis, a category of unsupervised learning techniques that allows us to discover hidden structures in data where we do not know the right answer up front. The goal of clustering is to find a natural grouping in data such that items in the same cluster are more similar to each other than to items in different clusters.
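To make the goal of clustering concrete, the following minimal sketch measures similarity as Euclidean distance (an assumption; other similarity measures are possible) and compares the average distance between points that share a cluster with the average distance between points in different clusters. The toy points, the hand-chosen labels, and the helper name `mean_distances` are hypothetical and used only for illustration. A clustering algorithm would have to infer the labels from the points alone.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: two well-separated groups of 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),   # group near (1.5, 1.5)
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]   # group near (8.5, 8.5)

# Cluster assignment for each point, chosen by hand for this illustration.
labels = [0, 0, 0, 1, 1, 1]


def mean_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    # Visit every unordered pair of points exactly once.
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])  # Euclidean distance
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return sum(within) / len(within), sum(between) / len(between)


within, between = mean_distances(points, labels)
print(f"within-cluster: {within:.2f}, between-cluster: {between:.2f}")
```

For a good grouping, the within-cluster average is much smaller than the between-cluster average. Scrambling the labels (for example, `[0, 1, 0, 1, 0, 1]`) mixes distant points into the same cluster and raises the within-cluster average sharply.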
Given its exploratory nature, clustering is an exciting topic, and in this chapter, you will learn about the following concepts that can help you organize data into meaningful structures:
Finding centers of similarity using the popular k-means algorithm
Using a bottom-up approach to build hierarchical cluster trees
Identifying arbitrary shapes of objects...