Data analysts often work with large, unlabeled, high-dimensional datasets (datasets with many features) and seek to reduce their complexity by applying clustering or principal component analysis (PCA).
Clustering is a data analysis technique for discovering groups of similar objects (objects that are close to one another under some distance measure) or patterns in a dataset. Unlike supervised learning techniques such as classification and regression, clustering does not use labeled data; instead, it groups data points into clusters based on the similarity of their features. There are two standard clustering strategies: partitioning methods and hierarchical clustering.
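As an illustration of a partitioning method, the following is a minimal sketch of k-means (Lloyd's algorithm), one common partitioning method chosen here as an example. It assumes NumPy, numeric features, and Euclidean distance; the function name `kmeans`, its parameters (`k`, `n_iter`, `seed`), and the toy data are hypothetical. The algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of the points assigned to it.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means: partition the rows of X into k clusters."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every centroid,
        # then each point takes the label of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy, unlabeled data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centroids = kmeans(X, k=2)
```

No labels are supplied; the three points near the origin end up sharing one cluster label and the three points near (5, 5) share the other, purely on the basis of distance. Hierarchical clustering instead builds a tree of nested clusters by repeatedly merging (or splitting) groups, and is not shown here.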
PCA is a dimensionality reduction technique that transforms an m-dimensional input space into an n-dimensional output space (n < m), with the objective of minimizing the amount of information (variance) lost by discarding the remaining (m - n) dimensions.
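One common way to compute PCA is through the singular value decomposition (SVD) of the mean-centered data. The sketch below assumes NumPy and a samples-by-features array; the function name `pca` and the returned `retained` value (the fraction of total variance kept by the first n components, so `1 - retained` is the variance lost by discarding the other m - n dimensions) are illustrative choices, not a reference implementation.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical sketch: project X (samples x m features) onto n principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction (eigenvalues of the sample covariance matrix).
    explained_variance = S**2 / (len(X) - 1)
    retained = explained_variance[:n_components].sum() / explained_variance.sum()
    # Coordinates of the samples in the n-dimensional output space.
    Z = X_centered @ components.T
    return Z, components, retained

# Toy 3-D data that lies almost on a line, so one component should suffice.
X = np.array([[1.0, 2.0, 3.0], [2.0, 4.1, 6.0],
              [3.0, 6.0, 9.1], [4.0, 7.9, 12.0]])
Z, components, retained = pca(X, n_components=1)
```

Here m = 3 and n = 1: each sample is reduced to a single coordinate, and because the points are nearly collinear, almost all of the variance survives the projection.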
This chapter...