PRE-MODEL ALGORITHMS
As an extension of the data scrubbing process, unsupervised learning algorithms are sometimes run ahead of a supervised learning algorithm to prepare the data for predictive modeling. In this role, unsupervised algorithms are used to clean or reshape the data rather than to derive actionable insight.
Examples of pre-model algorithms include dimension reduction techniques, introduced in the previous chapter, and k-means clustering. Both are examined in this chapter.
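To make the idea of a pre-model step concrete, the following is a minimal sketch rather than a definitive implementation. It assumes NumPy is available and uses made-up toy data; the helper names pca_fit and pca_transform are hypothetical and chosen only for illustration. An unsupervised reduction (here a simple PCA computed with a singular value decomposition) compresses ten features into two, and a supervised model (ordinary least squares) is then trained on the compressed data.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the chosen principal directions."""
    return (X - mean) @ components.T

# Hypothetical toy data: 200 rows, 10 correlated features driven by 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = factors @ mixing + 0.05 * rng.normal(size=(200, 10))
y = 3.0 * factors[:, 0] - 2.0 * factors[:, 1] + 0.1 * rng.normal(size=200)

# Step 1 (unsupervised, pre-model): compress 10 features into 2 components.
mean, components = pca_fit(X, n_components=2)
X_reduced = pca_transform(X, mean, components)

# Step 2 (supervised): ordinary least squares on the reduced features.
design = np.column_stack([np.ones(len(X_reduced)), X_reduced])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
predictions = design @ coef

# Share of the variation in y explained by the model (on the training data).
r_squared = 1 - np.sum((y - predictions) ** 2) / np.sum((y - y.mean()) ** 2)
print(X.shape, "->", X_reduced.shape)
print("R^2 on training data:", round(float(r_squared), 3))
```

The supervised model never sees the original ten columns. It works only with the reshaped data, which is the sense in which the unsupervised step serves as preparation rather than as a source of insight in its own right.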
Principal Component Analysis
One of the most popular dimension reduction techniques is principal component analysis (PCA). Sometimes loosely referred to as general factor analysis (strictly speaking, factor analysis is a related but distinct technique), PCA is useful for dramatically reducing data complexity and for visualizing data in fewer dimensions. The practical goal of PCA is to find a low-dimensional representation of the dataset that preserves as much of the original variation as possible. Rather than removing individual features from...