Understanding dimensionality reduction techniques
We looked at many ways to visualize data in the previous sections, but high-dimensional data cannot be easily and accurately visualized in two dimensions. To visualize it, we need some kind of projection or embedding technique that maps the feature space into two dimensions. There are many linear and non-linear embedding techniques that can produce two-dimensional projections of data. The following are the most common ones (a minimal sketch follows the list):
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Uniform Manifold Approximation and Projection (UMAP)
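To make the idea concrete, here is a minimal sketch of the simplest of these, a linear PCA projection with scikit-learn. The synthetic 13-dimensional data, the sample count, and the random seed are illustrative placeholders, not taken from the text:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a high-dimensional dataset:
# 200 samples with 13 features each (placeholder values).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 13))

# Project the 13-dimensional feature space down to two components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)  # (200, 2)
```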
The following diagram shows the LDA and t-SNE embeddings for the 13-dimensional UCI Wine Recognition dataset (https://archive.ics.uci.edu/ml/datasets/wine). In the LDA embedding, we can see that the classes appear to be linearly separable. That's a lot to learn from just two lines of code; a sketch of how such embeddings can be produced is shown below.
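The original two lines are not reproduced here, but embeddings like these can be recreated along the following lines with scikit-learn; the scaling step, the random seed, and the plotting code are assumptions added to make the sketch runnable, and the two `fit_transform` calls are the core of it:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# The UCI Wine Recognition data: 178 samples, 13 features, 3 classes.
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)

# The two core lines: a supervised LDA projection and an
# unsupervised t-SNE embedding, each down to two dimensions.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)

# Plot both embeddings side by side, colored by class label.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X_lda[:, 0], X_lda[:, 1], c=y)
axes[0].set_title("LDA embedding")
axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=y)
axes[1].set_title("t-SNE embedding")
plt.show()
```

Note that LDA uses the class labels (it is supervised), whereas t-SNE only sees the features; that is why LDA can explicitly optimize for class separability in its projection.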