Dimensionality reduction is the process of reducing the number of dimensions or features. A lot of real data contains a very high number of features. It is not uncommon to have thousands of features. Now, we need to drill down to features that matter.
Dimensionality reduction serves several purposes such as:
Data compression
Visualization
When the number of dimensions is reduced, it reduces the disk footprint and memory footprint. Last but not least; it helps algorithms to run much faster. It also helps reduce highly correlated dimensions to one.
Humans can only visualize three dimensions, but data can have a much higher number of dimensions. Visualization can help find hidden patterns in the data. Dimensionality reduction helps visualization by compacting multiple features into one.
The most popular algorithm for dimensionality reduction is principal component analysis (PCA).
Let's look at the following dataset: