Spark Cookbook

By: Rishi Yadav

Dimensionality reduction with singular value decomposition


Often, the original dimensions do not represent the data in the best way possible. As we saw with PCA, you can sometimes project the data onto fewer dimensions and still retain most of the useful information.

Sometimes, the best approach is to align the dimensions along the features that exhibit most of the variation. This approach helps eliminate dimensions that are not representative of the data.

Let's look at the following figure again, which shows the best-fit line on two dimensions:

The projection line shows the best approximation of the original data in one dimension. If we take the points where the gray line intersects the black line and isolate the black line, we have a reduced representation of the original data that retains as much variation as possible, as shown in the following figure:
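The projection described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the book's Spark code: the sample `points` are made up, and `principal_direction` and `project` are hypothetical helper names. The sketch finds the direction of greatest variance for 2D data using the closed-form angle for a 2x2 covariance matrix, then reduces each point to a single coordinate along that direction.

```python
import math

# Hypothetical sample points lying roughly along a line (illustration only).
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

def principal_direction(pts):
    """Return (mean, unit vector) for the direction of greatest variance in 2D."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def project(pts):
    """Reduce each 2D point to one coordinate along the principal direction."""
    (mx, my), (ux, uy) = principal_direction(pts)
    return [(x - mx) * ux + (y - my) * uy for x, y in pts]

# One number per point instead of two.
reduced = project(points)
```

Comparing the variance of `reduced` with the summed variance of the original x and y coordinates gives a rough measure of how much variation the single dimension retains.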

Let's draw a line perpendicular to the first projection line, as shown in the following figure:

This line captures as much variation...
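To make the role of the perpendicular line concrete, here is a minimal Python sketch. The data, the 45-degree angle of the first direction, and the helper name `variance_along` are all assumptions chosen for illustration. It measures the variance of the points along the first direction and along the direction perpendicular to it; because the two directions are orthogonal, the two variances add up to the total variance of the data.

```python
import math

# Hypothetical centered 2D data and an assumed first direction at 45 degrees.
data = [(-2.0, -1.9), (-1.0, -1.1), (0.0, 0.2), (1.0, 0.8), (2.0, 2.0)]
theta = math.pi / 4
first = (math.cos(theta), math.sin(theta))
second = (-first[1], first[0])  # the first direction rotated by 90 degrees

def variance_along(pts, direction):
    """Population variance of the points' coordinates along a unit direction."""
    coords = [x * direction[0] + y * direction[1] for x, y in pts]
    mean = sum(coords) / len(coords)
    return sum((c - mean) ** 2 for c in coords) / len(coords)

var_first = variance_along(data, first)
var_second = variance_along(data, second)
# Total variance measured along the original x and y axes.
total = variance_along(data, (1.0, 0.0)) + variance_along(data, (0.0, 1.0))
```

For data like this, `var_second` is small compared with `var_first`, which is why dropping the second direction loses little information.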