Mastering Data Analysis with R

By: Gergely Daróczi

Chapter 9. From Big to Small Data

Now that we have some cleansed data ready for analysis, let's first see how we can find our way around the large number of variables in our dataset. This chapter introduces some statistical techniques that reduce the number of variables through dimension reduction and feature extraction, such as:

  • Principal Component Analysis (PCA)

  • Factor Analysis (FA)

  • Multidimensional Scaling (MDS) and a few other techniques

Note

Most dimension reduction methods require that two or more numeric variables in the dataset are highly associated or correlated, so that the columns in our matrix are not totally independent of each other. In such a situation, the goal of dimension reduction is to decrease the number of columns in the dataset to the actual matrix rank; in other words, the number of variables can be decreased while most of the information content is retained. In linear algebra, the matrix rank refers to the dimension of the vector space generated by the matrix, or...
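
As a rough illustration of the rank idea in the note above, the following minimal sketch builds a toy matrix in which one column is an exact linear combination of the other two. The data and the names `x1`, `x2`, `x3`, `m`, and `pca` are hypothetical and not taken from the book's dataset; only base R functions (`qr` and `prcomp`) are used.

```r
# Hypothetical toy data (an assumption for illustration only):
# x3 is an exact linear combination of x1 and x2, so it adds no new information.
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
m  <- cbind(x1, x2, x3 = 2 * x1 - x2)

# The matrix has 3 columns, but its rank (estimated via QR decomposition) is 2
ncol(m)
qr(m)$rank

# PCA on the same matrix: the third component carries practically no variance,
# so two components are enough to describe the data
pca <- prcomp(m)
summary(pca)$importance["Proportion of Variance", ]
```

In real datasets the columns are rarely exact linear combinations of each other, so the matrix is usually of full rank in the strict sense; the same logic applies approximately when some components explain only a negligible share of the variance.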