Building a Recommendation System with R

Data preprocessing techniques


Data preprocessing is a crucial step in any data analysis problem, because a model's accuracy depends largely on the quality of its input data. In general, preprocessing involves cleansing the data, transforming it, identifying missing values, and deciding how they should be treated. Only data that has been preprocessed in this way should be fed into a machine-learning algorithm. In this section, we focus on the preprocessing techniques that are widely used in recommender systems: similarity measures (such as Euclidean distance, cosine distance, and the Pearson coefficient) and dimensionality-reduction techniques, such as principal component analysis (PCA). Other ways to reduce the dimensions of a dataset include singular value decomposition (SVD) and feature subset selection, but we limit our study to PCA.
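As a rough illustration of the steps named above, the following sketch finds missing values, treats them, and then reduces the dimensions with PCA. The rating matrix and all variable names are made up for this example, and mean imputation is only one assumed treatment among several; `prcomp()` is the PCA function from base R's stats package.

```r
# Hypothetical user-item rating matrix (rows = users, columns = items).
# NA marks a rating that the user never gave. All values are invented.
ratings <- matrix(
  c( 5,  3, NA,  1,
     4, NA,  4,  1,
     1,  1,  5,  5,
    NA,  1,  4,  4,
     2,  2,  5, NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("user", 1:5), paste0("item", 1:4))
)

# Identify missing values: number of NAs in each item column.
missing_per_item <- colSums(is.na(ratings))

# Treat missing values. Assumption for this sketch: replace each missing
# rating with the mean of the observed ratings for the same item.
imputed <- ratings
for (j in seq_len(ncol(imputed))) {
  imputed[is.na(imputed[, j]), j] <- mean(imputed[, j], na.rm = TRUE)
}

# Run PCA on the centered and scaled data.
pca <- prcomp(imputed, center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the first two components as a reduced representation of each user.
reduced <- pca$x[, 1:2]
```

Here `reduced` has one row per user and two columns instead of four. In practice, the number of components to keep would be chosen by inspecting `var_explained` (or `summary(pca)`) rather than fixed in advance.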

Similarity measures

As discussed in the previous chapter, every recommender system works on the concept of similarity between items or users...
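To make the idea of similarity concrete, here is a minimal sketch that compares two users with the three measures mentioned earlier. The two rating vectors are hypothetical, and the variable names are chosen only for this example; `dist()` and `cor()` are base R functions.

```r
# Ratings given by two hypothetical users to the same five items.
user_a <- c(5, 3, 4, 4, 2)
user_b <- c(3, 1, 2, 3, 3)

# Euclidean distance: square root of the sum of squared differences.
euclidean_dist <- sqrt(sum((user_a - user_b)^2))

# The same distance computed with base R's dist(), which works on matrix rows.
euclidean_check <- as.numeric(dist(rbind(user_a, user_b), method = "euclidean"))

# Cosine similarity: dot product divided by the product of the vector norms.
cosine_sim <- sum(user_a * user_b) /
  (sqrt(sum(user_a^2)) * sqrt(sum(user_b^2)))

# Cosine distance is commonly taken as one minus the cosine similarity.
cosine_dist <- 1 - cosine_sim

# Pearson correlation coefficient between the two rating vectors.
pearson_coef <- cor(user_a, user_b, method = "pearson")
```

A smaller Euclidean or cosine distance means the two users rate more alike, whereas a Pearson coefficient closer to 1 indicates that their ratings rise and fall together.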