Chapter 14

# Reducing Dimensionality

IN THIS CHAPTER

**Discovering the magic of singular value decomposition**

**Understanding the difference between factors and components**

**Automatically retrieving and matching images and text**

**Building a movie recommender system**

*Big data* is defined as a collection of datasets so huge that the data becomes difficult to process using traditional techniques. Working with big data is what separates data science problems from statistical problems, which are based on small samples. You typically use traditional statistical techniques on small problems and data science techniques on big problems.

Data may be viewed as big because it contains many examples, which is the first kind of bigness that comes to mind. Analyzing a database of millions of customers and interacting with all of them simultaneously is really challenging, but that isn’t the only way to look at big data. Another view of big data is data dimensionality, which...