# Executing dimensionality reduction

In the *Explaining feature engineering* section of *Chapter 2, Detecting Spam Emails*, we defined a *feature* of an ML problem as an attribute or characteristic that describes it. Collecting many features together creates a vector of attributes, and each sample in a dataset is a unique combination of the vector’s values. Consequently, adding more features to a specific problem increases the vector’s dimensions. It is logical to think that having more features will provide a better description of the underlying data and ease the work of whatever ML algorithm follows. Unfortunately, there are other implications.
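As a minimal sketch of this idea (the feature names and values here are hypothetical, not taken from the chapter), each sample below is a vector of attribute values, and appending one more feature column increases the dimensionality of every sample:

```python
import numpy as np

# Hypothetical samples described by three features:
# word count, number of links, and number of capitalized words.
features_3d = np.array([
    [120, 2, 5],   # sample 1
    [430, 0, 1],   # sample 2
])
print(features_3d.shape)  # (2, 3) -> each sample is a 3-dimensional vector

# Adding a fourth feature (say, the number of exclamation marks)
# expands every sample vector by one dimension.
features_4d = np.hstack([features_3d, [[7], [0]]])
print(features_4d.shape)  # (2, 4) -> each sample is now a 4-dimensional vector
```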

In our discussion about **Support Vector Machines** (**SVM**) in *Chapter 2*, *Detecting Spam Emails*, we saw that each sample is a point in a high-dimensional space. Similar samples lie closer to one another than dissimilar ones, and we can measure their proximity using the cosine similarity or Euclidean distance metrics. If we expand the dimensions...
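To make the proximity measurement concrete, here is a minimal sketch of the two metrics just mentioned, applied to two samples in a four-dimensional space (the vectors are made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import cosine, euclidean

# Two hypothetical samples as points in a 4-dimensional feature space.
a = np.array([1.0, 0.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 2.5, 2.0])

# Cosine similarity is 1 minus the cosine distance;
# values closer to 1 indicate more similar samples.
cos_sim = 1 - cosine(a, b)

# Euclidean distance is the straight-line distance between the points;
# smaller values indicate more similar samples.
eucl = euclidean(a, b)

print(f"cosine similarity: {cos_sim:.3f}")
print(f"Euclidean distance: {eucl:.3f}")
```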