So far in this chapter, we have looked at the basic concepts of supervised and unsupervised learning using the simplest possible examples. In these examples, we considered only a limited number of factors that contribute to the outcome. In the real world, however, a very large number of data points and attributes are available for analysis and model generation. Every additional factor adds one dimension to the feature space, and beyond the third dimension it becomes difficult to visualize the data in an intuitive form. Each new dimension also adds to the computational cost of generating a model.
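To make the cost of extra dimensions concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each attribute is split into 10 equal-width bins and that the dataset holds one million records; the function names `cells_needed` and `points_per_cell` are hypothetical helpers, not part of any library. It shows how quickly the space a model has to cover grows, and how thinly a fixed amount of data ends up spread across it.

```python
def cells_needed(bins_per_axis: int, dimensions: int) -> int:
    """Number of grid cells when every axis is split into bins_per_axis bins."""
    return bins_per_axis ** dimensions


def points_per_cell(n_points: int, bins_per_axis: int, dimensions: int) -> float:
    """Average number of data points that land in each grid cell."""
    return n_points / cells_needed(bins_per_axis, dimensions)


if __name__ == "__main__":
    n_points = 1_000_000   # assumed dataset size, for illustration only
    bins = 10              # assumed resolution per attribute
    for d in (1, 2, 3, 6, 10):
        print(
            f"dimensions={d:2d}  "
            f"cells={cells_needed(bins, d):>14,d}  "
            f"avg points per cell={points_per_cell(n_points, bins, d):g}"
        )
```

With these assumed numbers, the grid grows by a factor of 10 with every added attribute, so the same one million records go from densely covering the space in one or two dimensions to leaving almost every cell empty in ten.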
In the world of big data, we can now bring in data from heterogeneous sources, which was not possible earlier, and as a result we are constantly adding more dimensions to our datasets. While it is great to have additional data points and attributes to better understand a problem, more is not always better if we consider the computational...