In the previous chapter, we introduced the perceptron and described why it cannot effectively classify linearly inseparable data. Recall that we encountered a similar problem in our discussion of multiple linear regression; we examined a dataset in which the response variable was not linearly related to the explanatory variables. To improve the accuracy of the model, we introduced a special case of multiple linear regression called polynomial regression. We created synthetic combinations of features, which allowed us to model a linear relationship between the response variable and the features in the higher-dimensional feature space.
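As a refresher, the following minimal sketch shows this idea in code. It assumes scikit-learn's PolynomialFeatures and LinearRegression, and the small quadratic dataset is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Invented data with an approximately quadratic relationship
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.2])

# Map the single feature x to the expanded representation [1, x, x^2]
quadratic = PolynomialFeatures(degree=2)
X_quadratic = quadratic.fit_transform(X)

# An ordinary linear model can now fit the curve in the expanded space
model = LinearRegression()
model.fit(X_quadratic, y)
print(model.predict(quadratic.transform([[6]])))
```

The model is still linear in its parameters; the nonlinearity lives entirely in the feature mapping.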
While this method of increasing the dimensions of the feature space may seem like a promising technique for approximating nonlinear functions with linear models, it suffers from two related problems. The first is a computational problem: computing the mapped features and working with larger vectors requires more computation and memory.
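To make the computational problem concrete, the short sketch below (again assuming scikit-learn's PolynomialFeatures; the feature counts are arbitrary) prints how many dimensions a degree-2 polynomial mapping produces as the number of original features grows:

```python
from sklearn.preprocessing import PolynomialFeatures

# Count the mapped features produced by a degree-2 polynomial mapping.
# With a bias term, n original features map to C(n + 2, 2) dimensions.
for n_features in [10, 100, 1000]:
    mapping = PolynomialFeatures(degree=2)
    mapping.fit([[0.0] * n_features])  # fit on a dummy row to set the input size
    print(n_features, '->', mapping.n_output_features_)
# Prints: 10 -> 66, 100 -> 5151, 1000 -> 501501
```

The number of mapped features grows quadratically with the number of original features, and faster still for higher polynomial degrees.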