In Chapter 3, Multiple Regression in Action, in the section on feature scaling, we discussed how rescaling your original variables to a similar scale can help you better interpret the resulting regression coefficients. Moreover, scaling is essential when using gradient descent-based algorithms because it leads to faster convergence to a solution. Later, we will also introduce other techniques that can only work with scaled features. However, apart from the technical requirements of certain algorithms, our intention now is to draw your attention to how feature scaling can be helpful when working with data that can sometimes be missing or faulty.
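As a quick refresher, the most common form of feature scaling is standardization (z-score scaling). The following is a minimal sketch using NumPy; the feature matrix X here is hypothetical example data, not taken from the book's datasets:

```python
import numpy as np

# Hypothetical feature matrix: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardize each column to zero mean and unit variance, so that
# gradient descent updates all coefficients at a comparable pace.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # each column's mean is now ~0
print(X_scaled.std(axis=0))   # each column's std is now 1
```

The same result can be obtained with Scikit-learn's StandardScaler, which also remembers the training means and standard deviations so that the identical transformation can be applied to new data in production.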
Missing or erroneous data can appear not only during training but also during the production phase. If a missing value is encountered, you have two design options for creating a model robust enough to cope with the problem:
Actively deal with the missing values (there is a section in this chapter devoted to...