Data standardization (or normalization) is important for a number of reasons:
- Some algorithms converge faster on standardized (or normalized) data
- If your input variables are on vastly different scales, interpreting model coefficients can be difficult and the conclusions you draw might be wrong
- Some models will not arrive at the optimal solution unless the data is standardized
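To make the idea concrete, here is a minimal, self-contained sketch (in plain Python, independent of Spark) of what standardization does: each feature is shifted to zero mean and rescaled to unit variance, so features measured in very different units end up on a comparable scale. The column names and sample values below are illustrative only.

```python
import statistics

def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Two hypothetical features on very different scales
ages = [25, 32, 47, 61]
incomes = [20_000.0, 35_000.0, 50_000.0, 120_000.0]

std_ages = standardize(ages)
std_incomes = standardize(incomes)

# After standardization, both features have mean 0 and unit variance,
# so a coefficient of 0.5 means the same thing for either feature.
```

After this transformation, a one-unit change in any standardized feature corresponds to one standard deviation of the original variable, which is what makes coefficients comparable across features.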
In this recipe, we will show you how to standardize data so you can apply the technique whenever your modeling project requires it.
To execute this recipe, you need to have a working Spark environment. You should have already gone through the previous recipe, where we encoded the census data.
No other prerequisites are required.