Standardizing continuous variables
Building a machine learning model using features with significantly different ranges and resolutions (such as age and salary) can cause not only computational problems but also problems with model convergence and coefficient interpretability.
In this recipe, we will learn how to standardize continuous variables so they have a mean of 0 and a standard deviation of 1.
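To make the target of this recipe concrete, here is a minimal plain-Python sketch of standardization (not the Spark code used in this recipe). It subtracts the mean from each value and divides by the standard deviation. The `standardize` helper and the `ages` and `salaries` values are hypothetical and only serve as an illustration. The sketch uses the sample (n - 1) standard deviation, which is also what Spark's StandardScaler is documented to use.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample (n - 1) standard deviation
    return [(x - mu) / sigma for x in values]


# Hypothetical features on very different scales
ages = [23, 35, 47, 52, 61]
salaries = [28000, 54000, 61000, 83000, 120000]

ages_std = standardize(ages)
salaries_std = standardize(salaries)

# After standardization both features have mean 0 and standard deviation 1
print(round(mean(ages_std), 6), round(stdev(ages_std), 6))
print(round(mean(salaries_std), 6), round(stdev(salaries_std), 6))
```

Once standardized, both features are expressed in units of standard deviations from their own mean, so neither dominates the other simply because of its original scale.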
Getting ready
To execute this recipe, you will need a working Spark environment. You should also have executed the previous recipe.
No other prerequisites are required.
How to do it...
To standardize the signal column we introduced in the previous recipe, we will use the .StandardScaler(...) class:
vec = feat.VectorAssembler(
    inputCols=['signal']
    , outputCol='signal_vec'
)

norm = feat.StandardScaler(
    inputCol=vec.getOutputCol()
    , outputCol='signal_norm'
    , withMean=True
    , withStd=True
)

norm_pipeline = Pipeline(stages=[vec, norm])

signal_norm = (
    norm_pipeline...