Dealing with numerical features
Transformations for numerical features (discrete and continuous) fall into two groups: those that rely on the training data and those that rely purely on the observation being transformed.
Those that rely on the training data will use the train set to learn the necessary parameters during fit, and then use them to transform any test or new data. The logic is pretty much the same as what we just reviewed for categorical features; however, this time, the transformer will learn different parameters (for example, a mean and a standard deviation rather than a mapping of categories).
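To make the fit-then-transform idea concrete, here is a minimal sketch of standardization written in plain Python. The class name SimpleStandardScaler and the sample values are hypothetical, made up for illustration; they are not taken from any particular library. The point is only that the mean and standard deviation are learned from the train set and then reused, unchanged, on the test set.

```python
from statistics import mean, pstdev


class SimpleStandardScaler:
    """Hypothetical minimal scaler: learns parameters on fit, applies them on transform."""

    def fit(self, values):
        # Parameters are learned from the training data only.
        self.mean_ = mean(values)
        # Fall back to 1.0 if the feature is constant, to avoid dividing by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        # Any data (train, test, or new) is transformed with the stored parameters.
        return [(v - self.mean_) / self.std_ for v in values]


# Illustrative values (assumed, not from the text).
train = [10.0, 20.0, 30.0]
test = [20.0, 40.0]

scaler = SimpleStandardScaler().fit(train)  # learn mean and std from train
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)        # reuse train parameters on test
```

Note that the test value 20.0 maps to 0.0 because it equals the train mean; the test set's own mean is never computed.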
On the other hand, those that rely purely on observations do not care about train or test sets. They simply perform a mathematical computation on an individual value. For example, we could apply a power transformation to a particular variable by squaring its value. There is no dependency on learned parameters from anywhere: just get the value and square it.
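By contrast, a transformation that depends only on the observation can be sketched as a plain function with no fit step. The function name square and the sample values below are assumptions chosen for illustration.

```python
def square(value):
    # Stateless transformation: the result depends only on the input value.
    return value ** 2


# Illustrative values (assumed, not from the text).
train = [1.0, 2.0, 3.0]
test = [4.0]

# The same function is applied to both sets; nothing is learned from train.
train_squared = [square(v) for v in train]
test_squared = [square(v) for v in test]
```

Because nothing is learned, applying the function before or after the train/test split gives the same result.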
At this point, you might be thinking about dozens of available transformations...