Now we're going to present two very important scikit-learn classes that can help the machine learning engineer to create complex processing structures including all the steps needed to generate the desired outcomes from the raw datasets.
scikit-learn provides a flexible mechanism for creating pipelines made up of subsequent processing steps. This is possible thanks to a standard interface implemented by the majority of classes therefore most of the components (both data processors/transformers and classifiers/clustering tools) can be exchanged seamlessly. The class Pipeline
accepts a single parameter steps
, which is a list of tuples in the form (name of the component—instance), and creates a complex object with the standard fit/transform interface. For example, if we need to apply a PCA, a standard scaling, and then we want to classify using a SVM, we could create a pipeline in the following way:
from sklearn.decomposition import...