
Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data. Often, the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning itself to automatically learn new features from your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.
Table of Contents (14 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface

Scikit-learn's PCA


As usual, scikit-learn saves the day by implementing this procedure in an easy-to-use transformer, so we don't have to repeat that manual process each time we want to use this powerful technique:
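
For reference, the manual process that this transformer replaces can be sketched roughly as follows. This is a minimal sketch, not the book's earlier code: it assumes iris_X is the 150 x 4 iris feature matrix (loaded here with scikit-learn's load_iris), centers each feature, eigendecomposes the covariance matrix with NumPy's eigh, and keeps the two eigenvectors with the largest eigenvalues. The names mean_vector, cov_mat, and projected_X are illustrative.

```python
# Sketch of the manual PCA steps that scikit-learn's PCA automates.
# Assumption: iris_X is the iris feature matrix, loaded here with load_iris.
import numpy as np
from sklearn.datasets import load_iris

iris_X = load_iris().data

# Center each feature at zero mean
mean_vector = iris_X.mean(axis=0)
centered_X = iris_X - mean_vector

# Covariance matrix of the features (columns are variables)
cov_mat = np.cov(centered_X, rowvar=False)

# eigh suits symmetric matrices; it returns eigenvalues in ascending order
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)

# Re-sort the eigenpairs so the largest eigenvalue comes first
order = np.argsort(eig_vals)[::-1]
eig_vals = eig_vals[order]
eig_vecs = eig_vecs[:, order]

# Keep the top two eigenvectors as rows, one row per component
top_2_eigenvectors = eig_vecs[:, :2].T

# Project the centered data onto the two principal components
projected_X = centered_X @ top_2_eigenvectors.T
print(projected_X.shape)  # (150, 2)
```

Storing the eigenvectors as rows mirrors the layout scikit-learn uses for its fitted components, which makes a side-by-side comparison straightforward.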

  1. We can import it from scikit-learn's decomposition module:
# scikit-learn's version of PCA
from sklearn.decomposition import PCA
  2. To mimic the process we performed with the iris dataset, let's instantiate a PCA object with only two components:
# Like any other sklearn module, we first instantiate the class
pca = PCA(n_components=2)
  3. Now, we can fit our PCA to the data:
# fit the PCA to our data
pca.fit(iris_X)
  4. Let's look at some of the PCA object's attributes to see whether they match what we achieved in our manual process. We'll start with the components_ attribute and check whether it matches our top_2_eigenvectors variable:
pca.components_

array([[ 0.36158968, -0.08226889,  0.85657211,  0.35884393],
       [ 0.65653988,  0.72971237, -0.1757674...
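
To check whether components_ really matches the manual result, one option is to compute both in a single script, as in the sketch below. It assumes iris_X is loaded with load_iris, and the names cov_mat and manual_projection are illustrative. Because an eigenvector is only defined up to its sign, the comparison uses absolute values. The sketch also confirms that transform subtracts the fitted mean_ before projecting onto components_.

```python
# Compare scikit-learn's PCA with a manual eigendecomposition of the covariance matrix.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris_X = load_iris().data

pca = PCA(n_components=2)
pca.fit(iris_X)

# Manual top-2 eigenvectors of the covariance matrix, largest eigenvalue first
cov_mat = np.cov(iris_X, rowvar=False)
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)
top_2_eigenvectors = eig_vecs[:, np.argsort(eig_vals)[::-1][:2]].T

# Eigenvectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(pca.components_), np.abs(top_2_eigenvectors)))

# transform() centers the data with pca.mean_ before projecting
manual_projection = (iris_X - pca.mean_) @ pca.components_.T
print(np.allclose(pca.transform(iris_X), manual_projection))

# Share of the total variance captured by each of the two components
print(pca.explained_variance_ratio_)
```

If both checks print True, the transformer has reproduced the manual procedure, including the centering step.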