Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start with understanding your data: the success of your ML models often depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features, and to deliver features driven by business needs as well as mathematical insights. You will also learn how to apply machine learning itself to learn useful features from your data automatically. By the end of the book, you will be proficient in feature selection, feature learning, and feature optimization.

LDA versus PCA – iris dataset


Finally, we can try using both PCA and LDA in our machine learning pipelines. Because we have been working with the iris dataset extensively in this chapter, we will continue to use it to demonstrate the utility of both LDA and PCA as feature transformation preprocessing steps for supervised and unsupervised machine learning.

We will start with supervised machine learning and attempt to build a classifier to recognize the species of flower given the four quantitative flower traits:

  1. We begin by importing three tools from scikit-learn:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

We will use KNN as our supervised model and the Pipeline class to combine our KNN model with our feature transformation tools, creating machine learning pipelines that can be cross-validated using the cross_val_score function. We will try a few different machine learning...
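
The setup described above can be sketched as follows. This is a minimal illustration rather than the book's own code: the helper name mean_cv_accuracy, the choice of two components for both PCA and LDA, the default KNN settings, and the 5-fold cross-validation are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Load the four quantitative flower traits (X) and the species labels (y).
iris = load_iris()
X, y = iris.data, iris.target


def mean_cv_accuracy(steps, X, y, cv=5):
    """Hypothetical helper: build a Pipeline from steps and return its mean CV accuracy."""
    pipeline = Pipeline(steps)
    return cross_val_score(pipeline, X, y, cv=cv).mean()


# Assumed configurations: KNN alone, then KNN preceded by PCA or LDA.
# n_components=2 is an illustrative choice, not a value taken from the text.
scores = {
    "knn_only": mean_cv_accuracy(
        [("knn", KNeighborsClassifier())], X, y
    ),
    "pca_knn": mean_cv_accuracy(
        [("pca", PCA(n_components=2)), ("knn", KNeighborsClassifier())], X, y
    ),
    "lda_knn": mean_cv_accuracy(
        [("lda", LinearDiscriminantAnalysis(n_components=2)),
         ("knn", KNeighborsClassifier())],
        X, y,
    ),
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Because each transformer sits inside the Pipeline, it is refit on the training folds only during cross-validation, so the held-out fold never influences the PCA or LDA projection.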