Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data: the success of your ML models often depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.

Encoding categorical variables

To recap, thus far we have successfully imputed our dataset—both our categorical and quantitative columns. At this point, you may be wondering, how do we utilize the categorical data with a machine learning algorithm?

Simply put, we need to transform this categorical data into numerical data. So far, we have ensured that the most common category was used to fill the missing values. Now that this is done, we need to take it a step further. 

Any machine learning algorithm, whether it is a linear regression or a KNN that uses Euclidean distance, requires numerical input features to learn from. There are several methods we can rely on to transform our categorical data into numerical data.

Encoding at the nominal level

Let's begin with data at the nominal level. The main method we have is to transform our categorical data into dummy variables. We have two options to do this:

  • Utilize pandas to automatically find the categorical variables and dummy code them
  • Create our...
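
As a rough illustration of the first option, the following is a minimal sketch of dummy coding with pandas. The DataFrame, its column names (city, rating), and its values are hypothetical and chosen only for this example; they are not the dataset used in the chapter. The dtype=int argument is an assumption made here so that the dummy columns hold 0/1 integers regardless of pandas version.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only
df = pd.DataFrame({
    "city": ["tokyo", "london", "tokyo", "seattle"],  # nominal (categorical)
    "rating": [3, 5, 4, 2],                           # already numerical
})

# get_dummies dummy-codes the string column and leaves the numeric
# column untouched; dtype=int yields 0/1 integers
encoded = pd.get_dummies(df, dtype=int)

print(encoded)
```

Each distinct value of city becomes its own 0/1 column (city_london, city_seattle, city_tokyo), so every row has exactly one of those columns set to 1, while rating passes through unchanged.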