Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start with understanding your data: often the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.

Dealing with missing values in a dataset


When working with data, one of the most common issues a data scientist will run into is missing data. Most commonly, this refers to empty cells (row/column intersections) where a value was not acquired for whatever reason. This causes several problems; most notably, most (though not all) learning algorithms cannot cope with missing values in their input.
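Before choosing a way to handle missing data, it helps to measure how much of it there is. The following is a minimal sketch using pandas; the DataFrame and its column names (`age`, `income`) are hypothetical and are not taken from the book's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a cell where no value was recorded.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, np.nan],
})

# Number of missing cells in each column.
missing_per_column = df.isnull().sum()

# Fraction of rows that contain at least one missing cell.
rows_with_missing = df.isnull().any(axis=1).mean()

print(missing_per_column)
print(rows_with_missing)
```

A per-column count shows which features are affected, while the row fraction indicates how much data would be lost if incomplete rows were simply discarded.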

For this reason, data scientists and machine learning engineers have many tips and tricks for dealing with this problem. Although the methodologies come in many variations, the two major ways in which we can deal with missing data are:

  • Remove rows with missing values in them
  • Impute (fill in) missing values

Either method will clean our dataset to the point where a learning algorithm can handle it, but each has its pros and cons.
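The two approaches can be sketched side by side in pandas. This is an illustration under stated assumptions rather than the book's own code: the toy DataFrame is hypothetical, and filling with the column mean is only one possible imputation strategy, chosen here for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing cells.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 58000.0],
})

# Option 1: remove every row that has at least one missing value.
dropped = df.dropna()

# Option 2: impute each missing value with the mean of its column
# (assumed strategy; median or a constant are common alternatives).
imputed = df.fillna(df.mean())

print(len(df), len(dropped), len(imputed))
print(imputed)
```

Dropping rows leaves only values that were actually observed but shrinks the dataset, while imputing keeps every row at the cost of introducing estimated values.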

First off, before we go too far, let's get rid of the zeros and replace them all with the value...