
Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data: often, the success of your ML models depends on how you leverage different feature types, such as continuous and categorical features. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features, and to deliver features driven by business needs as well as by mathematical insights. You'll also learn how to use machine learning itself to learn features from your data automatically. By the end of the book, you will be proficient in feature selection, feature learning, and feature optimization.
Table of Contents (14 chapters)

Quantitative versus qualitative data


To diagnose the various types of data, we will begin with the broadest level of separation. When dealing with structured, tabular data (which is what we will usually be doing), the first question we generally ask ourselves is whether the values are numeric or categorical in nature.

Quantitative data are numerical in nature; they should measure the quantity of something.

Qualitative data are categorical in nature; they should describe the quality of something.
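One practical consequence of these definitions is the kind of summary each type supports. The following is a minimal sketch using only the Python standard library and made-up weather readings (the values are illustrative assumptions, not data from the text): averaging temperatures in Fahrenheit is meaningful, while sky conditions can only be counted per category.

```python
from collections import Counter
from statistics import mean

# Hypothetical readings of the same weather, recorded two ways
temperatures_f = [72, 68, 76]               # quantitative: degrees Fahrenheit
conditions = ["cloudy", "sunny", "cloudy"]  # qualitative: descriptive labels

# Quantities can be combined arithmetically, so an average is meaningful
average_temp = mean(temperatures_f)

# Labels cannot be averaged; the natural summary is a count per category
condition_counts = Counter(conditions)

print(average_temp)                     # 72
print(condition_counts.most_common(1))  # [('cloudy', 2)]
```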

Basic examples:

  • Weather measured as temperature in Fahrenheit or Celsius would be quantitative
  • Weather measured as cloudy or sunny would be qualitative
  • The name of a person visiting the White House would be qualitative
  • The amount of blood you donate at a blood drive is quantitative
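The four examples above can be mirrored as columns of a small table and given the numeric-or-categorical check as a first pass. This is a sketch under stated assumptions: the helper `is_quantitative`, the column names, and every value (including the visitor names) are hypothetical, and the check only looks at how values are stored in plain Python.

```python
from numbers import Number


def is_quantitative(values):
    """Heuristic: True if every non-missing value is stored as a number.

    bool is a subclass of int in Python, so it is excluded explicitly.
    An empty or all-missing column returns False.
    """
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )


# Hypothetical columns mirroring the four examples
columns = {
    "temperature_c": [21.5, 18.0, 24.3],
    "sky": ["cloudy", "sunny", "cloudy"],
    "visitor_name": ["Ada", "Grace", "Alan"],
    "blood_donated_ml": [470, 500, 450],
}

for name, values in columns.items():
    kind = "quantitative" if is_quantitative(values) else "qualitative"
    print(f"{name}: {kind}")
```

Storage type is only a starting point: a column of numbers can still encode categories (an identifier, for instance), so the final decision rests on what the values actually describe.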

The first two examples show that we can describe the same system using data from both the qualitative and quantitative sides. In fact, in most datasets, we will be working with...