Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data: the success of your ML models often depends on how you leverage different feature types, such as continuous and categorical features. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features, and to deliver features driven by business needs as well as by mathematical insights. You will also learn how to use machine learning itself to automatically learn useful features from your data. By the end of the book, you will be proficient in feature selection, feature learning, and feature optimization.
Table of Contents (14 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface

The four levels of data


We already know that we can identify data as being either qualitative or quantitative. From there, we can go further and classify data into one of four levels:

  • The nominal level
  • The ordinal level
  • The interval level
  • The ratio level

Each level allows a different degree of control and a different set of mathematical operations. It is crucial to know which level your data lives on, because the level dictates the types of visualizations and operations you are allowed to perform.
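The idea that each level unlocks more operations than the one below it can be sketched in Python. This is a minimal sketch assuming the conventional levels-of-measurement rules (for example, the median first becomes meaningful at the ordinal level and the mean at the interval level); the names `Level`, `NEW_OPERATIONS`, and `allowed_operations` are hypothetical and are not taken from the book.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels of data, ordered from least to most structured."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations each level adds on top of the levels below it.
# Assumption: conventional levels-of-measurement rules, not the book's own table.
NEW_OPERATIONS = {
    Level.NOMINAL: {"count", "mode"},
    Level.ORDINAL: {"order", "median"},
    Level.INTERVAL: {"add_subtract", "mean", "std"},
    Level.RATIO: {"multiply_divide", "geometric_mean"},
}


def allowed_operations(level: Level) -> set:
    """Return every operation permitted at `level`, including inherited ones."""
    ops = set()
    for lower in Level:  # members iterate in definition order
        if lower <= level:
            ops |= NEW_OPERATIONS[lower]
    return ops


print(sorted(allowed_operations(Level.ORDINAL)))
# ['count', 'median', 'mode', 'order']
```

Using an `IntEnum` makes the levels comparable, so "everything allowed at a lower level is also allowed here" reduces to a simple `<=` check.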

The nominal level

The first level of data, the nominal level, has the weakest structure. It consists of data that are described purely by name. Basic examples include blood type (A, B, AB, O), species of animal, and names of people. These types of data are all qualitative.

Some other examples include:

  • In the SF Job Salary dataset, the Grade column would be nominal
  • Given visitor logs of a company, the first and last names of the visitors would be nominal
  • Species of animals in a lab experiment would be nominal
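Because nominal values are only names, the useful operations on them are limited to things like counting and finding the most common label. The sketch below illustrates this with pandas on a small, made-up blood-type column (the `visitors` data is hypothetical). Declaring the column as an unordered categorical records that its values have no inherent order, and pandas then refuses order-based operations such as `min()`.

```python
import pandas as pd

# Hypothetical data: blood type is a nominal (name-only) feature.
visitors = pd.DataFrame({"blood_type": ["A", "O", "AB", "O", "B", "O", "A"]})

# An unordered categorical: the values are labels with no ranking.
visitors["blood_type"] = visitors["blood_type"].astype("category")

# Counting how often each label appears is meaningful for nominal data.
counts = visitors["blood_type"].value_counts()
print(counts)

# So is the mode, the most frequent label.
most_common = visitors["blood_type"].mode()[0]
print(most_common)  # O

# Ordering is not meaningful, and pandas raises TypeError for min()
# on an unordered categorical.
ordering_allowed = True
try:
    visitors["blood_type"].min()
except TypeError as err:
    ordering_allowed = False
    print("not allowed:", err)
```

Sorting the raw strings would still "work" alphabetically, but that order carries no meaning for blood types; the categorical dtype makes the restriction explicit.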

Mathematical operations allowed

At...