Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start with understanding your data: often the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.

Recap of the levels of data


Understanding the various levels of data is necessary to perform feature engineering. When it comes time to build new features, or fix old ones, we must have ways of identifying how to work with every column.

Here is a quick table to summarize what is and isn't possible at every level:

| Level of Measurement | Properties | Examples | Descriptive statistics | Graphs |
|---|---|---|---|---|
| Nominal | Discrete; Orderless | Binary responses (True or False); Names of people; Colors of paint | Frequencies/Percentages; Mode | Bar; Pie |
| Ordinal | Ordered categories; Comparisons | Likert scales; Grades on an exam | Frequencies; Mode; Median; Percentiles | Bar; Pie; Stem and leaf |
| Interval | Differences between ordered values have meaning | Deg. C or F; Some Likert scales (must be specific) | Frequencies; Mode; Median; Mean; Standard deviation | Bar; Pie; Stem and leaf; Box plot; Histogram |
| Ratio | Continuous; True 0 allows ratio statements (for example, $100 is twice as much as $50) | Money; Weight | Mean; Standard deviation | Histogram; Box plot |
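To make the summary concrete, here is a minimal sketch of choosing descriptive statistics by level of measurement. It uses only Python's standard library, which is an assumption of this example and not necessarily the tooling used elsewhere in the book. The `describe` function, its `level` labels, and the `order` parameter are hypothetical names introduced for illustration. The sketch also treats interval and ratio columns alike for median, mean, and standard deviation, which is a simplification. The final lines illustrate the "true 0" property: a ratio between two dollar amounts survives a change of units, while a ratio between two Celsius temperatures does not.

```python
from collections import Counter
from statistics import mean, median, median_low, mode, stdev


def describe(values, level, order=None):
    """Return descriptive statistics that are meaningful at `level`.

    `order` lists the categories from lowest to highest and is only
    needed for ordinal data (hypothetical parameter for this sketch).
    """
    if level == "nominal":
        # No order: only counting and the most common value make sense.
        return {"frequencies": dict(Counter(values)), "mode": mode(values)}
    if level == "ordinal":
        # Order exists, so a median is meaningful; a mean is not.
        rank = {category: i for i, category in enumerate(order)}
        middle = median_low(rank[v] for v in values)
        return {
            "frequencies": dict(Counter(values)),
            "mode": mode(values),
            "median": order[middle],
        }
    if level in ("interval", "ratio"):
        # Differences are meaningful, so mean and spread are allowed.
        return {
            "median": median(values),
            "mean": mean(values),
            "std": stdev(values),
        }
    raise ValueError(f"unknown level: {level!r}")


paint = describe(["red", "blue", "red", "green"], "nominal")
grades = describe(["B", "A", "C", "B", "B"], "ordinal",
                  order=["F", "D", "C", "B", "A"])
temps_c = describe([20.0, 22.0, 24.0], "interval")


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Ratio level: $100 is twice $50, whether counted in dollars or cents.
money_ratio_dollars = 100 / 50
money_ratio_cents = (100 * 100) / (50 * 100)

# Interval level: 20 deg C is not "twice as hot" as 10 deg C, because
# the apparent ratio changes when the same temperatures are in deg F.
temp_ratio_c = 20 / 10
temp_ratio_f = c_to_f(20) / c_to_f(10)
```

The ordinal branch converts categories to ranks before taking the median, so the result is always one of the original categories and never an average of two labels.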

The following is a table showing the types of statistics allowed...