Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data, since the success of your ML models often depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features and to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will be proficient in feature selection, feature learning, and feature optimization.

Summary


In this chapter, we learned a great deal about methodologies for selecting subsets of features in order to improve our machine learning pipelines, both in predictive performance and in time complexity.

The dataset that we chose had a relatively small number of features. When selecting from a very large set of features (over a hundred), however, the methods in this chapter are likely to become too cumbersome. We saw this when attempting to optimize a CountVectorizer pipeline: running a univariate test on every feature would not only take an astronomical amount of time, it would also increase the risk of multicollinearity appearing among our features by sheer coincidence.
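To make the scaling concern concrete, here is a minimal sketch using only the Python standard library. It is not the chapter's pipeline: the toy corpus, the labels, and the helper names `count_matrix` and `univariate_scores` are all assumptions made for illustration, and the score (absolute difference in mean count between two classes) is a simple stand-in for a real univariate statistical test. The point is only that a bag-of-words representation creates one column per distinct token, and a univariate method must score every column separately.

```python
from collections import Counter
from statistics import mean

# Toy corpus and binary labels: illustrative assumptions, not data from the book.
docs = [
    "great movie great cast",
    "great plot and fun cast",
    "boring movie and dull plot",
    "dull cast boring story",
]
labels = [1, 1, 0, 0]


def count_matrix(documents):
    """Build bag-of-words counts, one column per distinct token.

    A hypothetical stand-in for what a count vectorizer produces.
    """
    vocab = sorted({tok for doc in documents for tok in doc.split()})
    rows = []
    for doc in documents:
        counts = Counter(doc.split())
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows


def univariate_scores(rows, y):
    """Score each column on its own, ignoring all other columns.

    The score is the absolute difference in mean count between the
    two classes (a simplified stand-in for a univariate test).
    """
    scores = []
    for j in range(len(rows[0])):
        pos = [row[j] for row, label in zip(rows, y) if label == 1]
        neg = [row[j] for row, label in zip(rows, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


vocab, X = count_matrix(docs)
scores = univariate_scores(X, labels)

# One score per feature: the work grows with the size of the vocabulary.
top = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)[:3]
print(len(vocab), "features scored; top tokens:", [tok for tok, _ in top])
```

Even four short sentences yield nine features here. A realistic corpus produces many thousands of columns, each needing its own test, which is why this approach becomes slow at scale.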

In the next chapter, we will introduce purely mathematical transformations that we may apply to our data matrices in order to alleviate the trouble of working with vast quantities of features, or even a few highly uninterpretable features. We will begin to work with...