Book Image

Feature Engineering Made Easy

By : Sinan Ozdemir, Divya Susarla
Book Image

Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start with understanding your data—often the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more, You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.
Table of Contents (14 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface

Choosing the right feature selection method


At this point, you may be feeling a bit overwhelmed with the information in this chapter. We have presented several ways of performing feature selection, some based on pure statistics and others based on the output of secondary machine learning models. It is natural to wonder how to decide which feature selection method is right for your data. In theory, if you are able to try multiple options, as we did in this chapter, that would be ideal, but we understand that it might not be feasible to do so. The following are some rules of thumbs that you can follow when you are trying to prioritize which feature selection module is more likely to offer greater results:

  • If your features are mostly categorical, you should start by trying to implement a SelectKBest with a Chi2 ranker or a tree-based model selector.
  • If your features are largely quantitative (like ours were), using linear models as model-based selectors and relying on correlations tends to yield...