Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start with understanding your data, since the success of your ML models often depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features, and to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines to automatically learn useful features for your data. By the end of the book, you will be proficient in Feature Selection, Feature Learning, and Feature Optimization.

Summary


To summarize our findings, both PCA and LDA are feature transformation tools in our arsenal that are used to find optimal new features. LDA specifically optimizes for class separation, while PCA works in an unsupervised way to capture the variance in the data in fewer columns. Usually, the two are used in conjunction with supervised pipelines, as we showed in the iris pipeline. In the final chapter, we will go through two longer case studies that use both PCA and LDA for text clustering and facial recognition software.
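
As a reminder of what such a supervised pipeline can look like, here is a minimal sketch using scikit-learn and the iris dataset. It is not necessarily the exact pipeline from earlier in the chapter: the scaler, the k-nearest neighbors classifier, the choice of two components, and the helper name build_pipeline are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Four numeric features, three classes
X, y = load_iris(return_X_y=True)


def build_pipeline(reducer):
    """Hypothetical helper: scale, reduce dimensions, then classify."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("reduce", reducer),
        ("clf", KNeighborsClassifier(n_neighbors=5)),  # classifier choice is an assumption
    ])


# PCA ignores y and keeps the two directions of highest variance.
pca_pipeline = build_pipeline(PCA(n_components=2))

# LDA uses y and keeps the two directions that best separate the classes
# (with three classes, LDA can produce at most two components).
lda_pipeline = build_pipeline(LinearDiscriminantAnalysis(n_components=2))

# 5-fold cross-validated accuracy for each pipeline
pca_scores = cross_val_score(pca_pipeline, X, y, cv=5)
lda_scores = cross_val_score(lda_pipeline, X, y, cv=5)

print(f"PCA + KNN mean accuracy: {pca_scores.mean():.3f}")
print(f"LDA + KNN mean accuracy: {lda_scores.mean():.3f}")
```

Because the transformer sits inside the pipeline, it is refit on each training fold during cross-validation, so the held-out fold never influences the learned components.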

PCA and LDA are extremely powerful tools, but they have limitations. Both are linear transformations, which means that they can only create linear boundaries and capture linear qualities in our data. They are also static transformations: no matter what data we input into a PCA or LDA, the output is predictable and purely mathematical. If the data we are using isn't a good fit for PCA or LDA (it exhibits non-linear qualities, for example, it is circular), then the...