Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data: the success of your ML models often depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to apply machine learning to the data itself, automatically learning useful features from it. By the end of the book, you will be proficient in Feature Selection, Feature Learning, and Feature Optimization.

Using RBMs in a machine learning pipeline

Visualizing the inner workings of the RBM is not enough: we also want concrete results showing how its learned features perform in a machine learning pipeline. To do this, we will create and run three pipelines:

  • A logistic regression model by itself running on the raw pixel values
  • A logistic regression running on extracted PCA components
  • A logistic regression running on extracted RBM components

Each of these pipelines will be grid-searched over the number of components (for PCA and RBM) and the C parameter of the logistic regression. Let's start with our simplest pipeline. We will run the raw pixel values through a logistic regression to see if the linear model is enough to separate out the digits.
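As a rough sketch of how the three pipelines and their search grids might be set up in scikit-learn, consider the following. The step names ("pca", "rbm", "clf"), the grid values, and the max_iter setting are assumptions chosen for illustration, not the values used in the book. Note that BernoulliRBM expects inputs scaled to the range [0, 1].

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# Hypothetical grid values, chosen only for illustration.
C_VALUES = [1e-2, 1e-1, 1e0, 1e1]
N_COMPONENTS = [10, 100, 200]


def make_clf():
    # A fresh classifier per pipeline; max_iter is raised (an assumption)
    # to reduce the chance of convergence warnings.
    return LogisticRegression(max_iter=1000)


# One pipeline per bullet above: raw pixels, PCA components, RBM components.
pipelines = {
    "raw": Pipeline([("clf", make_clf())]),
    "pca": Pipeline([("pca", PCA()), ("clf", make_clf())]),
    "rbm": Pipeline([("rbm", BernoulliRBM(random_state=0)), ("clf", make_clf())]),
}

# Grid keys follow scikit-learn's "<step name>__<parameter>" convention.
param_grids = {
    "raw": {"clf__C": C_VALUES},
    "pca": {"clf__C": C_VALUES, "pca__n_components": N_COMPONENTS},
    "rbm": {"clf__C": C_VALUES, "rbm__n_components": N_COMPONENTS},
}
```

Each pipeline and its matching grid could then be passed to GridSearchCV, for example GridSearchCV(pipelines["rbm"], param_grids["rbm"]), and fitted on the training images and labels.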

Using a linear model on raw pixel values

To begin, we will run the raw pixel values through a logistic regression model in order to obtain something of a baseline model. We want to see if utilizing PCA or RBM components will allow the...
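A minimal sketch of such a baseline is shown here. It makes several assumptions: scikit-learn's small built-in 8x8 digits dataset stands in for the images used in the book, and the C values, cv=3, and max_iter are illustrative choices rather than the book's settings.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption): 8x8 digit images flattened to 64 pixel values.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16; scale to [0, 1]

# Hypothetical grid over the regularization strength C only,
# since the baseline has no feature extraction step.
param_grid = {"C": [1e-2, 1e-1, 1e0]}

grid = GridSearchCV(LogisticRegression(max_iter=2000), param_grid, cv=3)
grid.fit(X, y)

best_C = grid.best_params_["C"]
best_score = grid.best_score_  # mean cross-validated accuracy for best_C
print(best_C, best_score)
```

The cross-validated accuracy obtained this way gives a reference point against which the PCA and RBM pipelines can be compared.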