Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start with understanding your data; often, the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.

Extracting RBM components from MNIST

Let's now create our first RBM in scikit-learn. We will start by instantiating a BernoulliRBM object to extract 100 components from our MNIST dataset.

We will also set the verbose parameter to True to give us visibility into the training process, and the random_state parameter to 0. The random_state parameter is an integer that makes the code reproducible: it seeds the random number generator so that the weights and biases are initialized to the same random values on every run. Finally, we set n_iter to 20. This is the number of iterations, or back-and-forth passes of the network, that we wish to perform over the training data:

# instantiate our BernoulliRBM
# we set a random_state to initialize our weights and biases to the same starting point
# verbose is set to True to see the fitting period
# n_iter is the number of back and forth passes
# n_components (like PCA and LDA) represents the number of features to create
# n_components can be any integer, less than, equal to, or greater than the original...
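
The parameters described above can be sketched as follows. This is a minimal illustration rather than the book's exact listing: it uses the values named in the text (100 components, random_state of 0, verbose set to True, n_iter of 20) and, as an assumption made only to keep the example self-contained, substitutes scikit-learn's small bundled 8x8 digits dataset for MNIST. The variable names images, rbm, rbm_again, and features are likewise illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

# Assumption: the bundled 8x8 digits dataset stands in for MNIST here
images, _ = load_digits(return_X_y=True)
images = images / 16.0  # pixel values run from 0 to 16; scale them to [0, 1]

# Parameter values taken from the text above
rbm = BernoulliRBM(n_components=100, n_iter=20, random_state=0, verbose=True)
features = rbm.fit_transform(images)

print(rbm.components_.shape)  # (100, 64): one weight vector per extracted component
print(features.shape)         # (1797, 100): 100 new features per image

# Same random_state, same starting weights: a second fit reproduces the first
rbm_again = BernoulliRBM(n_components=100, n_iter=20, random_state=0)
rbm_again.fit(images)
print(np.allclose(rbm.components_, rbm_again.components_))
```

With verbose set to True, the fit prints one progress line per iteration, so 20 lines appear for n_iter of 20. Because the stand-in images have only 64 pixels, asking for 100 components also illustrates that n_components may exceed the original number of features.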