Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is one of the most important steps in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning more systematic and effective. You will start by understanding your data: the success of your ML models often depends on how you leverage different feature types, such as continuous and categorical features. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features, and to deliver features driven by business needs as well as mathematical insights. You will also learn how to use machine learning to automatically learn useful features from your data. By the end of the book, you will be proficient in feature selection, feature learning, and feature optimization.

Examining our dataset

For demonstrative purposes, in this chapter, we will utilize a dataset that we have created, so that we can showcase a variety of data levels and types. Let's set up our DataFrame and dive into our data.

We will use pandas to create the DataFrame we will work with, as the DataFrame is the primary data structure in pandas. The advantage of a pandas DataFrame is that it provides many attributes and methods that we can apply to our data. This allows us to manipulate the data logically and develop a thorough understanding of what we are working with, and of how best to structure our machine learning models:

  1. First, let's import pandas:
import pandas as pd
  2. Now, we can set up our DataFrame X. To do this, we will use the DataFrame constructor in pandas, which creates a tabular data structure (a table with rows and columns). The constructor accepts several types of input (NumPy arrays or dictionaries, to name a couple). Here, we will be passing in a dictionary with keys as column headers...
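The two steps above can be sketched as follows. This is a minimal illustration rather than the chapter's actual dataset: the column names (`color`, `size`, `weight_kg`) and their values are hypothetical, chosen only to show how dictionary keys become column headers and how each list of values becomes a column.

```python
import pandas as pd

# Hypothetical data for illustration only; not the dataset used in the chapter.
# Each key becomes a column header, and each list holds that column's values.
data = {
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
    "weight_kg": [1.5, 3.2, 2.0, 1.1],
}

# Build the DataFrame X from the dictionary.
X = pd.DataFrame(data)

# A few of the attributes and methods a DataFrame exposes for inspection.
print(X.shape)   # number of rows and columns as a tuple
print(X.dtypes)  # data type inferred for each column
print(X.head())  # first rows of the table
```

All lists in the dictionary must have the same length, since each one fills a column of the same table.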