Chapter 5: Working with Outliers | Python Feature Engineering Cookbook

Python Feature Engineering Cookbook - Second Edition

By: Galli

Overview of this book

Feature engineering, the process of transforming variables and creating features, is time-consuming, but it helps your machine learning models perform well. This second edition of Python Feature Engineering Cookbook takes the struggle out of feature engineering by showing you how to use open source Python libraries to speed up the process through practical, hands-on recipes. This updated edition begins by addressing fundamental data challenges such as missing data and categorical values, before moving on to strategies for dealing with skewed distributions and outliers. The concluding chapters show you how to develop new features from various types of data, including text, time series, and relational databases. With the help of numerous open source Python libraries, you'll learn how to implement each feature engineering method in a performant, reproducible, and elegant manner. By the end of this book, you will have the tools and expertise needed to confidently build reproducible, end-to-end feature engineering pipelines that can be deployed into production.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the color images

Conventions used

Sections

Get in touch

Reviews

Share Your Thoughts

Download a Free PDF copy of this book

Chapter 1: Imputing Missing Data

Technical requirements

Removing observations with missing data

Performing mean or median imputation

Imputing categorical variables

Replacing missing values with an arbitrary number

Finding extreme values for imputation

Marking imputed values

Performing multivariate imputation by chained equations

Estimating missing data with nearest neighbors

Chapter 2: Encoding Categorical Variables

Technical requirements

Creating binary variables through one-hot encoding

Performing one-hot encoding of frequent categories

Replacing categories with counts or the frequency of observations

Replacing categories with ordinal numbers

Performing ordinal encoding based on the target value

Implementing target mean encoding

Encoding with the Weight of Evidence

Grouping rare or infrequent categories

Performing binary encoding

Chapter 3: Transforming Numerical Variables

Transforming variables with the logarithm function

Transforming variables with the reciprocal function

Using the square root to transform variables

Using power transformations

Performing Box-Cox transformation

Performing Yeo-Johnson transformation

Chapter 4: Performing Variable Discretization

Technical requirements

Performing equal-width discretization

Implementing equal-frequency discretization

Discretizing the variable into arbitrary intervals

Performing discretization with k-means clustering

Implementing feature binarization

Using decision trees for discretization

Chapter 5: Working with Outliers

Technical requirements

Visualizing outliers with boxplots

Finding outliers using the mean and standard deviation

Finding outliers with the interquartile range proximity rule

Removing outliers

Capping or censoring outliers

Capping outliers using quantiles

Chapter 6: Extracting Features from Date and Time Variables

Technical requirements

Extracting features from dates with pandas

Extracting features from time with pandas

Capturing the elapsed time between datetime variables

Working with time in different time zones

Automating feature extraction with Feature-engine

Chapter 7: Performing Feature Scaling

Technical requirements

Standardizing the features

Scaling to the maximum and minimum values

Scaling with the median and quantiles

Performing mean normalization

Implementing maximum absolute scaling

Scaling to vector unit length

Chapter 8: Creating New Features

Technical requirements

Combining features with mathematical functions

Comparing features to reference variables

Performing polynomial expansion

Combining features with decision trees

Creating periodic features from cyclical variables

Creating spline features

Chapter 9: Extracting Features from Relational Data with Featuretools

Technical requirements

Setting up an entity set and creating features automatically

Creating features with general and cumulative operations

Combining numerical features

Extracting features from date and time

Extracting features from text

Creating features with aggregation primitives

Chapter 10: Creating Features from a Time Series with tsfresh

Technical requirements

Extracting features automatically from a time series

Creating and selecting features for a time series

Tailoring feature creation to different time series

Creating pre-selected features

Embedding feature creation in a scikit-learn pipeline

Chapter 11: Extracting Features from Text Variables

Technical requirements

Counting characters, words, and vocabulary

Estimating text complexity by counting sentences

Creating features with bag-of-words and n-grams

Implementing term frequency-inverse document frequency

Cleaning and stemming text variables

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a Free PDF copy of this book

Working with Outliers
