Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data. Often the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.

Learning text features – word vectorizations

Our second example of feature learning will move away from images and towards text and natural language processing. When machines learn to read and write, they face a very large problem: context. In previous chapters, we vectorized documents by counting the number of times each word appeared in each document, and we fed those vectors into machine learning pipelines. By constructing new count-based features, we were able to use text in our supervised machine learning pipelines. This is very effective, up to a point. We are limited to understanding text as if it were only a Bag of Words (BOW). This means that we regard documents as nothing more than unordered collections of words.
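
To make the Bag of Words limitation concrete, here is a minimal sketch in plain Python. It is not the code used in the book: collections.Counter stands in for a count-based vectorizer, and the helper name bag_of_words and the two sample sentences are made up for illustration. The two sentences mean opposite things, yet they produce identical count vectors because word order is discarded.

```python
from collections import Counter


def bag_of_words(document):
    """Return word counts for a document, ignoring word order.

    Hypothetical helper for illustration; a simple whitespace split
    stands in for a real tokenizer.
    """
    return Counter(document.lower().split())


# Two made-up sentences with the same words but opposite meanings
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

bow_a = bag_of_words(doc_a)
bow_b = bag_of_words(doc_b)

# Build a shared, sorted vocabulary so both documents map to the same columns
vocabulary = sorted(set(bow_a) | set(bow_b))
vector_a = [bow_a[word] for word in vocabulary]
vector_b = [bow_b[word] for word in vocabulary]

print(vocabulary)            # ['bit', 'dog', 'man', 'the']
print(vector_a)              # [1, 1, 1, 2]
print(vector_b)              # [1, 1, 1, 2]
print(vector_a == vector_b)  # True: the count vectors cannot tell the sentences apart
```

Any model fed only these vectors has no way to recover who bit whom, which is the kind of context that count-based features leave out.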

What's more, each word on its own has no meaning. When using modules such as CountVectorizer and TfidfVectorizer, it is only as part of a collection of other words that a document can have meaning. It is for this reason that we will turn our...