Feature Engineering Made Easy

By: Sinan Ozdemir, Divya Susarla

Overview of this book

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start by understanding your data; the success of your ML models often depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features and to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will be proficient in feature selection, feature learning, and feature optimization.

Case study 2 - predicting topics of hotel reviews data


Our second case study looks at hotel review data and attempts to cluster the reviews into topics. We will employ latent semantic analysis (LSA), the name given to the process of applying PCA to sparse document-term matrices built from text. It is done to find latent structures in text for the purposes of classification and clustering.
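
As a rough illustration of the idea, here is a minimal sketch of LSA followed by clustering in scikit-learn. It is not the case study's actual code: the four reviews are made-up toy data, the `cluster_reviews` helper is a hypothetical name, and the choice of TF-IDF weighting, two components, and two clusters are all assumptions for the example. It also uses `TruncatedSVD` in place of PCA, since it operates directly on sparse matrices without centering them.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy reviews (not from the hotel dataset):
# the first two are about rooms, the last two about food.
reviews = [
    "the room was clean and the bed was comfortable",
    "clean room comfortable bed quiet room",
    "breakfast was cold and the food was bland",
    "bland food cold breakfast slow service",
]


def cluster_reviews(texts, n_components=2, n_clusters=2, seed=0):
    """Return (cluster labels, LSA-reduced matrix) for a list of texts."""
    lsa = make_pipeline(
        # Sparse document-term matrix with TF-IDF weights.
        TfidfVectorizer(stop_words="english"),
        # Truncated SVD on that sparse matrix: the LSA step.
        TruncatedSVD(n_components=n_components, random_state=seed),
        # Rescale each document to unit length before clustering.
        Normalizer(copy=False),
    )
    reduced = lsa.fit_transform(texts)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)
    return labels, reduced


labels, reduced = cluster_reviews(reviews)
print(labels)  # one cluster id per review
```

The cluster ids themselves are arbitrary; what matters is which reviews share an id. On real review data, the number of components and clusters would need to be tuned rather than fixed at two.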

Applications of text clustering

Text clustering is the act of assigning different topics to pieces of text for the purpose of understanding what documents are about. Imagine a large hotel chain that gets thousands of reviews a week from around the world. Employees of the hotel would like to know what people are saying in order to have a better idea of what they are doing well and what can be improved.

Of course, the limiting factor here is the ability of humans to read all of these texts quickly and correctly. We can train machines to identify the types of things that people are talking about...