Book Image

The Natural Language Processing Workshop

By : Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar, Muzaffar Bashir Shah, Sohom Ghosh, Dwight Gunning
5 (1)
Book Image

The Natural Language Processing Workshop

5 (1)
By: Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar, Muzaffar Bashir Shah, Sohom Ghosh, Dwight Gunning

Overview of this book

Do you want to learn how to communicate with computer systems using Natural Language Processing (NLP) techniques, or make a machine understand human sentiments? Do you want to build applications like Siri, Alexa, or chatbots, even if you’ve never done it before? With The Natural Language Processing Workshop, you can expect to make consistent progress as a beginner, and get up to speed in an interactive way, with the help of hands-on activities and fun exercises. The book starts with an introduction to NLP. You’ll study different approaches to NLP tasks, and perform exercises in Python to understand the process of preparing datasets for NLP models. Next, you’ll use advanced NLP algorithms and visualization techniques to collect datasets from open websites, and to summarize and generate random text from a document. In the final chapters, you’ll use NLP to create a chatbot that detects positive or negative sentiment in text documents such as movie reviews. By the end of this book, you’ll be equipped with the essential NLP tools and techniques you need to solve common business problems that involve processing text.
Table of Contents (10 chapters)
Preface

Introduction

In the previous chapter, we learned about the concepts of Natural Language Processing (NLP) and text analytics. We also took a quick look at various preprocessing steps. In this chapter, we will learn how to make text understandable to machine learning algorithms.

As we know, to use a machine learning algorithm on textual data, we need a numerical or vector representation of text data since most of these algorithms are unable to work directly with plain text or strings. But before converting the text data into numerical form, we will need to pass it through some preprocessing steps such as tokenization, stemming, lemmatization, and stop-word removal.

So, in this chapter, we will learn a little bit more about these preprocessing steps and how to extract features from the preprocessed text and convert them into vectors. We will also explore two popular methods for feature extraction (Bag of Words and Term Frequency-Inverse Document Frequency), as well as various methods...