Hands-On Python Natural Language Processing

By: Aman Kedia, Mayank Rasu

Overview of this book

Natural Language Processing (NLP) is the subfield in computational linguistics that enables computers to understand, process, and analyze text. This book caters to the unmet demand for hands-on training of NLP concepts and provides exposure to real-world applications along with a solid theoretical grounding. This book starts by introducing you to the field of NLP and its applications, along with the modern Python libraries that you'll use to build your NLP-powered apps. With the help of practical examples, you’ll learn how to build reasonably sophisticated NLP applications, and cover various methodologies and challenges in deploying NLP applications in the real world. You'll cover key NLP tasks such as text classification, semantic embedding, sentiment analysis, machine translation, and developing a chatbot using machine learning and deep learning techniques. The book will also help you discover how machine learning techniques play a vital role in making your linguistic apps smart. Every chapter is accompanied by examples of real-world applications to help you build impressive NLP applications of your own. By the end of this NLP book, you’ll be able to work with language data, use machine learning to identify patterns in text, and get acquainted with the advancements in NLP.
Table of Contents (16 chapters)
Section 1: Introduction
Section 2: Natural Language Representation and Mathematics
Section 3: NLP and Learning

One-hot vectorization

In general, a one-hot vector is used to represent a categorical variable that takes its value from a predefined list of values. In NLP, one-hot vectors let us represent tokens as vectors, which certain use cases require. In such a vector, every entry is 0 except the one at the position assigned to the token, which is set to 1. As you may have guessed, these are binary vectors.

For example, weather can be represented as a categorical variable with the values hot and cold. In this scenario, the one-hot vectors would be as follows:

vec(hot)  = <0, 1>
vec(cold) = <1, 0>

Each vector has two bits: the second bit is 1 to denote hot, and the first bit is 1 to denote cold. The size of the vector is 2 because there are only two possible values, hot and cold.
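
The weather example can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper name one_hot and the use of a plain list to hold the categories are illustrative assumptions, not something taken from the book. The categories are listed as cold, then hot, so that the bit positions match the vectors shown above.

```python
def one_hot(value, categories):
    """Return a binary list with a single 1 at the position of `value`.

    `one_hot` is a hypothetical helper written for this illustration.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Assumed ordering: first position is cold, second is hot,
# which matches vec(hot) = <0, 1> and vec(cold) = <1, 0> in the text.
categories = ["cold", "hot"]

vec_hot = one_hot("hot", categories)
vec_cold = one_hot("cold", categories)

print(vec_hot)   # [0, 1]
print(vec_cold)  # [1, 0]
```

The length of each vector equals the number of categories, so adding a third weather value would make every vector three bits long.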

So how does this idea carry over to NLP?

In NLP, each of the terms present in the vocabulary can be thought of as a category, just as we had two categories to represent weather conditions. Now...