Natural Language Processing Fundamentals

By : Sohom Ghosh, Dwight Gunning

Natural Language Processing Fundamentals

By: Sohom Ghosh, Dwight Gunning

Overview of this book

If NLP hasn't been your forte, Natural Language Processing Fundamentals will make sure you set off to a steady start. This comprehensive guide will show you how to effectively use Python libraries and NLP concepts to solve various problems. You'll be introduced to natural language processing and its applications through examples and exercises. This will be followed by an introduction to the initial stages of solving a problem, which includes problem definition, getting text data, and preparing it for modeling. With exposure to concepts like advanced natural language processing algorithms and visualization techniques, you'll learn how to create applications that can extract information from unstructured data and present it as impactful visuals. Although you will continue to learn NLP-based techniques, the focus will gradually shift to developing useful applications. In these sections, you'll understand how to apply NLP techniques to answer questions as can be used in chatbots. By the end of this book, you'll be able to accomplish a varied range of assignments ranging from identifying the most suitable type of NLP task for solving a problem to using a tool like spacy or gensim for performing sentiment analysis. The book will easily equip you with the knowledge you need to build applications that interpret human language.

Preface

About the Book

Free Chapter

1. Introduction to Natural Language Processing

Introduction

History of NLP

Text Analytics and NLP

Various Steps in NLP

Kick Starting an NLP Project

Summary

2. Basic Feature Extraction Methods

Introduction

Types of Data

Cleaning Text Data

Feature Extraction from Texts

Feature Engineering

Summary

3. Developing a Text classifier

Introduction

Machine Learning

Developing a Text Classifier

Building Pipelines for NLP Projects

Saving and Loading Models

Summary

4. Collecting Text Data from the Web

Introduction

Collecting Data by Scraping Web Pages

Requesting Content from Web Pages

Dealing with Semi-Structured Data

Summary

5. Topic Modeling

Introduction

Topic Discovery

Topic Modeling Algorithms

Summary

6. Text Summarization and Text Generation

Introduction

What is Automated Text Summarization?

High-Level View of Text Summarization

TextRank

Summarizing Text Using Gensim

Summarizing Text Using Word Frequency

Generating Text with Markov Chains

Summary

7. Vector Representation

Introduction

Vector Definition

Why Vector Representations?

Summary

8. Sentiment Analysis

Introduction

Why is Sentiment Analysis Required?

Growth of Sentiment Analysis

Tools Used for Sentiment Analysis

TextBlob

Understanding Data for Sentiment Analysis

Training Sentiment Models

Summary

Appendix

1. Introduction to Natural Language Processing

2. Basic Feature Extraction Methods

3. Developing a Text classifier

4. Collecting Text Data from the Web

5. Topic Modeling

6. Text Summarization and Text Generation

7. Vector Representation

8. Sentiment Analysis

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Text Analytics and NLP

Text analytics is the method of extracting meaningful insights and answering questions from text data. This text data need not be a human language. Let's understand this with an example. Suppose you have a text file that contains your outgoing phone calls and SMS log data in the following format:

Figure 1.1: Format of call data

In the preceding figure, the first two fields represent the date and time at which the call was made or the SMS was sent. The third field represents the type of data. If the data is of the call type, then the value for this field will be set as voice_call. If the type of data is sms, the value of this field will be set to sms. The fourth field is for the phone number and name of the contact. If the number of the person is not in the contact list, then the name value will be left blank. The last field is for the duration of the call or text message. If the type of the data is voice_call, then the value in this field will be the duration of that call. If the type of data is sms, then the value in this field will be the text message.

The following figure shows records of call data stored in a text file:

Figure 1.2: Call records in a text file

Now, the data shown in the preceding figure is not exactly a human language. But it contains various information that can be extracted by analyzing it. A couple of questions that can be answered by looking at this data are as follows:

How many New Year greetings were sent by SMS on 1st January?
How many people were contacted whose name is not in the contact list?

The art of extracting useful insights from any given text data can be referred to as text analytics. NLP, on the other hand, is not just restricted to text data. Voice (speech) recognition and analysis also come under the domain of NLP. NLP can be broadly categorized into two types: Natural Language Understanding (NLU) and Natural Language Generation (NLG). A proper explanation of these terms is provided as follows:

NLU: NLU refers to a process by which an inanimate object with computing power is able to comprehend spoken language.
NLG: NLG refers to a process by which an inanimate object with computing power is able to manifest its thoughts in a language that humans are able to understand.

For example, when a human speaks to a machine, the machine interprets the human language with the help of the NLU process. Also, by using the NLG process, the machine generates an appropriate response and shares that with the human, thus making it easier for humans to understand. These tasks, which are part of NLP, are not part of text analytics. Now we will look at an exercise that will give us a better understanding of text analytics.

Exercise 1: Basic Text Analytics

In this exercise, we will perform some basic text analytics on the given text data. Follow these steps to implement this exercise:

Open a Jupyter notebook.
Insert a new cell. Assign a sentence variable with 'The quick brown fox jumps over the lazy dog'. Insert a new cell and add the following code to implement this:
```
sentence = 'The quick brown fox jumps over the lazy dog'
```
Check whether the word 'quick' belongs to that text using the following code:
```
'quick' in sentence
```
The preceding code will return the output 'True'.
Find out the index value of the word 'fox' using the following code:
```
sentence.index('fox')
```
The code will return the output 16.
To find out the rank of the word 'lazy', use the following code:
```
sentence.split().index('lazy')
```
The code generates the output 7.
For printing the third word of the given text, use the following code:
```
sentence.split()[2]
```
This will return the output 'brown'.
To print the third word of the given sentence in reverse order, use the following code:
```
sentence.split()[2][::-1]
```
This will return the output 'nworb'.

To concatenate the first and last words of the given sentence, use the following code:

words = sentence.split()first_word = words[0]last_word = words[len(words)-1]concat_word = first_word + last_word print(concat_word)

The code will generate the output 'Thedog'.

For printing words at even positions, use the following code:
```
[words[i] for i in range(len(words)) if i%2 == 0]
```
The code generates the following output:
Figure 1.3: List of words at even positions
To print the last three letters of the text, use the following code:
```
sentence[-3:]
```
This will generate the output 'dog'.
To print the text in reverse order, use the following code:
```
sentence[::-1]
```
The code generates the following output:
Figure 1.4: Text in reverse order
To print each word of the given text in reverse order, maintaining their sequence, use the following code:
```
print(' '.join([word[::-1] for word in words]))
```
The code generates the following output:

Figure 1.5: Printing the text in reverse order while preserving word sequence

We are now well acquainted with NLP. In the next section, let's dive deeper into the various steps involved in it.

Natural Language Processing Fundamentals

By : Sohom Ghosh, Dwight Gunning

Natural Language Processing Fundamentals

By: Sohom Ghosh, Dwight Gunning

Overview of this book

Related Content you might be interested in

Current Title:

Natural Language Processing Fundamentals

The Handbook of NLP with Gensim

Hands-On Python Natural Language Processing

Python Natural Language Processing Cookbook

Text Analytics and NLP

Figure 1.1: Format of call data

Figure 1.2: Call records in a text file

Exercise 1: Basic Text Analytics

Figure 1.3: List of words at even positions

Figure 1.4: Text in reverse order

Figure 1.5: Printing the text in reverse order while preserving word sequence