Book Image

Hands-On Natural Language Processing with PyTorch 1.x

By: Thomas Dop

Overview of this book

In the internet age, where an increasing volume of text data is generated daily from social media and other platforms, being able to make sense of that data is a crucial skill. With this book, you’ll learn how to extract valuable insights from text by building deep learning models for natural language processing (NLP) tasks. Starting by understanding how to install PyTorch and using CUDA to accelerate the processing speed, you’ll explore how the NLP architecture works with the help of practical examples. This PyTorch NLP book will guide you through core concepts such as word embeddings, CBOW, and tokenization in PyTorch. You’ll then learn techniques for processing textual data and see how deep learning can be used for NLP tasks. The book demonstrates how to implement deep learning and neural network architectures to build models that will allow you to classify and translate text and perform sentiment analysis. Finally, you’ll learn how to build advanced NLP models, such as conversational chatbots. By the end of this book, you’ll not only have understood the different NLP problems that can be solved using deep learning with PyTorch, but also be able to build models to solve them.
Table of Contents (14 chapters)

Section 1: Essentials of PyTorch 1.x for NLP
Section 3: Real-World NLP Applications Using PyTorch 1.x

Tokenization

Next, we will learn about tokenization for NLP, a way of pre-processing text before it enters our models. Tokenization splits text up into smaller parts. This could involve splitting a sentence into its individual words or splitting a whole document into individual sentences. This is an essential pre-processing step for NLP that can be done fairly simply in Python:

  1. We first take a basic sentence and split it into individual words using the word tokenizer in NLTK:
    from nltk.tokenize import word_tokenize

    text = 'This is a single sentence.'
    tokens = word_tokenize(text)
    print(tokens)

    This results in the following output:

    Figure 3.18 – Splitting the sentence

  2. Note how the period (.) is considered a token, as it is part of natural language. Depending on what we want to do with the text, we may wish to keep or discard the punctuation:
    # Lower-case each token and keep only purely alphabetic words
    no_punctuation = [word.lower() for word in tokens if word.isalpha()]
    print(no_punctuation)

    This results in the following output:

    Figure 3.19...
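The two steps above can also be sketched end to end without NLTK, using a simple regular expression in place of `word_tokenize`. The `simple_word_tokenize` helper below is a hypothetical name and only a rough approximation; NLTK's tokenizer handles many more edge cases (contractions, abbreviations, and so on):

```python
import re

def simple_word_tokenize(text):
    # Split into word tokens and standalone punctuation marks,
    # roughly mimicking NLTK's word_tokenize on simple sentences
    return re.findall(r"\w+|[^\w\s]", text)

text = 'This is a single sentence.'
tokens = simple_word_tokenize(text)
print(tokens)  # the period becomes its own token

# Lower-case the tokens and drop anything that is not purely alphabetic
no_punctuation = [word.lower() for word in tokens if word.isalpha()]
print(no_punctuation)
```

This illustrates why the punctuation filter in step 2 works: `word.isalpha()` is `False` for the `'.'` token, so it is simply dropped from the list comprehension.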