Natural Language Processing with TensorFlow

By: Motaz Saad, Thushan Ganegedara

Overview of this book

Natural language processing (NLP) supplies the majority of data available to deep learning applications, while TensorFlow is the most important deep learning framework currently available. Natural Language Processing with TensorFlow brings TensorFlow and NLP together to give you invaluable tools for working with the immense volume of unstructured data in today's data streams, and shows you how to apply these tools to specific NLP tasks. Thushan Ganegedara starts by giving you a grounding in NLP and TensorFlow basics. You'll then learn how to use Word2vec, including advanced extensions, to create word embeddings that turn sequences of words into vectors accessible to deep learning algorithms. Chapters on classical deep learning algorithms, like convolutional neural networks (CNNs) and recurrent neural networks (RNNs), demonstrate important NLP tasks such as sentence classification and language generation. You will learn how to apply high-performance RNN models, like long short-term memory (LSTM) cells, to NLP tasks. You will also explore neural machine translation and implement a neural machine translator. After reading this book, you will have an understanding of NLP, the skills to apply TensorFlow in deep learning NLP applications, and the knowledge to perform specific NLP tasks.

The roadmap – beyond this chapter

This section outlines the rest of the book; it is brief, but it describes what each chapter covers. In this book, we will look at numerous exciting areas of NLP, from algorithms that find word similarities without any annotated data to algorithms that can write a story by themselves.

Starting from the next chapter, we will dive into the details of several popular and interesting NLP tasks. To help you gain in-depth knowledge and to make the learning interactive, various exercises are also provided. We will use Python and TensorFlow, an open-source library for distributed numerical computation, for all the implementations. TensorFlow encapsulates advanced technicalities, such as optimizing your code for GPUs using Compute Unified Device Architecture (CUDA), which can be challenging to handle yourself. Furthermore, TensorFlow provides built-in functions for implementing deep learning algorithms (for example, activations, stochastic optimization methods, and convolutions), making everyone's life easier.
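To give a rough sense of what those built-in functions save you from writing, here is a minimal plain-Python sketch of the three kinds of building blocks just mentioned: an activation, a convolution, and a single gradient descent update (the core step of stochastic optimization). The function names `relu`, `conv1d_valid`, and `sgd_step` are hypothetical and chosen for illustration; they are not TensorFlow's API, and TensorFlow's own versions are far more general and run on GPUs.

```python
# Hand-written stand-ins for three kinds of built-ins.
# All names are illustrative assumptions, not TensorFlow functions.

def relu(values):
    """Activation: replace negative entries with zero."""
    return [max(0.0, v) for v in values]


def conv1d_valid(signal, kernel):
    """1-D convolution without padding, in the deep learning sense
    (the kernel slides over the signal and is not flipped)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def sgd_step(weights, gradients, learning_rate):
    """One gradient descent update: w <- w - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


# A difference kernel highlights where the signal rises or falls.
features = conv1d_valid([1.0, 3.0, 2.0, 5.0], [-1.0, 1.0])
activated = relu(features)  # negative responses are clipped to zero
updated = sgd_step([0.5, -0.5], [1.0, -2.0], learning_rate=0.5)

print(features, activated, updated)
```

Even this toy version has to handle window arithmetic and update rules by hand, which is the kind of detail a framework takes care of for you.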

We will embark on a journey that covers many hot topics in NLP and how the algorithms behind them perform, while using TensorFlow to see state-of-the-art algorithms in action. This is what we will look at in this book:

Chapter 2, Understanding TensorFlow, provides you with a sound guide to writing client programs and running them in TensorFlow. This is especially important if you are new to TensorFlow, because TensorFlow behaves differently from a traditional programming language such as Python. This chapter will first offer an in-depth explanation of how TensorFlow executes a client. This will help you to understand the TensorFlow execution workflow and feel comfortable with TensorFlow terminology. Next, the chapter will walk you through the various elements of a TensorFlow client, such as defining variables, defining operations/functions, feeding inputs to an algorithm, and obtaining the results. Finally, we will discuss how all this knowledge of TensorFlow can be used to implement a moderately complex neural network to classify images of hand-written digits.
Chapter 3, Word2vec – Learning Word Embeddings. The objective of this chapter is to introduce Word2vec, a method for learning numerical representations of words that reflect the semantics of the words. But before diving straight into the Word2vec techniques, we will first discuss some classical approaches used to represent word semantics. One of the early approaches was to rely on WordNet, a large lexical database. WordNet can be used to measure the semantic similarity between different words. However, maintaining such a large lexical database is costly. Therefore, there exist other, simpler representation techniques, such as one-hot-encoded representations and the term frequency-inverse document frequency (TF-IDF) method, that don't rely on external resources. Following this, we will move on to the modern way of learning word vectors, known as Word2vec, where we use a neural network to learn word representations. We will discuss two popular Word2vec techniques: the skip-gram model and the continuous bag-of-words (CBOW) model.
Chapter 4, Advanced Word2vec. We will start this chapter with several comparisons, including a comparison between the skip-gram and CBOW algorithms to see if there is a clear-cut winner. Then we will discuss several extensions that have been introduced to the original Word2vec techniques over the past few years. For example, ignoring common words in the text that occur with high probability, such as "the" and "a", improves the performance of Word2vec models. On the other hand, the Word2vec model only considers the local context of a word and ignores the global statistics of the entire corpus. Consequently, we will also discuss a word embedding learning technique known as GloVe, which incorporates both global and local statistics in finding word vectors.
Chapter 5, Sentence Classification with Convolutional Neural Networks, introduces you to convolutional neural networks (CNNs). Convolutional networks are a powerful family of deep models that can leverage the spatial structure of an input to learn from data. In other words, a CNN can process images in their two-dimensional form, whereas a multilayer perceptron needs the image to be unwrapped into a one-dimensional vector. We will first discuss in detail the various operations that take place in CNNs, such as the convolution and pooling operations. Then we will see an example where we will learn to classify hand-written digit images with a CNN. Then we will transition into an application of CNNs in NLP. Specifically, we will investigate how to apply a CNN to classify sentences, where the task is to decide whether a sentence is about a person, location, object, and so on.
Chapter 6, Recurrent Neural Networks, focuses on introducing recurrent neural networks (RNNs) and using RNNs for language generation. RNNs are different from feed-forward neural networks (for example, CNNs) in that RNNs have memory. The memory is stored as a continuously updated system state. We will start with a representation of a feed-forward neural network and modify that representation to learn from sequences of data instead of individual data points. This process will transform the feed-forward network into an RNN. This will be followed by a technical description of the exact equations used for computations within the RNN. Next, we will discuss the optimization process that is used to update the RNN's weights. Thereafter, we will go through different types of RNNs, such as one-to-one RNNs and one-to-many RNNs. We will then walk through an exciting application of RNNs, where the RNN will learn to tell new stories by learning from a corpus of existing stories. We achieve this by training the RNN to predict the next word given the preceding sequence of words in the story. Finally, we will discuss a variant of standard RNNs, which we call RNN-CF (RNN with contextual features), and compare it with the standard RNN to see which one performs better.
Chapter 7, Long Short-Term Memory Networks, discusses LSTMs, initially providing a solid intuition for how these models work and then progressively diving into the technical details needed to implement them on your own. Standard RNNs suffer from a crucial limitation: the inability to persist long-term memory. However, advanced RNN models (for example, long short-term memory cells (LSTMs) and gated recurrent units (GRUs)) have been proposed, which can remember sequences for a large number of time steps. We will also examine exactly how LSTMs alleviate the difficulty of persisting long-term memory (a difficulty that arises from what is known as the vanishing gradient problem). We will then discuss several improvements that can be used to improve LSTM models further, such as predicting several time steps ahead at once and reading sequences both forward and backward. Finally, we will discuss several variants of LSTM models, such as GRUs and LSTMs with peephole connections.
Chapter 8, Applications of LSTM – Generating Text, explains how to implement the LSTMs, GRUs, and LSTMs with peephole connections discussed in Chapter 7, Long Short-Term Memory Networks. Furthermore, we will compare the performance of these extensions both qualitatively and quantitatively. We will also discuss how to implement some of the extensions examined in Chapter 7, Long Short-Term Memory Networks, such as predicting several time steps ahead (known as beam search) and using word vectors as inputs instead of one-hot-encoded representations. Finally, we will discuss how we can use the RNN API, a sublibrary of TensorFlow that simplifies the implementation of recurrent models.
Chapter 9, Applications of LSTM – Image Caption Generation, looks at another exciting application, where the model learns how to generate captions (that is, descriptions) for images using an LSTM and a CNN. This application is interesting because it shows us how to combine two different types of models as well as how to learn with multimodal data (for example, images and text). The specific way to achieve this is to first learn image representations (similar to word vectors) with the CNN, and then train the LSTM by feeding it that image vector followed by the words of the image's description as a sequence. We will first discuss how we can use a pretrained CNN to obtain the image representations. Then we will discuss how to learn the word embeddings. Next, we will discuss how to feed the image vectors along with the word embeddings to train the LSTM. This is followed by a description of the different evaluation metrics that exist for image captioning systems. Afterwards, we will evaluate the captions generated by our model, both qualitatively and quantitatively. We will conclude the chapter with a guide on how to implement the same system using the TensorFlow RNN API.
Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation. Machine translation has gained a lot of attention due both to the necessity of automating translation and to the inherent difficulty of the task. We will start the chapter with a brief historical flashback to how machine translation was implemented in the early days. This discussion ends with an introduction to neural machine translation (NMT) systems. We will see how well current NMT systems are doing compared to older systems (such as statistical machine translation systems), which will motivate us to learn about NMT systems. Afterwards, we will discuss the intuition behind the design of NMT systems and continue with the technical details. Then we will discuss the evaluation metric we use to evaluate our system. Following this, we will investigate how we can implement a German to English translator from scratch. Next, we will learn about ways to improve NMT systems. We will look in detail at one of those extensions, called the attention mechanism. The attention mechanism has become essential in sequence-to-sequence learning problems. Finally, we will compare the performance improvement obtained with the attention mechanism and analyze the reasons behind the performance gain. This chapter concludes with a section on how the same concept of NMT systems can be extended to implement chatbots. Chatbots are systems that can communicate with humans and are used to fulfill various customer requests.
Chapter 11, Current Trends and the Future of Natural Language Processing. Natural language processing has branched out into a vast spectrum of different tasks. Here we will discuss some of the current trends in NLP and the developments we can expect in the future. We will first discuss various word embedding extensions that have emerged recently. We will also look at the implementation of one such embedding learning technique, known as tv-embeddings. Next, we will examine various trends in the field of neural machine translation. Then we will look at how NLP is combined with other fields, such as computer vision and reinforcement learning, to solve some interesting problems, such as teaching computer agents to communicate by devising their own language. Another booming area these days is artificial general intelligence, which is about developing a single system that can do multiple tasks (classify images, translate text, caption images, and so on). We will investigate several such systems. Afterwards, we will talk about the introduction of NLP into mining social media. We will conclude this chapter with some of the new tasks emerging (for example, language grounding – developing common sense NLP systems) and new models (for example, phased LSTMs).
Appendix, Mathematical Foundations and Advanced TensorFlow, will introduce the reader to various mathematical data structures (for example, matrices) and operations (for example, matrix inverse). We will also discuss several important concepts in probability. We will then introduce Keras, a high-level library that uses TensorFlow underneath. Keras makes implementing neural networks simpler by hiding some of the details of TensorFlow, which some might find challenging. Concretely, we will see how we can implement a CNN with Keras, to get a feel for how to use Keras. Next, we will discuss how we can use the seq2seq library in TensorFlow to implement a neural machine translation system with much less code than we used in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation. Finally, we will walk you through a guide to using TensorBoard to visualize word embeddings. TensorBoard is a handy visualization tool that is shipped with TensorFlow. It can be used to visualize and monitor various variables in your TensorFlow client.
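
As a small taste of the classical word representations named in the Chapter 3 description, the following sketch builds one-hot vectors and TF-IDF weights for a toy corpus using only the Python standard library. The corpus, the helper names `one_hot` and `tf_idf`, and the unsmoothed weighting tf × log(N / df) are assumptions made for illustration; TF-IDF has several variants, and the book may use a different one.

```python
import math
from collections import Counter

# Toy corpus (an assumption for illustration): three tokenized documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "dog", "barked"],
]

# Fixed, sorted vocabulary so that every word has a stable index.
vocab = sorted({word for doc in docs for word in doc})


def one_hot(word):
    """Vector of vocabulary length with a single 1 at the word's index."""
    return [1 if entry == word else 0 for entry in vocab]


def tf_idf(doc):
    """Unsmoothed TF-IDF weight of each word in one document.

    tf  = count of the word in the document / document length
    idf = log(number of documents / documents containing the word)
    """
    counts = Counter(doc)
    weights = {}
    for word, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for other in docs if word in other)
        weights[word] = tf * math.log(len(docs) / df)
    return weights


print(one_hot("cat"))
print(tf_idf(docs[0]))
```

In this toy corpus, "the" appears in every document, so its TF-IDF weight is zero, while the rarer "cat" is weighted above "sat". Note that neither representation captures similarity between words: every pair of distinct one-hot vectors is equally far apart, which is part of the motivation for learned embeddings such as Word2vec.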
