We used the raw TensorFlow API for all the implementations in this book, for better transparency into the actual functionality of the models and for a better learning experience. However, TensorFlow has various libraries that hide the fine-grained details of these implementations. This allows users to implement sequence-to-sequence models, such as the Neural Machine Translation (NMT) model we saw in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation, with fewer lines of code and without worrying about the specific technical details of how they work. Knowledge of these libraries is important, as they provide a much cleaner way of using these models in production code and of researching beyond the existing methods. Therefore, we will go through a quick introduction to using the TensorFlow seq2seq library. This code is available as an exercise in the seq2seq_nmt.ipynb file.
We will first define the encoder inputs, decoder inputs, and decoder output placeholders:
enc_train_inputs = []
dec_train_inputs, dec_train_labels = [], []

for ui in range(source_sequence_length):
    enc_train_inputs.append(
        tf.placeholder(tf.int32, shape=[batch_size],
                       name='train_inputs_%d' % ui))

for ui in range(target_sequence_length):
    dec_train_inputs.append(
        tf.placeholder(tf.int32, shape=[batch_size],
                       name='train_inputs_%d' % ui))
    dec_train_labels.append(
        tf.placeholder(tf.int32, shape=[batch_size],
                       name='train_outputs_%d' % ui))
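To see what these per-time-step placeholders expect at feed time, here is a small NumPy sketch (with made-up toy sizes, not the book's actual configuration) that slices a batch-major matrix of token IDs into one [batch_size] array per time step:

```python
import numpy as np

# Hypothetical toy dimensions, for illustration only
batch_size, source_sequence_length = 4, 6

# A batch of token-ID sequences, shape [batch_size, source_sequence_length]
batch = np.arange(batch_size * source_sequence_length).reshape(
    batch_size, source_sequence_length)

# Each placeholder above holds one time step for the whole batch, so the
# values fed to it are the *columns* of the batch matrix
per_step_inputs = [batch[:, ui] for ui in range(source_sequence_length)]

print(len(per_step_inputs))      # one array per time step: 6
print(per_step_inputs[0].shape)  # (4,), i.e. [batch_size]
```

Feeding data at training time then amounts to binding each of these per-step arrays to the corresponding placeholder in the feed dictionary.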
Next, we will define the embedding lookup function for all the encoder and decoder inputs, to obtain the word embeddings:
encoder_emb_inp = [tf.nn.embedding_lookup(encoder_emb_layer, src)
                   for src in enc_train_inputs]
encoder_emb_inp = tf.stack(encoder_emb_inp)

decoder_emb_inp = [tf.nn.embedding_lookup(decoder_emb_layer, src)
                   for src in dec_train_inputs]
decoder_emb_inp = tf.stack(decoder_emb_inp)
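As a rough illustration of what embedding_lookup followed by tf.stack produces, here is a NumPy sketch (toy sizes, random data) showing that the lookup is simple row indexing into the embedding matrix, and that stacking the per-step results yields a time-major [time, batch, embedding] array:

```python
import numpy as np

# Toy sizes; these names are illustrative, not the book's configuration
vocab_size, embedding_size = 10, 8
time_steps, batch_size = 5, 3

# A random embedding matrix standing in for encoder_emb_layer
emb_matrix = np.random.rand(vocab_size, embedding_size)

# One [batch_size] array of token IDs per time step, as in enc_train_inputs
step_ids = [np.random.randint(0, vocab_size, size=batch_size)
            for _ in range(time_steps)]

# embedding_lookup is row indexing; stacking gives the time-major tensor
looked_up = [emb_matrix[ids] for ids in step_ids]
stacked = np.stack(looked_up)
print(stacked.shape)  # (5, 3, 8), i.e. [time, batch, embedding]
```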
The encoder is made with an LSTM cell as its basic building block. Then, we will define tf.nn.dynamic_rnn, which takes the defined LSTM cell as its input and whose state is initialized with zeros. We will set the time_major parameter to True because our data has the time axis as the first axis (that is, axis 0). In other words, our data has the shape [sequence_length, batch_size, embeddings_size], where the time-dependent sequence_length dimension is the first axis. The benefit of dynamic_rnn is its ability to handle dynamically sized inputs. You can use the optional sequence_length argument to specify the length of each sentence in the batch. For example, consider a batch of shape [3, 30] with three sentences of lengths [10, 20, 30] (note that we pad the short sentences up to length 30 with a special token). Passing a tensor with the values [10, 20, 30] as sequence_length will zero out the LSTM outputs computed beyond the length of each sentence. The cell state, however, is not zeroed out; instead, the last cell state computed within the length of each sentence is copied beyond that length, until time step 30 is reached:
encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
initial_state = encoder_cell.zero_state(batch_size, dtype=tf.float32)

encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb_inp, initial_state=initial_state,
    sequence_length=[source_sequence_length for _ in range(batch_size)],
    time_major=True, swap_memory=True)
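The output-masking behavior described above can be mimicked in plain NumPy. This is only a sketch of the effect sequence_length has on the outputs of dynamic_rnn, with toy shapes and random data standing in for real LSTM outputs:

```python
import numpy as np

# Toy time-major RNN outputs: [max_time, batch, units]
max_time, batch, units = 30, 3, 2
outputs = np.random.rand(max_time, batch, units)
seq_len = np.array([10, 20, 30])  # per-sentence lengths, as in the text

# dynamic_rnn zeroes outputs past each sentence's length; the same
# effect can be expressed with a boolean mask over the time axis
time_idx = np.arange(max_time)[:, None]   # [max_time, 1]
mask = time_idx < seq_len[None, :]        # [max_time, batch]
masked = outputs * mask[:, :, None]

# At t=15, the length-10 sentence is zeroed; the length-30 one is untouched
print(masked[15, 0])                               # [0. 0.]
print(np.allclose(masked[15, 2], outputs[15, 2]))  # True
```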
The swap_memory option allows TensorFlow to swap the tensors produced during the inference process between the GPU and CPU, in case the model is too complex to fit entirely in GPU memory.
The decoder is defined similarly to the encoder, but has an extra layer called projection_layer, which represents the softmax output layer used to sample the predictions made by the decoder. We will also define a TrainingHelper that properly feeds the decoder inputs to the decoder. In this example, we define two types of decoder: a plain BasicDecoder, and a BasicDecoder whose cell is wrapped with the BahdanauAttention mechanism. (The attention mechanism is discussed in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation.) Many other decoders and attention mechanisms exist in the library, such as BeamSearchDecoder and BahdanauMonotonicAttention:
decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
projection_layer = Dense(units=vocab_size, use_bias=True)

helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_emb_inp,
    [target_sequence_length for _ in range(batch_size)],
    time_major=True)

if decoder_type == 'basic':
    decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cell, helper, encoder_state,
        output_layer=projection_layer)
elif decoder_type == 'attention':
    # BahdanauAttention is an attention mechanism, not a decoder, so we
    # wrap the cell with an AttentionWrapper and decode with BasicDecoder.
    # The attention memory must be batch-major, hence the transpose.
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
        num_units, tf.transpose(encoder_outputs, [1, 0, 2]))
    attention_cell = tf.contrib.seq2seq.AttentionWrapper(
        decoder_cell, attention_mechanism)
    attention_initial_state = attention_cell.zero_state(
        batch_size, tf.float32).clone(cell_state=encoder_state)
    decoder = tf.contrib.seq2seq.BasicDecoder(
        attention_cell, helper, attention_initial_state,
        output_layer=projection_layer)
We will use dynamic decoding to get the outputs of the decoder:
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, output_time_major=True, swap_memory=True)
Next, we will define the logits, cross-entropy loss, and train prediction operations:
logits = outputs.rnn_output
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=dec_train_labels, logits=logits)
loss = tf.reduce_mean(crossent)

train_prediction = outputs.sample_id
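For intuition, the sparse softmax cross-entropy computed above can be reproduced in NumPy. This sketch uses random toy logits and integer labels, with shapes chosen for illustration only (mirroring the time-major [time, batch, vocab] logits and [time, batch] labels):

```python
import numpy as np

# Toy shapes standing in for outputs.rnn_output and dec_train_labels
time_steps, batch, vocab = 4, 2, 5
rng = np.random.default_rng(0)
logits = rng.normal(size=(time_steps, batch, vocab))
labels = rng.integers(0, vocab, size=(time_steps, batch))

# Sparse softmax cross-entropy: -log softmax(logits)[label] per position.
# Subtracting the max first keeps the exponentials numerically stable.
shifted = logits - logits.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
crossent = -np.take_along_axis(
    log_probs, labels[..., None], axis=-1).squeeze(-1)

loss = crossent.mean()  # tf.reduce_mean over all time and batch positions
print(crossent.shape)   # (4, 2)
```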
Then, we will define two optimizers: we use AdamOptimizer for the first 10,000 steps, and the vanilla stochastic GradientDescentOptimizer for the rest of the optimization process. This is because using the Adam optimizer over the long term can give rise to unexpected behavior. Therefore, we use Adam to reach a good initial position for the SGD optimizer, and then use SGD from that point onward:
with tf.variable_scope('Adam'):
    optimizer = tf.train.AdamOptimizer(learning_rate)
with tf.variable_scope('SGD'):
    sgd_optimizer = tf.train.GradientDescentOptimizer(learning_rate)

gradients, v = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 25.0)
optimize = optimizer.apply_gradients(zip(gradients, v))

sgd_gradients, v = zip(*sgd_optimizer.compute_gradients(loss))
sgd_gradients, _ = tf.clip_by_global_norm(sgd_gradients, 25.0)
sgd_optimize = sgd_optimizer.apply_gradients(zip(sgd_gradients, v))
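To clarify what tf.clip_by_global_norm does in the snippet above, here is a NumPy sketch that mimics its joint rescaling of all gradients (the helper name and the toy gradient values are illustrative, not part of the book's code):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Mimics tf.clip_by_global_norm: rescales all gradients jointly so
    that their combined L2 norm does not exceed clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # If the global norm is within the limit, gradients pass unchanged;
    # otherwise every gradient is scaled down by the same factor
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Two toy gradient tensors with a large combined norm
grads = [np.full((2, 2), 100.0), np.full((3,), 50.0)]
clipped, norm = clip_by_global_norm(grads, 25.0)

new_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm > 25.0)         # True: the original norm exceeds the threshold
print(round(new_norm, 6))  # 25.0
```

Clipping by the global norm, rather than per tensor, preserves the relative direction of the overall gradient update while bounding its magnitude.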