
Introduction to the TensorFlow seq2seq library


We used the raw TensorFlow API for all the implementations in this book, for better transparency into the actual functionality of the models and a better learning experience. However, TensorFlow also provides libraries that hide the fine-grained details of the implementations. These allow users to implement sequence-to-sequence models, such as the Neural Machine Translation (NMT) model we saw in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation, with fewer lines of code and without worrying about the specific technical details of how they work. Knowing these libraries is important, as they provide a much cleaner way of using these models in production code or for research that goes beyond the existing methods. Therefore, we will go through a quick introduction to the TensorFlow seq2seq library. This code is available as an exercise in the seq2seq_nmt.ipynb file.

Defining embeddings for the encoder and decoder

We will first define the encoder inputs, decoder inputs, and decoder output placeholders:

enc_train_inputs = []
dec_train_inputs, dec_train_labels = [], []

# one [batch_size] placeholder per time step of the source sequence
for ui in range(source_sequence_length):
    enc_train_inputs.append(tf.placeholder(tf.int32, shape=[batch_size], name='enc_train_inputs_%d' % ui))

# one input and one label placeholder per time step of the target sequence
for ui in range(target_sequence_length):
    dec_train_inputs.append(tf.placeholder(tf.int32, shape=[batch_size], name='dec_train_inputs_%d' % ui))
    dec_train_labels.append(tf.placeholder(tf.int32, shape=[batch_size], name='dec_train_outputs_%d' % ui))

Next, we will perform embedding lookups for all the encoder and decoder inputs to obtain the word embeddings:

# look up the embedding for each time step and stack the results into a
# single [sequence_length, batch_size, embeddings_size] tensor
encoder_emb_inp = [tf.nn.embedding_lookup(encoder_emb_layer, src) for src in enc_train_inputs]
encoder_emb_inp = tf.stack(encoder_emb_inp)

decoder_emb_inp = [tf.nn.embedding_lookup(decoder_emb_layer, src) for src in dec_train_inputs]
decoder_emb_inp = tf.stack(decoder_emb_inp)
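
Here, encoder_emb_layer and decoder_emb_layer are the source and target embedding matrices, which are defined elsewhere in the exercise. A minimal sketch of such definitions, where src_vocab_size, tgt_vocab_size, and embeddings_size are assumed hyperparameters rather than names from the exercise, might look like this:

# hypothetical embedding matrix definitions; the variable names and
# vocabulary sizes here are assumptions for illustration
encoder_emb_layer = tf.get_variable(
    'encoder_embeddings', shape=[src_vocab_size, embeddings_size],
    initializer=tf.random_uniform_initializer(-1.0, 1.0))
decoder_emb_layer = tf.get_variable(
    'decoder_embeddings', shape=[tgt_vocab_size, embeddings_size],
    initializer=tf.random_uniform_initializer(-1.0, 1.0))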

Defining the encoder

The encoder is built with an LSTM cell as its basic building block. Then, we will define dynamic_rnn, which takes the defined LSTM cell as its input, with the state initialized to zeros. We set the time_major parameter to True because our data has the time axis as its first axis (that is, axis 0). In other words, our data has the shape [sequence_length, batch_size, embeddings_size], where the time-dependent sequence_length dimension comes first. The benefit of dynamic_rnn is its ability to handle dynamically sized inputs. You can use the optional sequence_length argument to define the length of each sentence in the batch. For example, consider a batch of shape [3, 30] containing three sentences with lengths [10, 20, 30] (note that we pad the shorter sentences up to 30 with a special token). Passing a tensor with the values [10, 20, 30] as sequence_length will zero out the LSTM outputs computed beyond each sentence's length. The cell state, however, is not zeroed out; instead, the last cell state computed within the sentence's length is copied forward beyond that length, until time step 30 is reached (a sketch of feeding per-sentence lengths this way follows the encoder code below):

encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

initial_state = encoder_cell.zero_state(batch_size, dtype=tf.float32)

encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb_inp, initial_state=initial_state,
    # every sentence is padded to the full source length, so we pass a
    # uniform length for each example in the batch
    sequence_length=[source_sequence_length for _ in range(batch_size)],
    time_major=True, swap_memory=True)

The swap_memory option allows TensorFlow to swap the tensors produced during computation between the GPU and the CPU, in case the model is too complex to fit entirely on the GPU.
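
In this example, every sentence is padded to the full source_sequence_length, so a uniform length is passed for every example in the batch. To exploit the masking behavior described earlier, you could instead feed the true sentence lengths through a placeholder. A minimal sketch, assuming a hypothetical enc_sent_lengths placeholder that is filled in when batches are generated:

# hypothetical placeholder holding the true (unpadded) length of each
# sentence in the batch, for example [10, 20, 30] for a batch of three
enc_sent_lengths = tf.placeholder(tf.int32, shape=[batch_size], name='enc_sent_lengths')

encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb_inp, initial_state=initial_state,
    sequence_length=enc_sent_lengths,
    time_major=True, swap_memory=True)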

Defining the decoder

The decoder is defined similarly to the encoder, but has an extra layer called projection_layer, which represents the softmax output layer used to sample the predictions made by the decoder. We will also define a TrainingHelper, which properly feeds the decoder inputs to the decoder. We define two decoder variants in this example: a plain BasicDecoder and one that uses BahdanauAttention, wrapped around the cell with AttentionWrapper. (The attention mechanism is discussed in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation.) The library provides many other decoders and attention mechanisms, such as BeamSearchDecoder and BahdanauMonotonicAttention:

decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Dense is imported from tensorflow.python.layers.core
projection_layer = Dense(units=vocab_size, use_bias=True)

helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_emb_inp, [target_sequence_length for _ in range(batch_size)], time_major=True)

if decoder_type == 'basic':
    decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cell, helper, encoder_state,
        output_layer=projection_layer)

elif decoder_type == 'attention':
    # BahdanauAttention is an attention mechanism, not a decoder in itself:
    # we wrap the cell with AttentionWrapper and decode with a BasicDecoder.
    # The mechanism attends over the (batch-major) encoder outputs
    attention = tf.contrib.seq2seq.BahdanauAttention(
        num_units, tf.transpose(encoder_outputs, [1, 0, 2]))
    decoder_cell = tf.contrib.seq2seq.AttentionWrapper(decoder_cell, attention)
    decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cell, helper,
        decoder_cell.zero_state(batch_size, tf.float32).clone(
            cell_state=encoder_state),
        output_layer=projection_layer)

We will use dynamic decoding to get the outputs of the decoder:

outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, output_time_major=True,
    swap_memory=True
)
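
TrainingHelper is only appropriate at training time, when the ground-truth decoder inputs are fed at each step. At inference time, the decoder's own predictions must be fed back in, which the library supports through GreedyEmbeddingHelper. The following is a minimal sketch for the basic decoder setup; sos_id and eos_id are assumed integer IDs of the start-of-sentence and end-of-sentence tokens:

# greedy decoding: at each step, embed the previous prediction and feed it
# back in; decoding stops when eos_id is emitted or the step limit is hit
inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    decoder_emb_layer, start_tokens=tf.fill([batch_size], sos_id),
    end_token=eos_id)
inference_decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, inference_helper, encoder_state,
    output_layer=projection_layer)
inference_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    inference_decoder, output_time_major=True,
    maximum_iterations=target_sequence_length, swap_memory=True)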

Next, we will define the logits, cross-entropy loss, and train prediction operations:

# outputs.rnn_output has the shape
# [target_sequence_length, batch_size, vocab_size]
logits = outputs.rnn_output

# stack the list of [batch_size] label placeholders into a
# [target_sequence_length, batch_size] tensor to match the logits
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.stack(dec_train_labels), logits=logits)
loss = tf.reduce_mean(crossent)

train_prediction = outputs.sample_id

Then, we will define two optimizers: we use AdamOptimizer for the first 10,000 steps and the vanilla stochastic GradientDescentOptimizer for the rest of the optimization process. This is because using the Adam optimizer for a long time can give rise to unexpected behavior. Therefore, we use Adam to reach a good initial position for the SGD optimizer and use SGD from then on:

with tf.variable_scope('Adam'):
    optimizer = tf.train.AdamOptimizer(learning_rate)
with tf.variable_scope('SGD'):
    sgd_optimizer = tf.train.GradientDescentOptimizer(learning_rate)

gradients, v = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 25.0)
optimize = optimizer.apply_gradients(zip(gradients, v))

sgd_gradients, v = zip(*sgd_optimizer.compute_gradients(loss))
sgd_gradients, _ = tf.clip_by_global_norm(sgd_gradients, 25.0)
sgd_optimize = sgd_optimizer.apply_gradients(zip(sgd_gradients, v))
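
The switch between the two optimizers happens in the training loop. A minimal sketch of that logic, assuming a hypothetical get_feed_dict() helper that fills all the input and label placeholders for one batch, and an already-created session:

# hypothetical training loop: Adam for the first 10,000 steps, SGD afterward
for step in range(num_steps):
    feed_dict = get_feed_dict()  # assumed to fill all placeholders for a batch
    if step < 10000:
        _, l = session.run([optimize, loss], feed_dict=feed_dict)
    else:
        _, l = session.run([sgd_optimize, loss], feed_dict=feed_dict)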

Note

A rigorous evaluation of how optimizers perform in NMT training can be found in the paper by Bahar and others, Empirical Investigation of Optimization Algorithms in Neural Machine Translation, The Prague Bulletin of Mathematical Linguistics, 2017.