Natural Language Processing with TensorFlow - Second Edition

By: Thushan Ganegedara

Overview of this book

Learning how to solve natural language processing (NLP) problems is an important skill to master due to the explosive growth of data combined with the demand for machine learning solutions in production. Natural Language Processing with TensorFlow, Second Edition, will teach you how to solve common real-world NLP problems with a variety of deep learning model architectures. The book starts by getting you familiar with NLP and the basics of TensorFlow, and then gradually teaches you different facets of TensorFlow 2.x. In the following chapters, you will learn how to generate powerful word vectors, classify text, generate new text, and generate image captions, among other exciting real-world NLP use cases. TensorFlow has evolved into an ecosystem that supports the machine learning workflow through ingesting and transforming data, building models, monitoring, and productionization. We will read text directly from files and perform the required transformations through a TensorFlow data pipeline. We will also see how to use a versatile visualization tool known as TensorBoard to visualize our models. By the end of this NLP book, you will be comfortable using TensorFlow to build deep learning models with many different architectures and to efficiently ingest data with TensorFlow. Additionally, you'll be able to confidently use TensorFlow throughout your machine learning workflow.
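
As a small illustration of the kind of data pipeline the overview mentions, the sketch below reads lines of text from a file and applies a simple transformation with tf.data. The file name reviews.txt and the batch size are hypothetical placeholders.

```python
import tensorflow as tf

# Hypothetical input file; any plain-text file with one example per line would do.
dataset = (
    tf.data.TextLineDataset("reviews.txt")      # read text lines directly from the file
    .map(lambda line: tf.strings.lower(line))   # a simple transformation: lowercase each line
    .batch(32)                                  # group lines into batches for training
    .prefetch(tf.data.AUTOTUNE)                 # overlap preprocessing with model execution
)
```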

What this book covers

Chapter 1, Introduction to Natural Language Processing, explains what natural language processing is and the kinds of tasks it may entail. We then discuss how an NLP task is solved using traditional methods. This paves the way to discuss how deep learning is used in NLP and what the benefits are. Finally, we discuss the installation and usage of the technical tools in this book.

Chapter 2, Understanding TensorFlow 2, provides you with a sound guide to writing programs and running them in TensorFlow 2. This chapter will first offer an in-depth explanation of how TensorFlow executes a program. This will help you to understand the TensorFlow execution workflow and feel comfortable with TensorFlow terminology. Next, we will discuss various building blocks in TensorFlow and useful operations that are available. We will finally discuss how all this knowledge of TensorFlow can be used to implement a simple neural network to classify images of handwritten digits.
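
A minimal sketch of the kind of handwritten-digit classifier the chapter builds, using the Keras API bundled with TensorFlow; the layer sizes and the two training epochs are arbitrary choices for illustration.

```python
import tensorflow as tf

# Load the MNIST handwritten-digit images bundled with Keras and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected network: flatten each 28x28 image, one hidden layer, 10-way output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```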

Chapter 3, Word2vec – Learning Word Embeddings, introduces Word2vec, a method for learning numerical representations of words that reflect their semantics. Before diving straight into Word2vec techniques, we first discuss some classical approaches used to represent words, such as one-hot-encoded representations and the Term Frequency-Inverse Document Frequency (TF-IDF) method. Following this, we will move on to a modern tool for learning word vectors known as Word2vec, which uses a neural network to learn word representations. We will discuss two popular Word2vec variants: the skip-gram and Continuous Bag-of-Words (CBOW) models. Finally, we will visualize the learned word representations using a dimensionality reduction technique that maps the vectors to a more interpretable two-dimensional surface.
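
As a rough illustration of what skip-gram training pairs look like, the snippet below uses the Keras skipgrams helper on a toy sequence of word IDs; the sentence, vocabulary size, and window size are made-up values, and the chapter's own implementation may differ.

```python
import tensorflow as tf

# A toy sentence already mapped to word IDs (0 is reserved for padding by Keras utilities).
sentence_ids = [1, 2, 3, 4, 5]

# Generate (target, context) pairs from a sliding window, plus randomly drawn negative samples.
pairs, labels = tf.keras.preprocessing.sequence.skipgrams(
    sentence_ids, vocabulary_size=10, window_size=2, negative_samples=1.0
)
for (target, context), label in zip(pairs, labels):
    print(target, context, label)  # label 1 = true context word, 0 = negative sample
```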

Chapter 4, Advanced Word Vector Algorithms, starts with a more recent word embedding learning technique known as GloVe, which incorporates both global and local statistics in text data to find word vectors. Next, we will learn about ELMo, one of the more modern and sophisticated techniques for generating dynamic word representations based on the context of a word.

Chapter 5, Sentence Classification with Convolutional Neural Networks, introduces you to Convolutional Neural Networks (CNNs). CNNs are a powerful family of deep models that can leverage the spatial structure of an input to learn from data. In other words, a CNN can process images in their two-dimensional form, whereas a multilayer perceptron needs the image to be unwrapped into a one-dimensional vector. We will first discuss the various operations that take place in CNNs, such as convolution and pooling, in detail. Then, we will see an example where we learn to classify images of clothes with a CNN. Finally, we will transition to an application of CNNs in NLP: we will investigate how to apply a CNN to classify sentences, where the task is to decide whether a sentence is about a person, a location, an object, and so on.
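
A minimal sketch of a one-dimensional convolutional sentence classifier of the kind described above, assuming sentences arrive as padded sequences of word IDs; the vocabulary size, sequence length, and number of classes are hypothetical.

```python
import tensorflow as tf

vocab_size, seq_len, num_classes = 10000, 50, 6  # hypothetical sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),                  # word IDs -> dense vectors
    tf.keras.layers.Conv1D(128, kernel_size=3,
                           activation="relu"),                  # convolve over windows of 3 words
    tf.keras.layers.GlobalMaxPooling1D(),                       # keep the strongest response per filter
    tf.keras.layers.Dense(num_classes, activation="softmax"),   # sentence category
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```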

Chapter 6, Recurrent Neural Networks, focuses on introducing Recurrent Neural Networks (RNNs) and using RNNs for language generation. RNNs are different from feed-forward neural networks (for example, CNNs) in that RNNs have memory. This memory is stored as a continuously updated system state. We will start with a representation of a feed-forward neural network and modify that representation to learn from sequences of data instead of individual data points. This process will transform the feed-forward network into an RNN. This will be followed by a technical description of the exact equations used for computations within the RNN. Next, we will discuss the optimization process used to update the RNN's weights. Thereafter, we will go through different types of RNNs, such as one-to-one and one-to-many RNNs. We will then discuss a popular application of RNNs, which is to identify named entities in text (for example, person names, organizations, and so on). Here, we'll start with a basic RNN model. Next, we will enhance the model further by incorporating embeddings at different scales (for example, token embeddings and character embeddings). The token embeddings are generated through an embedding layer, whereas the character embeddings are generated using a CNN. We will then analyze the new model's performance on the named entity recognition task.
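
A minimal sketch of a recurrent tagger along the lines described above: an RNN reads the token embeddings and predicts a tag for every position. The vocabulary size, sequence length, and tag count are placeholder values, and this omits the character-level CNN mentioned in the chapter.

```python
import tensorflow as tf

vocab_size, seq_len, num_tags = 10000, 40, 9  # hypothetical sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 64, mask_zero=True),  # token IDs -> embeddings, mask padding
    tf.keras.layers.SimpleRNN(64, return_sequences=True),       # state is carried across time steps
    tf.keras.layers.Dense(num_tags, activation="softmax"),      # one tag prediction per token
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```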

Chapter 7, Understanding Long Short-Term Memory Networks, discusses Long Short-Term Memory networks (LSTMs) by initially providing an intuitive explanation of how these models work and progressively diving into the technical details required to implement them on your own. Standard RNNs suffer from a crucial limitation: the inability to persist long-term memory. However, advanced RNN models (for example, LSTMs and Gated Recurrent Units (GRUs)) have been proposed that can remember sequences for a large number of time steps. We will also examine exactly how LSTMs alleviate the problem of persisting long-term memory (which stems from the vanishing gradient problem). We will then discuss several modifications that can be used to improve LSTM models further, such as predicting several time steps ahead at once and reading sequences both forward and backward. Finally, we will discuss several variants of LSTM models, such as GRUs and LSTMs with peephole connections.
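
The sketch below shows how the variants mentioned here look as Keras layers: an LSTM wrapped in Bidirectional reads the sequence both forward and backward, and a GRU is a drop-in alternative. The vocabulary size and layer widths are hypothetical.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(None,), dtype="int32")          # variable-length token IDs
x = tf.keras.layers.Embedding(10000, 64)(inputs)               # hypothetical vocabulary size
# Bidirectional runs the wrapped LSTM forward and backward and concatenates both readings.
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()

# A GRU is a lighter gated alternative that can be swapped in for the LSTM above.
gru_layer = tf.keras.layers.GRU(64)
```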

Chapter 8, Applications of LSTM – Generating Text, explains how to implement the LSTMs, GRUs, and LSTMs with peephole connections discussed in Chapter 7, Understanding Long Short-Term Memory Networks. Furthermore, we will compare the performance of these models both qualitatively and quantitatively. We will also discuss how to implement some of the extensions examined in Chapter 7, Understanding Long Short-Term Memory Networks, such as predicting several time steps ahead (known as beam search) and using word vectors as inputs instead of one-hot-encoded representations.
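
As a rough illustration of text generation with a recurrent language model, the loop below repeatedly feeds the sequence generated so far back into an untrained, randomly initialized model and appends the most probable next token. This greedy decoding is simpler than the beam search the chapter covers, and the vocabulary size and seed IDs are made up.

```python
import numpy as np
import tensorflow as tf

vocab_size = 80  # hypothetical character-level vocabulary

# A tiny language model: predicts a distribution over the next character at each position.
inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, 32)(inputs)
x = tf.keras.layers.LSTM(128, return_sequences=True)(x)
outputs = tf.keras.layers.Dense(vocab_size, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Greedy generation: append the most probable next character ID ten times.
seq = [1, 2, 3]  # hypothetical seed character IDs
for _ in range(10):
    probs = model.predict(np.array([seq], dtype="int32"), verbose=0)[0, -1]
    seq.append(int(np.argmax(probs)))
print(seq)
```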

Chapter 9, Sequence-to-Sequence Learning – Neural Machine Translation, discusses machine translation, which has gained a lot of attention both due to the necessity of automating translation and the inherent difficulty of the task. We start the chapter with a brief historical flashback explaining how machine translation was implemented in the early days. This discussion ends with an introduction to Neural Machine Translation (NMT) systems. We will see how well current NMT systems are doing compared to older systems (such as statistical machine translation systems), which will motivate us to learn about NMT systems. Afterward, we will discuss the concepts underpinning the design of NMT systems and continue with the technical details. Then, we will discuss the evaluation metric we use to evaluate our system. Following this, we will investigate how we can implement an English-to-German translator from scratch. Next, we will learn about ways to improve NMT systems. We will look at one of those extensions, the attention mechanism, in detail. The attention mechanism has become essential in sequence-to-sequence learning problems. Finally, we will compare the performance improvement obtained with the attention mechanism and analyze the reasons behind the performance gain. This chapter concludes with a section on how the same concepts behind NMT systems can be extended to implement chatbots, which are systems that can communicate with humans and are used to fulfill various customer requests.
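
A minimal sketch of the encoder-decoder-with-attention idea described above, where the decoder attends over all encoder outputs at every step; the vocabulary sizes and dimensions are hypothetical placeholders, not the chapter's exact model.

```python
import tensorflow as tf

src_vocab, tgt_vocab, units = 8000, 8000, 64  # hypothetical sizes

src = tf.keras.Input(shape=(None,), dtype="int32")   # source-language token IDs
tgt = tf.keras.Input(shape=(None,), dtype="int32")   # target-language token IDs (teacher forcing)

enc = tf.keras.layers.Embedding(src_vocab, units)(src)
enc_out, state_h, state_c = tf.keras.layers.LSTM(units, return_sequences=True,
                                                 return_state=True)(enc)

dec = tf.keras.layers.Embedding(tgt_vocab, units)(tgt)
dec_out = tf.keras.layers.LSTM(units, return_sequences=True)(
    dec, initial_state=[state_h, state_c])

# Additive (Bahdanau-style) attention: each decoder step looks back at all encoder outputs.
context = tf.keras.layers.AdditiveAttention()([dec_out, enc_out])
merged = tf.keras.layers.Concatenate()([dec_out, context])
logits = tf.keras.layers.Dense(tgt_vocab, activation="softmax")(merged)

model = tf.keras.Model([src, tgt], logits)
model.summary()
```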

Chapter 10, Transformers, discusses Transformers, the latest breakthrough in NLP, which have outperformed many previous state-of-the-art models. In this chapter, we will learn about the Transformer architecture in depth and use the Hugging Face Transformers library to apply pre-trained models to downstream tasks with ease. This discussion leads into a popular Transformer model called BERT, which we will use to solve a question answering problem. We will discuss the specific components found in BERT that allow us to use it effectively for this application. Next, we will train the model on a popular question-answering dataset known as SQuAD. Finally, we will evaluate the model on a test dataset and use the trained model to generate answers for unseen questions.
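
As a quick taste of using a pre-trained model for question answering with Hugging Face Transformers, the snippet below uses the library's pipeline API; the default pipeline model is whatever the installed library chooses, not necessarily the BERT model fine-tuned in the chapter, and the question and context strings are made up.

```python
from transformers import pipeline

# Load a pre-trained extractive question-answering pipeline (downloads a default model).
qa = pipeline("question-answering")

result = qa(
    question="Which library provides the pre-trained model?",
    context="The chapter uses the Hugging Face Transformers library to load BERT and "
            "fine-tune it on the SQuAD question-answering dataset.",
)
print(result["answer"], result["score"])  # the extracted answer span and its confidence
```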

Chapter 11, Image Captioning with Transformers, looks at another exciting application, where Transformers are used to generate captions (that is, descriptions) for images. This application is interesting because it shows us how to combine two different types of models as well as how to learn with multimodal data (for example, images and text). Here, we will use a pre-trained Vision Transformer model that generates a rich hidden representation for a given image. This representation, along with caption tokens, is fed to a text-based Transformer model. The text-based Transformer predicts the next caption token, given previous caption tokens. Once the model is trained, we will evaluate the captions generated by our model, both qualitatively and quantitatively. We will also discuss some of the popular metrics used to measure the quality of sequences such as image captions.
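
A minimal sketch of the combination described here: image features produced by a (pre-trained) Vision Transformer are attended to by a text model that predicts the next caption token. The shapes, vocabulary size, and layer sizes are hypothetical, and a real captioning decoder would also include self-attention and feed-forward blocks.

```python
import tensorflow as tf

vocab_size, num_patches, d_model = 5000, 197, 256  # hypothetical sizes

image_features = tf.keras.Input(shape=(num_patches, d_model))  # hidden states from the image encoder
caption_ids = tf.keras.Input(shape=(None,), dtype="int32")     # previously generated caption tokens

x = tf.keras.layers.Embedding(vocab_size, d_model)(caption_ids)
# Cross-attention: every caption position queries the image representation.
x = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)(
    query=x, value=image_features, key=image_features)
next_token_probs = tf.keras.layers.Dense(vocab_size, activation="softmax")(x)

model = tf.keras.Model([image_features, caption_ids], next_token_probs)
model.summary()
```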

Appendix A, Mathematical Foundations and Advanced TensorFlow, introduces various mathematical data structures (for example, matrices) and operations (for example, the matrix inverse). We will also discuss several important concepts in probability. Finally, we will walk you through a guide to using TensorBoard to visualize word embeddings. TensorBoard is a handy visualization tool that ships with TensorFlow and can be used to visualize and monitor various variables in your TensorFlow program.
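
Two small illustrations of the appendix topics, assuming nothing beyond stock TensorFlow: a matrix inverse computed with tf.linalg, and the Keras callback that writes training logs for TensorBoard (the logs directory name is arbitrary).

```python
import tensorflow as tf

# A matrix and its inverse; multiplying them back gives (approximately) the identity.
A = tf.constant([[4.0, 7.0],
                 [2.0, 6.0]])
A_inv = tf.linalg.inv(A)
print(tf.matmul(A, A_inv))

# Attach this callback to model.fit(...) and run `tensorboard --logdir logs` to inspect training.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs")
```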