Deep Learning for Natural Language Processing

By : Karthiek Reddy Bokka, Shubhangi Hora, Tanuj Jain, Monicah Wambugu

Overview of this book

Applying deep learning approaches to various NLP tasks can take your computational algorithms to a completely new level in terms of speed and accuracy. Deep Learning for Natural Language Processing starts by highlighting the basic building blocks of the natural language processing domain. The book goes on to introduce the problems that you can solve using state-of-the-art neural network models. After this, delving into the various neural network architectures and their specific areas of application will help you to understand how to select the best model to suit your needs. As you advance through this deep learning book, you'll study convolutional, recurrent, and recursive neural networks, in addition to covering long short-term memory networks (LSTMs). Understanding these networks will help you to implement their models using Keras. In later chapters, you will develop a trigger word detection application using NLP techniques such as attention models and beam search. By the end of this book, you will not only have a sound knowledge of natural language processing, but will also be able to select the best text preprocessing and neural network models to solve a number of NLP problems.

Other Architectures and Developments

The attention mechanism architecture described in the last section is only one way of building an attention mechanism. In recent times, several other architectures have been proposed that constitute the state of the art in the deep learning NLP world. In this section, we will briefly mention some of these architectures.

Transformer

In late 2017, Google introduced an attention mechanism architecture in their seminal paper titled "Attention Is All You Need." This architecture is considered state-of-the-art in the NLP community. The transformer architecture makes use of a special multi-head attention mechanism to generate attention at various levels. Additionally, it employs residual connections to further ensure that the vanishing gradient problem has a minimal impact on learning. The special architecture of transformers also allows a massive speed-up of the training phase while providing better-quality results.
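
To make this concrete, the following is a minimal sketch of a single multi-head self-attention block with a residual connection and layer normalization, written with the Keras API. The dimensions and layer configuration here are illustrative assumptions, not the exact settings used in the paper:

import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes; the original paper uses much larger values.
embed_dim, num_heads, seq_len = 64, 4, 10

inputs = layers.Input(shape=(seq_len, embed_dim))

# Multi-head self-attention: queries, keys, and values all come from the same input.
attention_output = layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=embed_dim // num_heads
)(inputs, inputs)

# Residual (skip) connection followed by layer normalization, which helps
# keep gradients flowing when many such blocks are stacked.
outputs = layers.LayerNormalization()(layers.Add()([inputs, attention_output]))

model = tf.keras.Model(inputs, outputs)
model.summary()

Stacking several such blocks, interleaved with position-wise feed-forward layers, gives the encoder half of the transformer.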

The most commonly used package...