Recurrent Neural Networks with Python Quick Start Guide

By: Simeon Kostadinov

Overview of this book

Developers struggle to find an easy-to-follow learning resource for implementing Recurrent Neural Network (RNN) models. RNNs are the state-of-the-art models in deep learning for dealing with sequential data. From language translation to generating captions for an image, RNNs are used to continuously improve results. This book teaches you the fundamentals of RNNs, with example applications written in Python using the TensorFlow library. The examples combine theoretical knowledge with real-world implementations of concepts to build a solid foundation in neural network modeling. Your journey starts with the simplest RNN model, where you can grasp the fundamentals. The book then builds on this by introducing more advanced and complex algorithms, and uses them to explain how a typical state-of-the-art RNN model works. From generating text to building a language translator, we show how some of today's most powerful AI applications work under the hood. After reading the book, you will be confident in the fundamentals of RNNs and ready to pursue further study and develop your skills in this exciting field.

What is an RNN?

An RNN is a powerful model from the deep learning family that has shown incredible results over the last five years. It aims to make predictions on sequential data by relying on a memory-based architecture.

But how is it different from a standard neural network? A normal (also called feedforward) neural network acts like a mapping function, where a single input is associated with a single output. In this architecture, no two inputs share knowledge, and data moves in only one direction: starting from the input nodes, passing through the hidden nodes, and finishing at the output nodes. Here is an illustration of the aforementioned model:
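To make the mapping idea concrete, here is a minimal sketch of such a feedforward pass in plain Python with NumPy; the layer sizes and random weights are illustrative and not taken from the book:

```python
import numpy as np

# A tiny feedforward network: each input is mapped to an output
# independently, and nothing is remembered between calls.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # input nodes (3) -> hidden nodes (4)
W_output = rng.normal(size=(2, 4))   # hidden nodes (4) -> output nodes (2)

def feedforward(x):
    hidden = np.tanh(W_hidden @ x)   # input -> hidden
    return W_output @ hidden         # hidden -> output

# Two inputs share no knowledge: the second result does not
# depend on the first in any way.
print(feedforward(np.array([1.0, 0.0, 0.0])))
print(feedforward(np.array([0.0, 1.0, 0.0])))
```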

In contrast, a recurrent (also called feedback) neural network uses an additional memory state. When an input A1 (the word I) arrives, the network produces an output B1 (the word love) and stores information about the input A1 in the memory state. When the next input A2 (the word love) arrives, the network produces the associated output B2 (the word to) with the help of the memory state. Then, the memory state is updated using information from the new input A2. This operation is repeated for each input:
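The loop below is a minimal sketch of this idea, again in plain NumPy with illustrative random weights: the memory state is folded into every step, so each output depends on all the inputs seen so far:

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 3))     # current input -> memory state
W_state = rng.normal(size=(4, 4))  # previous state -> new state
W_out = rng.normal(size=(2, 4))    # memory state -> output

state = np.zeros(4)                # memory state, initially empty
inputs = [np.array([1.0, 0.0, 0.0]),   # e.g. input A1 (the word I)
          np.array([0.0, 1.0, 0.0])]   # e.g. input A2 (the word love)

for x in inputs:
    # The new state mixes the current input with the previous state,
    # so the output at each step depends on the whole history.
    state = np.tanh(W_in @ x + W_state @ state)
    output = W_out @ state
    print(output)
```

A real RNN layer (such as those in TensorFlow) learns these weight matrices from data; the fixed random weights here only demonstrate how information flows through the memory state.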

You can see how, with this method, our predictions depend not only on the current input but also on previous data. This is why RNNs are the state-of-the-art model for dealing with sequences. Let's illustrate this with some examples.

A typical use case for the feedforward architecture is image recognition. We can see its application in agriculture for analyzing plants, in healthcare for diagnosing diseases, and in driverless cars for detecting pedestrians. Since no output in any of these examples requires specific information from a previous input, the feedforward network is a great fit for such problems.

There is also another set of problems, based on sequential data. In these cases, predicting the next element in a sequence depends on all the previous elements. The following is a list of several examples:

  • Converting text to speech
  • Predicting the next word in a sentence
  • Converting audio to text
  • Translating from one language to another
  • Captioning videos

RNNs were first introduced in the 1980s with the invention of the Hopfield network. Later, in 1997, Hochreiter and Schmidhuber proposed an advanced RNN model called long short-term memory (LSTM). It aims to solve some major issues with the simplest recurrent neural network model, which we will reveal later in the chapter. A more recent addition to the RNN family, the gated recurrent unit (GRU), was introduced in 2014 by Cho et al. This architecture solves the same problems as the LSTM, but in a simpler manner.

In the following chapters of this book, we will go over these models, see how they work, and understand why researchers and large companies use them daily to solve fundamental problems.