Neural Networks with Keras Cookbook

By : V Kishore Ayyadevara

Overview of this book

This book will take you from the basics of neural networks to advanced implementations of architectures using a recipe-based approach. We will learn how neural networks work and the impact of various hyperparameters on a network's accuracy, along with leveraging neural networks for structured and unstructured data. Later, we will learn how to classify and detect objects in images. We will also learn to use transfer learning for multiple applications, including a self-driving car using Convolutional Neural Networks. We will generate images while leveraging GANs and also by performing image encoding. Additionally, we will perform text analysis using word-vector-based techniques. Later, we will use Recurrent Neural Networks and LSTM to implement chatbot and Machine Translation systems. Finally, you will learn about transcribing images and audio, generating captions, and using Deep Q-learning to build an agent that plays the Space Invaders game. By the end of this book, you will have developed the skills to choose and customize multiple neural network architectures for various deep learning problems you might encounter.

Building back-propagation from scratch in Python

In forward-propagation, we connected the input layer to the hidden layer to the output layer. In back-propagation, we take the reverse approach.

Getting ready

We change each weight within the neural network by a small amount – one at a time. A change in the weight value will have an impact on the final loss value (either increasing or decreasing loss). We'll update the weight in the direction of decreasing loss.

Additionally, in some scenarios a small change in a weight increases or decreases the error considerably, while in others the error changes only slightly.

By updating the weights by a small amount and measuring the change in error that the update in weights leads to, we are able to do the following:

Determine the direction of the weight update
Determine the magnitude of the weight update

Before implementing back-propagation, let's understand one additional detail of neural networks: the learning rate.

Intuitively, the learning rate helps us to build trust in the algorithm. For example, when deciding on the magnitude of the weight update, we would not change a weight by a huge amount in one go, but take a more careful approach and update the weights gradually.

This makes training more stable; we will look at how the learning rate helps with stability in the next chapter.
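To make the stability point concrete, here is a minimal sketch (not taken from the book's notebook) that fits only the weight a in y = a*x on four points of y = 2x, with the bias fixed at 0. The helper name `fit_weight`, the step count, and the two learning rates are illustrative assumptions.

```python
import numpy as np

# Toy data for y = 2x (bias fixed at 0 to keep the sketch one-dimensional).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

def fit_weight(learning_rate, steps=20, a=1.477):
    """Run plain gradient descent on the mean squared error of y = a*x."""
    for _ in range(steps):
        grad = np.mean(2.0 * (a * x - y) * x)  # d(MSE)/da
        a -= learning_rate * grad
    return a

a_small = fit_weight(0.01)  # cautious steps: moves steadily towards 2
a_large = fit_weight(0.2)   # oversized steps: overshoots further on every step
print(a_small, a_large)
```

With the small learning rate the weight approaches 2, whereas with the large one each update overshoots the minimum and the weight moves further away.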

The whole process by which we update weights to reduce error is called a gradient-descent technique.

Stochastic gradient descent is the means by which error is minimized in the preceding scenario. More intuitively, the gradient is the rate at which the loss changes when a weight changes (which is what we measure by nudging each weight), and descent means reducing the loss. Stochastic refers to the random selection of the samples on which each update is based.

Apart from stochastic gradient descent, there are many other optimization techniques that help to optimize for the loss values; the different optimization techniques will be discussed in the next chapter.

Back-propagation works as follows:

Calculates the overall cost function from the feedforward process.
Varies all the weights (one at a time) by a small amount.
Calculates the impact of the variation of weight on the cost function.
Depending on whether the change has increased or decreased the cost (loss) value, it updates the weight value in the direction of loss decrease. It then repeats this step across all the weights we have.

Performing the preceding steps n times essentially results in n epochs.

In order to further cement our understanding of back-propagation in neural networks, let's start with a known function and see how the weights could be derived:

For now, we will have the known function as y = 2x, where we try to come up with the weight value and bias value, which are 2 and 0 in this specific case:

x	y
1	2
2	4
3	6
4	8

If we formulate the preceding dataset as a linear regression (y = a*x + b), we are trying to calculate the values of a and b. We already know that they are 2 and 0, but we want to see how gradient descent arrives at those values. Let's randomly initialize the a and b parameters to values of 1.477 and 0, respectively.
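As a quick check of the starting point, the following sketch computes the mean squared error of the initial guess and of the known solution on the toy dataset. The helper name `mean_squared_error` is an assumption for illustration and is not part of the book's code.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

def mean_squared_error(a, b):
    """Mean squared error of the line y = a*x + b on the toy dataset."""
    return np.mean((a * x + b - y) ** 2)

initial_loss = mean_squared_error(1.477, 0.0)  # starting guess
ideal_loss = mean_squared_error(2.0, 0.0)      # known solution
print(initial_loss, ideal_loss)
```

The initial guess gives a loss of roughly 2.05, while the known solution gives a loss of 0; gradient descent should move the parameters from the first towards the second.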

How to do it...

In this section, we will build the back-propagation algorithm by hand so that we clearly understand how weights are calculated in a neural network. In this specific case, we will build a simple neural network where there is no hidden layer (thus we are solving a regression equation). The code file is available as Neural_network_working_details.ipynb in GitHub.

Initialize the dataset as follows:

x = [[1],[2],[3],[4]]
y = [[2],[4],[6],[8]]

Initialize the weight and bias values randomly (we have only one weight and one bias value as we are trying to identify the optimal values of a and b in the y = a*x + b equation):

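import numpy as np

# Stored as NumPy arrays so that each parameter supports in-place arithmetic
w = [np.array([[1.477867]]), np.array([0.])]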

Define the feed-forward network and calculate the squared error loss value:

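from copy import deepcopy  # used later when updating the weights
import numpy as np

def feed_forward(inputs, outputs, weights):
    # Linear prediction: input times weight, plus bias
    out = np.dot(inputs, weights[0]) + weights[1]
    # Per-sample squared error between prediction and target
    squared_error = np.square(out - outputs)
    return squared_error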

In the preceding code, we performed a matrix multiplication of the input with the randomly-initialized weight value and summed it up with the randomly-initialized bias value.

Once the value is calculated, we calculate the squared error value of the difference between the actual and predicted values.

Increase each weight and bias value by a very small amount (0.0001) and calculate the squared error loss value one at a time for each of the weight and bias updates.

If the squared error loss value decreases as the weight increases, the weight value should be increased. The magnitude by which the weight value should be increased is proportional to the amount of loss value the weight change decreases by.

Additionally, do not change the weight by the full amount of the loss decrease caused by the weight change; scale the update down by a factor called the learning rate. This ensures that the loss decreases more smoothly (there's more on how the learning rate impacts the model accuracy in the next chapter).

In the following code, we create a function named update_weights, which performs the back-propagation process to update the weights that were initialized earlier. The function runs for epochs iterations, where epochs is a parameter passed to update_weights:

def update_weights(inputs, outputs, weights, epochs): 
     for epoch in range(epochs):

Pass the input through a feed-forward network to calculate the loss with the initial set of weights:

        org_loss = feed_forward(inputs, outputs, weights)

Ensure that you deepcopy the list of weights, as the copies will be modified in further steps. A deep copy is fully independent of the original, so changing the copy does not change the original weights:

        wts_tmp = deepcopy(weights)
        wts_tmp2 = deepcopy(weights)

Loop through all the weight values, one at a time, and change them by a small value (0.0001):

        for i in range(len(weights)):
            wts_tmp[-(i+1)] += 0.0001

Calculate the updated feed-forward loss when the weight is changed by a small amount, and then calculate the change in loss due to that small change in weight. Divide the change in loss by the number of inputs, as we want the mean squared error across all the input samples we have:

            loss = feed_forward(inputs, outputs, wts_tmp)
            delta_loss = np.sum(org_loss - loss)/(0.0001*len(inputs))

Changing the weight by a small value and then calculating its impact on the loss value is a numerical approximation of the derivative of the loss with respect to that weight.
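The following sketch compares the finite-difference estimate with the analytic derivative of the mean squared error for the weight a, using the starting values from this recipe. The helper `mse` and the variable names are assumptions for illustration. Note that the recipe's delta_loss is the negative of the quantity computed here (original loss minus new loss), which is why it is added to the weight.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
a, b, h = 1.477867, 0.0, 0.0001

def mse(a, b):
    """Mean squared error of y = a*x + b on the toy dataset."""
    return np.mean((a * x + b - y) ** 2)

# Finite-difference estimate: nudge the weight by h and measure the loss change.
numeric_grad = (mse(a + h, b) - mse(a, b)) / h
# Analytic derivative of the mean squared error with respect to a.
analytic_grad = np.mean(2.0 * (a * x + b - y) * x)
print(numeric_grad, analytic_grad)
```

Both values are negative and nearly identical, which indicates that increasing a reduces the loss.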

Update the weights by the change in loss that they are causing. Update the weights slowly by multiplying the change in loss by a very small number (0.01), which is the learning rate parameter (more about the learning rate parameter in the next chapter):

            wts_tmp2[-(i+1)] += delta_loss*0.01 
            wts_tmp = deepcopy(weights)

The updated weights and bias value are returned:

        weights = deepcopy(wts_tmp2)
     return wts_tmp2
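Because the function above is presented in fragments, here is an assembled sketch of the same procedure that can be run end to end. It is a condensed rewrite under stated assumptions rather than the notebook's exact code: the weights are stored as NumPy arrays, and the `lr` and `h` parameters (learning rate and perturbation size) are names introduced here for illustration.

```python
from copy import deepcopy
import numpy as np

x = np.array([[1], [2], [3], [4]])
y = np.array([[2], [4], [6], [8]])
w = [np.array([[1.477867]]), np.array([0.])]  # [weight, bias]

def feed_forward(inputs, outputs, weights):
    out = np.dot(inputs, weights[0]) + weights[1]
    return np.square(out - outputs)

def update_weights(inputs, outputs, weights, epochs, lr=0.01, h=0.0001):
    for _ in range(epochs):
        org_loss = feed_forward(inputs, outputs, weights)
        updated = deepcopy(weights)
        for i in range(len(weights)):
            nudged = deepcopy(weights)
            nudged[i] += h  # perturb one parameter, keep the other fixed
            loss = feed_forward(inputs, outputs, nudged)
            # Decrease in mean squared error per unit change in the parameter
            delta_loss = np.sum(org_loss - loss) / (h * len(inputs))
            updated[i] += delta_loss * lr
        weights = updated
    return weights

trained = update_weights(x, y, w, epochs=1000)
print(trained)
```

After 1,000 epochs the weight is close to 2 and the bias is close to 0, which matches the function that generated the data.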

One of the other parameters in a neural network is the batch size considered in calculating the loss values.

In the preceding scenario, we considered all the data points in order to calculate the loss value. However, in practice, when we have thousands (or in some cases, millions) of data points, each additional data point contributes less and less to the loss estimate (the law of diminishing returns). Hence, we use a batch size that is much smaller than the total number of data points we have.

The typical batch size considered in building a model is anywhere between 32 and 1,024.
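As a rough sketch of how batching can be done, the following generator shuffles the data and yields it in chunks of a given size. The helper name `iterate_minibatches` and the 100 synthetic data points are assumptions for illustration; they are not part of the book's notebook.

```python
import numpy as np

def iterate_minibatches(inputs, outputs, batch_size, rng):
    """Yield shuffled (inputs, outputs) batches that cover the data once."""
    order = rng.permutation(len(inputs))
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], outputs[idx]

rng = np.random.default_rng(0)
inputs = np.arange(1, 101, dtype=float).reshape(-1, 1)  # 100 synthetic points
outputs = 2.0 * inputs
batches = list(iterate_minibatches(inputs, outputs, batch_size=32, rng=rng))
print([len(batch_x) for batch_x, _ in batches])
```

With 100 points and a batch size of 32, one pass over the data yields three full batches and a final batch of 4; the loss and the weight update would then be computed once per batch instead of once per full dataset.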

There's more...

In the previous section, we built a regression formula (y = a*x + b) and wrote a function to identify the optimal values of a and b. In this section, we will build a simple neural network with a hidden layer that connects the input to the output, using the same toy dataset that we worked on in the previous section.

We define the model as follows (the code file is available as Neural_networks_multiple_layers.ipynb in GitHub):

The input is connected to a hidden layer that has three units
The hidden layer is connected to the output, which has one unit in output layer

Let us go ahead and code up the strategy discussed above, as follows:

Define the dataset and import the relevant packages:

from copy import deepcopy
import numpy as np

x = [[1],[2],[3],[4]]
y = [[2],[4],[6],[8]]

We use deepcopy so that modifying a copy of the weights does not change the original weights.

Initialize the weight and bias values randomly. The hidden layer has three units in it. Hence, there are a total of three weight values and three bias values – one corresponding to each of the hidden units.

Additionally, the final layer has one unit that is connected to the three units of the hidden layer. Hence, a total of three weights and one bias dictate the value of the output layer.

The randomly-initialized weights are as follows:

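# Stored as float NumPy arrays so that individual elements can be updated in place
w = [np.array([[-0.82203424, -0.9185806, 0.03494298]]),
     np.array([0., 0., 0.]),
     np.array([[1.0692896], [0.62761235], [-0.5426246]]),
     np.array([0.])]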

Implement the feed-forward network where the hidden layer has a ReLU activation in it:

def feed_forward(inputs, outputs, weights):
     pre_hidden = np.dot(inputs,weights[0])+ weights[1]
     hidden = np.where(pre_hidden<0, 0, pre_hidden) 
     out = np.dot(hidden, weights[2]) + weights[3]
     squared_error = (np.square(out - outputs))
     return squared_error
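To see what the forward pass produces with these particular initial weights, the sketch below repeats the same computation step by step and prints the array shapes. It assumes the weights are stored as NumPy arrays; the variable names mirror those in feed_forward.

```python
import numpy as np

x = np.array([[1.], [2.], [3.], [4.]])
w = [np.array([[-0.82203424, -0.9185806, 0.03494298]]),    # input -> hidden, shape (1, 3)
     np.array([0., 0., 0.]),                               # hidden biases
     np.array([[1.0692896], [0.62761235], [-0.5426246]]),  # hidden -> output, shape (3, 1)
     np.array([0.])]                                       # output bias

pre_hidden = np.dot(x, w[0]) + w[1]               # shape (4, 3)
hidden = np.where(pre_hidden < 0, 0, pre_hidden)  # ReLU: negatives become 0
out = np.dot(hidden, w[2]) + w[3]                 # shape (4, 1)
print(pre_hidden.shape, hidden.shape, out.shape)
print(out.ravel())
```

Because the first two hidden weights are negative and the inputs are positive, ReLU zeroes out those two units, so only the third hidden unit contributes to the initial predictions.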

Define the back-propagation function similarly to what we did in the previous section. The only difference is that we now have to update the weights in more layers.

In the following code, we are calculating the original loss at the start of an epoch:

def update_weights(inputs, outputs, weights, epochs): 
     for epoch in range(epochs):
         org_loss = feed_forward(inputs, outputs, weights)

In the following code, we are copying weights into two sets of weight variables so that they can be reused in later code:

         wts_tmp = deepcopy(weights)
         wts_tmp2 = deepcopy(weights)

In the following code, we are updating each weight value by a small amount and then calculating the loss value corresponding to the updated weight value (while every other weight is kept unchanged). Additionally, we are ensuring that the weight update happens across all weights and also across all layers in a network.

The change in the squared loss (del_loss) is attributed to the change in the weight value. We repeat the preceding step for all the weights that exist in the network:

         for i, layer in enumerate(reversed(weights)):
            for index, weight in np.ndenumerate(layer):
                wts_tmp[-(i+1)][index] += 0.0001
                loss = feed_forward(inputs, outputs, wts_tmp)
                del_loss = np.sum(org_loss - loss)/(0.0001*len(inputs))

The weight value is updated by weighing down by the learning rate parameter – a greater decrease in loss will update weights by a lot, while a lower decrease in loss will update the weight by a small amount:

                wts_tmp2[-(i+1)][index] += del_loss*0.01
                wts_tmp = deepcopy(weights)

Given that the weight values are updated one at a time in order to estimate their impact on the loss value, there is a potential to parallelize the process of weight updates. Hence, GPUs come in handy in such scenarios as they have more cores than a CPU and thus more weights can be updated using a GPU in a given amount of time compared to a CPU.

Finally, we return the updated weights:

                    
         weights = deepcopy(wts_tmp2)
     return wts_tmp2
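For reference, here is an assembled sketch of the hidden-layer version that can be run as a single script. As before, it is a condensed rewrite under stated assumptions and not the notebook's exact code: the weights are float NumPy arrays, and `lr` and `h` are parameter names introduced here. It differs from the regression case only in looping over every element of every layer.

```python
from copy import deepcopy
import numpy as np

x = np.array([[1.], [2.], [3.], [4.]])
y = np.array([[2.], [4.], [6.], [8.]])
w = [np.array([[-0.82203424, -0.9185806, 0.03494298]]),
     np.array([0., 0., 0.]),
     np.array([[1.0692896], [0.62761235], [-0.5426246]]),
     np.array([0.])]

def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = np.where(pre_hidden < 0, 0, pre_hidden)  # ReLU
    out = np.dot(hidden, weights[2]) + weights[3]
    return np.square(out - outputs)

def update_weights(inputs, outputs, weights, epochs, lr=0.01, h=0.0001):
    for _ in range(epochs):
        org_loss = feed_forward(inputs, outputs, weights)
        updated = deepcopy(weights)
        for i, layer in enumerate(weights):
            for index, _ in np.ndenumerate(layer):
                nudged = deepcopy(weights)
                nudged[i][index] += h  # perturb a single element
                loss = feed_forward(inputs, outputs, nudged)
                del_loss = np.sum(org_loss - loss) / (h * len(inputs))
                updated[i][index] += del_loss * lr
        weights = updated
    return weights

loss_before = np.mean(feed_forward(x, y, w))
w_after = update_weights(x, y, w, epochs=1)
loss_after = np.mean(feed_forward(x, y, w_after))
print(loss_before, loss_after)
```

Comparing the mean squared error before and after one epoch is a simple way to confirm that the update moves the weights in a direction that reduces the loss.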

Run the function for one epoch to update the weights:

update_weights(x,y,w,1)

The output (updated weights) of the preceding code is as follows:

In the preceding steps, we learned how to build a neural network from scratch in Python. In the next section, we will learn about building a neural network in Keras.
