Neural Networks with Keras Cookbook

By : V Kishore Ayyadevara

Overview of this book

This book will take you from the basics of neural networks to advanced implementations of architectures using a recipe-based approach. We will learn how neural networks work and how various hyperparameters affect a network's accuracy, and we will apply neural networks to both structured and unstructured data. Later, we will learn how to classify and detect objects in images. We will also use transfer learning for multiple applications, including a self-driving car application built with Convolutional Neural Networks. We will generate images using GANs and also by performing image encoding. Additionally, we will perform text analysis using word-vector-based techniques. Later, we will use Recurrent Neural Networks and LSTMs to implement chatbot and machine translation systems. Finally, you will learn about transcribing images and audio and generating captions, and you will use Deep Q-learning to build an agent that plays the Space Invaders game. By the end of this book, you will have developed the skills to choose and customize multiple neural network architectures for various deep learning problems you might encounter.

Preface

Who this book is for

What this book covers

To get the most out of this book

Sections

Get in touch


Building a Feedforward Neural Network

Introduction

Architecture of a simple neural network

Applications of a neural network

Feed-forward propagation from scratch in Python

Building back-propagation from scratch in Python

Building a neural network in Keras

Building a Deep Feedforward Neural Network

Training a vanilla neural network

Scaling the input dataset

Impact on training when the majority of inputs are greater than zero

Impact of batch size on model accuracy

Building a deep neural network to improve network accuracy

Varying the learning rate to improve network accuracy

Varying the loss optimizer to improve network accuracy

Understanding the scenario of overfitting

Speeding up the training process using batch normalization

Applications of Deep Feedforward Neural Networks

Introduction

Predicting credit default

Assigning weights for classes

Predicting house prices

Categorizing news articles into topics

Classifying common audio

Stock price prediction

Leveraging a functional API

Defining weights for rows

Building a Deep Convolutional Neural Network

Introduction

Inaccuracy of traditional neural networks when images are translated

Building a CNN from scratch using Python

CNNs to improve accuracy in the case of image translation

Gender classification using CNNs

Data augmentation to improve network accuracy

Transfer Learning

Gender classification of the person in an image using CNNs

Gender classification of the person in an image using the VGG16 architecture-based model

Visualizing the output of the intermediate layers of a neural network

Gender classification of the person in an image using the VGG19 architecture-based model

Gender classification using the Inception v3 architecture-based model

Gender classification of the person in an image using the ResNet 50 architecture-based model

Detecting the key points within an image of a face

Detecting and Localizing Objects in Images

Introduction

Creating the dataset for a bounding box

Generating region proposals within an image, using selective search

Calculating an intersection over a union between two images

Detecting objects, using region proposal-based CNN

Performing non-max suppression

Detecting a person using an anchor box-based algorithm

Image Analysis Applications in Self-Driving Cars

Traffic sign identification

Predicting the angle within which a car needs to be turned

Instance segmentation using the U-net architecture

Semantic segmentation of objects in an image

Image Generation

Introduction

Generating images that can fool a neural network using adversarial attack

DeepDream algorithm to generate images

Neural style transfer between images

Generating images of digits using Generative Adversarial Networks

Generating images using a Deep Convolutional GAN

Face generation using a Deep Convolutional GAN

Face transition from one to another

Performing vector arithmetic on generated images

Encoding Inputs

Introduction

Need for encoding

Encoding an image

Encoding for recommender systems

Text Analysis Using Word Vectors

Introduction

Building a word vector from scratch in Python

Building a word vector using the skip-gram and CBOW models

Performing vector arithmetic using pre-trained word vectors

Creating a document vector

Building word vectors using fastText

Building word vectors using GloVe

Building sentiment classification using word vectors

Building a Recurrent Neural Network

Introduction

Building an RNN from scratch in Python

Implementing RNN for sentiment classification

Building an LSTM Network from scratch in Python

Implementing LSTM for sentiment classification

Implementing stacked LSTM for sentiment classification

Applications of a Many-to-One Architecture RNN

Generating text

Movie recommendations

Topic-modeling, using embeddings

Forecasting the value of a stock's price

Sequence-to-Sequence Learning

Introduction

Returning sequences of outputs from a network

Building a chatbot

Machine translation

Encoder decoder architecture for machine translation

Encoder decoder architecture with attention for machine translation

End-to-End Learning

Introduction

Connectionist temporal classification (CTC)

Handwritten-text recognition

Image caption generation

Generating captions, using beam search

Audio Analysis

Classifying a song by genre

Generating music using deep learning

Transcribing audio into text

Reinforcement Learning

The optimal action to take in a simulated game with a non-negative reward

The optimal action to take in a state in a simulated game

Q-learning to maximize rewards when playing Frozen Lake

Deep Q-learning to balance a cart pole

Deep Q-learning to play the Space Invaders game

Other Books You May Enjoy

Leave a review - let other readers know what you think


Feed-forward propagation from scratch in Python

To build a strong foundation in how feed-forward propagation works, we'll go through a toy example of training a neural network where the input to the neural network is (1, 1) and the corresponding output is 0.

Getting ready

The strategy that we'll adopt is as follows: our neural network will have one hidden layer (with three neurons) connecting the input layer to the output layer. Note that the hidden layer has more neurons than the input layer, as we want to enable the input to be represented in more dimensions.

Calculating the hidden layer unit values

We now assign weights to all of the connections. Because this is the first forward pass, the weights are initialized randomly (for example, by sampling from a Gaussian distribution). In this specific case, let's start with initial weights between 0 and 1; note, however, that the final weights after training a neural network don't need to lie within any specific range.

In the next step, we perform the multiplication of the input with weights to calculate the values of hidden units in the hidden layer.

Each hidden unit's value is obtained by multiplying each input value by the weight of its connection to that unit and summing the results.

The hidden layer's unit values are also shown in the following diagram:

Note that, for simplicity, the preceding calculation excludes the bias terms that would normally be added at each unit of the hidden layer.

Now, we will pass the hidden layer values through an activation function so that we attain non-linearity in our output.

If we do not apply an activation function in the hidden layer, the neural network becomes one large linear function from input to output, no matter how many layers it has.
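As a quick check of this point, the following NumPy sketch uses arbitrary random weights for a 2-3-1 network (the layer sizes of our toy example, with biases included) and shows that two stacked layers with no activation in between produce exactly the same outputs as a single linear layer whose weight matrix is W1 @ W2. The variable names and random values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary weights and biases for a 2 -> 3 -> 1 network (illustrative only)
W1, b1 = rng.normal(size=(2, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 1)), rng.normal(size=1)

x = rng.normal(size=(5, 2))  # five example rows with two inputs each

# Two layers with no activation function in between
two_layers = (x @ W1 + b1) @ W2 + b2

# One linear layer with the combined weight and bias:
# (x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2)
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a non-linear function such as the sigmoid between the two layers breaks this equivalence, which is why the hidden layer needs an activation.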

Applying the activation function

Activation functions are applied at multiple layers of a network. They introduce non-linearity, which is what allows the network to model complex relationships between the input and the output.

The different activation functions are as follows:

For our example, let’s use the sigmoid function for activation. The sigmoid function looks like this, graphically:

By applying sigmoid activation, S(x), to the three hidden-layer sums, we get the following:

final_h₁ = S(1.0) = 0.73

final_h₂ = S(1.3) = 0.79

final_h₃ = S(0.8) = 0.69
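The sigmoid is S(x) = 1 / (1 + e^(-x)). A minimal NumPy sketch that reproduces the three activations above (the function and variable names are illustrative):

```python
import numpy as np

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x)), applied element-wise
    return 1 / (1 + np.exp(-x))

hidden_sums = np.array([1.0, 1.3, 0.8])  # the three hidden-layer sums
final_h = sigmoid(hidden_sums)
print(np.round(final_h, 2))  # [0.73 0.79 0.69]
```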

Calculating the output layer value

Now that we have calculated the hidden layer values, we will calculate the output layer value. In the following diagram, the hidden layer values are connected to the output through randomly initialized weights. Using the hidden layer values and these weights, we will calculate the output value of the network.

We compute the sum product of the hidden layer values and the weight values to calculate the output value. For simplicity, we exclude the bias term that would normally be added at the output unit:

0.73 * 0.3 + 0.79 * 0.5 + 0.69 * 0.9 = 1.235

The values are shown in the following diagram:

Because we started with a random set of weights, the value of the output neuron is very different from the target, in this case by +1.235 (since the target is 0).
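The same sum product can be written as a dot product. This sketch uses the rounded hidden-layer values and the hidden-to-output weights (0.3, 0.5, 0.9) from the calculation above, and, like the text, omits the output bias:

```python
import numpy as np

# Rounded hidden-layer activations and hidden-to-output weights from the example
final_h = np.array([0.73, 0.79, 0.69])
w_output = np.array([0.3, 0.5, 0.9])

# Sum product (dot product); no output bias, as in the hand calculation
output = np.dot(final_h, w_output)
print(round(output, 3))  # 1.235

target = 0.0
print(round(output - target, 3))  # 1.235 -> the output overshoots the target by this much
```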

Calculating the loss values

Loss values (computed by loss functions, which are alternatively called cost functions) are the values that we optimize (minimize) in a neural network. To understand how loss values are calculated, let's look at two scenarios:

Continuous variable prediction
Categorical variable prediction

Calculating loss during continuous variable prediction

Typically, when the variable is continuous, the loss value is calculated as the squared error; that is, we try to minimize the mean squared error by varying the weight values associated with the neural network:

$$\text{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2$$

In the preceding equation, y(i) is the actual value of output, h(x) is the transformation that we apply on the input (x) to obtain a predicted value of y, and m is the number of rows in the dataset.

Calculating loss during categorical variable prediction

When the variable to predict is discrete (that is, it takes only a few categories), we typically use a cross-entropy loss function. When the variable to predict has two distinct values, the loss function is binary cross-entropy, and when it has more than two distinct values, the loss function is categorical cross-entropy.

Here is binary cross-entropy:

$$-\frac{1}{n}\sum_{i=1}^{n}\left(y_i\log(p_i) + (1-y_i)\log(1-p_i)\right)$$
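A minimal NumPy sketch of the binary cross-entropy formula, assuming the natural logarithm. The function name binary_cross_entropy, the clipping constant, and the example labels and predictions are hypothetical; the point is that predictions close to the labels give a small loss, and predictions far from them give a large one:

```python
import numpy as np

def binary_cross_entropy(p, y):
    # Mean over the n data points of -(y*log(p) + (1-y)*log(1-p)), natural log
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 0])                # hypothetical actual labels
p_good = np.array([0.9, 0.1, 0.8, 0.2])  # predictions close to the labels
p_bad = np.array([0.2, 0.7, 0.3, 0.9])   # predictions far from the labels

print(binary_cross_entropy(p_good, y))  # small loss
print(binary_cross_entropy(p_bad, y))   # much larger loss
```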

Here is categorical cross-entropy, where C is the number of classes:

$$-\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{ic}\log(p_{ic})$$

Here, y is the actual value of the output, p is the predicted value of the output, and n is the total number of data points. For now, let's assume that the outcome we are predicting in our toy example is continuous. In that case, the loss function value is the mean squared error, which, for our single data point, is calculated as follows:

error = 1.235² ≈ 1.53
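With a single data point (m = 1), the mean in the mean squared error is taken over one row, so it equals the squared error. A short check using the toy network's output and target:

```python
import numpy as np

prediction = np.array([1.235])  # output of the toy network
target = np.array([0.0])        # actual output

# With one data point, the mean squared error is just the squared error
error = np.mean(np.square(prediction - target))
print(round(float(error), 2))  # 1.53
```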

In the next step, we will try to minimize the loss function value using back-propagation (which we'll learn about in the next section), where we update the weight values (which were initialized randomly earlier) to minimize the loss (error).

How to do it...

In the previous section, we performed the following steps on the input data to obtain error values during forward propagation (the code file is available as Neural_network_working_details.ipynb on GitHub):

Initialize weights randomly
Calculate the hidden layer unit values by multiplying input values with weights
Perform activation on the hidden layer values
Connect the hidden layer values to the output layer
Calculate the squared error loss

A function to calculate the squared error loss values across all data points is as follows:

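import numpy as np

def feed_forward(inputs, outputs, weights):
    # weights = [input-to-hidden weights, hidden biases, hidden-to-output weights, output bias]
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1/(1+np.exp(-pre_hidden))  # sigmoid activation
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    squared_error = np.square(pred_out - outputs)  # squared error for each row
    return squared_error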

In the preceding function, we take the input variable values, weights (randomly initialized if this is the first iteration), and the actual output in the provided dataset as the input to the feed-forward function.

We calculate the hidden layer values by performing the matrix multiplication (dot product) of the input and weights. Additionally, we add the bias values in the hidden layer, as follows:

pre_hidden = np.dot(inputs,weights[0])+ weights[1]

This assumes that weights[0] holds the weights and weights[1] holds the biases that connect the input layer to the hidden layer.

Once we calculate the hidden layer values, we perform activation on top of the hidden layer values, as follows:

hidden = 1/(1+np.exp(-pre_hidden))

We now calculate the output layer value by multiplying the output of the hidden layer by the weights that connect the hidden layer to the output, and then adding the bias term at the output, as follows:

pred_out = np.dot(hidden, weights[2]) + weights[3]

Once the output is calculated, we calculate the squared error loss at each row, as follows:

squared_error = (np.square(pred_out - outputs))

In the preceding code, pred_out is the predicted output and outputs is the actual output.

We are then in a position to obtain the loss value as we forward-pass through the network.
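To see the function in action on the toy example, the following sketch runs one forward pass with the input (1, 1) and target 0. The diagram with the original input-to-hidden weights is not reproduced in this text, so those weights are hypothetical values chosen only so that the hidden-layer sums come out as 1.0, 1.3, and 0.8. The hidden-to-output weights are the 0.3, 0.5, and 0.9 used earlier, and all biases are set to zero to match the hand calculation. The function follows the same steps as feed_forward above.

```python
import numpy as np

def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]  # weighted sums + biases
    hidden = 1 / (1 + np.exp(-pre_hidden))                # sigmoid activation
    pred_out = np.dot(hidden, weights[2]) + weights[3]    # output layer value
    return np.square(pred_out - outputs)                  # squared error per row

inputs = np.array([[1.0, 1.0]])  # one row: the toy input (1, 1)
outputs = np.array([[0.0]])      # target output 0

weights = [
    np.array([[0.8, 0.4, 0.3],       # hypothetical input-to-hidden weights,
              [0.2, 0.9, 0.5]]),     # columns sum to 1.0, 1.3, 0.8
    np.zeros(3),                     # hidden-layer biases (set to zero)
    np.array([[0.3], [0.5], [0.9]]), # hidden-to-output weights from the example
    np.zeros(1),                     # output bias (set to zero)
]

se = feed_forward(inputs, outputs, weights)
print(se)
```

Because the code keeps the unrounded sigmoid values, the result is about 1.52 rather than the 1.53 obtained from the rounded hand calculation.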

While we considered the sigmoid activation on top of the hidden layer values in the preceding code, let's examine other activation functions that are commonly used.

Tanh

The tanh activation of a value (the hidden layer unit value) is calculated as follows:

def tanh(x):
    return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
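The tanh formula can be checked against NumPy's built-in np.tanh in a short sketch. tanh squashes values into the range (-1, 1) and is centered at zero. One practical caveat: for large inputs (for example, x = 1000), np.exp(x) overflows to infinity and the explicit formula returns nan, whereas np.tanh stays finite, so the built-in is usually the safer choice.

```python
import numpy as np

def tanh(x):
    # Explicit formula, written with NumPy's exp
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-2.0, 0.0, 0.5, 2.0])  # illustrative hidden-layer values
print(np.allclose(tanh(x), np.tanh(x)))  # True
print(tanh(x))  # all values lie between -1 and 1
```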

ReLU

The Rectified Linear Unit (ReLU) of a value (the hidden layer unit value) is calculated as follows:

def relu(x):
    return np.where(x>0,x,0)

Linear

The linear activation of a value is the value itself.

Softmax

Typically, softmax is performed on top of a vector of values. This is generally done to determine the probability of an input belonging to each of n possible output classes. Let's say we are trying to classify an image of a digit into one of 10 possible classes (the digits 0 to 9). In this case, there are 10 output values, where each output value represents the probability of the input image belonging to the corresponding class.

The softmax activation provides a probability value for each class in the output and is calculated as follows:

def softmax(x):
    return np.exp(x)/np.sum(np.exp(x))
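The direct softmax formula can overflow: np.exp(1000) is infinite, so the ratio becomes nan. A common remedy, sketched below, subtracts the maximum value before exponentiating; the factor e^(-max) cancels between numerator and denominator, so the probabilities are unchanged. The name stable_softmax and the example scores are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Direct formula: can overflow for large inputs
    return np.exp(x) / np.sum(np.exp(x))

def stable_softmax(x):
    # Subtracting the maximum leaves the result unchanged mathematically
    # (e^(-max) cancels) but keeps np.exp from overflowing
    z = x - np.max(x)
    return np.exp(z) / np.sum(np.exp(z))

scores = np.array([2.0, 1.0, 0.1])
print(np.allclose(softmax(scores), stable_softmax(scores)))  # True
print(stable_softmax(scores).sum())  # sums to 1 (up to floating-point rounding)

large = np.array([1000.0, 999.0])
print(stable_softmax(large))  # finite probabilities; softmax(large) would give nan
```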

Apart from activation functions, we also need a loss function. The loss functions that are generally used while building a neural network are as follows.

Mean squared error

The error is the difference between the actual and predicted values of the output. We take a square of the error, as the error can be positive or negative (when the predicted value is greater than the actual value and vice versa). Squaring ensures that positive and negative errors do not offset each other. We calculate the mean squared error so that the error over two different datasets is comparable when the datasets are not the same size.

The mean squared error between predicted values (p) and actual values (y) is calculated as follows:

def mse(p, y):
    return np.mean(np.square(p - y))

The mean squared error is typically used when trying to predict a value that is continuous in nature.
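The point about comparing datasets of different sizes can be seen in a small sketch with two hypothetical datasets that have the same error on every row but a different number of rows:

```python
import numpy as np

def mse(p, y):
    return np.mean(np.square(p - y))

# Hypothetical datasets: every row has an error of 0.5, but the sizes differ
y_small, p_small = np.zeros(10), np.full(10, 0.5)
y_large, p_large = np.zeros(1000), np.full(1000, 0.5)

# The summed squared error grows with the number of rows...
print(np.sum(np.square(p_small - y_small)), np.sum(np.square(p_large - y_large)))  # 2.5 250.0
# ...while the mean squared error stays the same, so the two are comparable
print(mse(p_small, y_small), mse(p_large, y_large))  # 0.25 0.25
```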

Mean absolute error

The mean absolute error works in a manner that is very similar to the mean squared error. The mean absolute error ensures that positive and negative errors do not offset each other by taking an average of the absolute difference between the actual and predicted values across all data points.

The mean absolute error between the predicted values (p) and actual values (y) is implemented as follows:

def mae(p, y):
    return np.mean(np.abs(p-y))

Similar to the mean squared error, the mean absolute error is generally employed on continuous variables.
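A small sketch with hypothetical values shows why raw errors are not averaged directly: positive and negative errors cancel to zero, while both the mean absolute error and the mean squared error stay positive. It also shows that squaring weighs the larger errors (±2) more heavily than the smaller ones (±0.5):

```python
import numpy as np

y = np.array([3.0, 5.0, 2.0, 4.0])  # hypothetical actual values
p = np.array([5.0, 3.0, 2.5, 3.5])  # predictions: errors of +2, -2, +0.5, -0.5

print(np.mean(p - y))             # 0.0   -> raw errors cancel out
print(np.mean(np.abs(p - y)))     # 1.25  -> mean absolute error
print(np.mean(np.square(p - y)))  # 2.125 -> mean squared error
```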

Categorical cross-entropy

Cross-entropy is a measure of the difference between two different distributions: actual and predicted. It is applied to categorical output data, unlike the previous two loss functions that we discussed.

Cross-entropy between two distributions is calculated as follows:

$$H(y, p) = -\sum_{i} y_i \log(p_i)$$

Here, y is the actual outcome of the event and p is the predicted outcome of the event, and the sum runs over all possible outcomes i.

Categorical cross-entropy between the predicted values (p) and actual values (y) is implemented as follows:

def cat_cross_entropy(p, y):
    # y: one-hot actual values; p: predicted class probabilities (natural log)
    return -np.sum(y * np.log(p))

Note that categorical cross-entropy loss has a high value when the predicted value is far away from the actual value and a low value when the values are close.
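To illustrate this behavior, the following sketch computes the categorical cross-entropy -Σ y log(p) (natural log) for a hypothetical three-class example with a one-hot actual value, once for a prediction close to it and once for a prediction far from it:

```python
import numpy as np

def cat_cross_entropy(p, y):
    # Categorical cross-entropy: -sum(y * log(p)), natural log
    return -np.sum(y * np.log(p))

y = np.array([0.0, 1.0, 0.0])           # one-hot actual value: class 1
p_close = np.array([0.05, 0.90, 0.05])  # most probability on the correct class
p_far = np.array([0.70, 0.10, 0.20])    # most probability on a wrong class

print(cat_cross_entropy(p_close, y))  # about 0.105
print(cat_cross_entropy(p_far, y))    # about 2.303
```

Only the probability assigned to the actual class contributes, so the loss grows quickly as that probability approaches zero.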
