Deep Learning with TensorFlow

By : Giancarlo Zaccone, Md. Rezaul Karim, Ahmed Menshawy

Overview of this book

Deep learning is the step that comes after machine learning, and has more advanced implementations. Machine learning is not just for academics anymore, but is becoming a mainstream practice through wide adoption, and deep learning has taken the front seat. As a data scientist, if you want to explore data abstraction layers, this book will be your guide. This book shows how this can be exploited in the real world with complex raw data using TensorFlow 1.x. Throughout the book, you’ll learn how to implement deep learning algorithms for machine learning systems and integrate them into your product offerings, including search, image recognition, and language processing. Additionally, you’ll learn how to analyze and improve the performance of deep learning models. This can be done by comparing algorithms against benchmarks, along with machine intelligence, to learn from the information and determine ideal behaviors within a specific context. After finishing the book, you will be familiar with machine learning techniques, in particular the use of TensorFlow for deep learning, and will be ready to apply your knowledge to research or commercial projects.


How does an artificial neural network learn?

The learning process of a neural network is an iterative optimization of the weights. In the supervised setting described here, the weights are modified based on the network's performance on a set of examples belonging to the training set, for which the category of each example is known. The aim is to minimize a loss function, which indicates the degree to which the behavior of the network deviates from the desired one. The performance of the network is then verified on a test set consisting of objects (for example, images in an image classification problem) other than those of the training set.
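As a minimal sketch of what a loss function measures, the following uses the mean squared error, which is only one possible choice and is assumed here for illustration. The output values are hypothetical, not taken from a real network.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between network outputs and desired outputs."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical desired outputs and two hypothetical sets of network outputs.
desired = [1.0, 0.0, 1.0]
close_outputs = [0.9, 0.1, 0.8]   # behavior near the desired one
poor_outputs = [0.2, 0.7, 0.4]    # behavior far from the desired one

loss_close = mean_squared_error(close_outputs, desired)
loss_poor = mean_squared_error(poor_outputs, desired)
```

The further the outputs deviate from the desired values, the larger the loss, so `loss_poor` is larger than `loss_close`.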

The backpropagation algorithm

A widely used supervised learning algorithm is the backpropagation algorithm.

The basic steps of the training procedure are as follows:

1. Initialize the net with random weights.
2. For all training cases:
- Forward pass: Calculate the error committed by the net, that is, the difference between the desired output and the actual output.
- Backward pass: For all layers, starting with the output layer and moving back to the input layer:
  - Compute the error attributed to the layer's output (error function).
  - Adapt the weights in the current layer to minimize the error function. This is backpropagation's optimization step.

The training process ends when the error on the validation set begins to increase, because this could mark the beginning of a phase of overfitting of the network, that is, the phase in which the network tends to interpolate the training data at the expense of generalization ability.
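The training procedure above can be sketched in plain Python, without TensorFlow. This is a sketch under stated assumptions rather than the book's implementation: the network (one input, four tanh hidden units, one linear output), the target function y = x², the learning rate, and all names are chosen only for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

HIDDEN = 4            # assumed number of hidden units
LEARNING_RATE = 0.05  # assumed step size

# Step 1: initialize the net with random weights.
net = {
    "w1": [random.uniform(-1, 1) for _ in range(HIDDEN)],  # input -> hidden
    "b1": [0.0] * HIDDEN,
    "w2": [random.uniform(-1, 1) for _ in range(HIDDEN)],  # hidden -> output
    "b2": 0.0,
}


def forward(x):
    """Return the hidden activations and the network output for input x."""
    hidden = [math.tanh(net["w1"][j] * x + net["b1"][j]) for j in range(HIDDEN)]
    output = sum(net["w2"][j] * hidden[j] for j in range(HIDDEN)) + net["b2"]
    return hidden, output


def mean_error(samples):
    """Mean squared error of the network on a list of (x, y) pairs."""
    return sum((forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)


def train_one_epoch(samples):
    """Step 2: one forward and one backward pass per training case."""
    for x, y in samples:
        # Forward pass: difference between actual and desired output.
        hidden, output = forward(x)
        error = output - y
        # Backward pass: output layer first, then back to the hidden layer.
        for j in range(HIDDEN):
            # Error propagated to hidden unit j (uses w2[j] before its update).
            grad_hidden = error * net["w2"][j] * (1.0 - hidden[j] ** 2)
            net["w2"][j] -= LEARNING_RATE * error * hidden[j]
            net["w1"][j] -= LEARNING_RATE * grad_hidden * x
            net["b1"][j] -= LEARNING_RATE * grad_hidden
        net["b2"] -= LEARNING_RATE * error


# Hypothetical data: learn y = x^2, with separate training and validation sets.
train_set = [(i / 10.0, (i / 10.0) ** 2) for i in range(-10, 11, 2)]
val_set = [(i / 10.0, (i / 10.0) ** 2) for i in range(-9, 10, 2)]

initial_val_error = mean_error(val_set)
best_val_error = initial_val_error
epochs_run = 0
for epoch in range(500):
    train_one_epoch(train_set)
    epochs_run += 1
    val_error = mean_error(val_set)
    if val_error > best_val_error:
        break  # validation error began to increase: stop training
    best_val_error = val_error
```

Stopping at the first increase is the simplest reading of the rule. Practical implementations usually tolerate a few non-improving epochs and restore the weights that gave the lowest validation error.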

Weights optimization

The availability of efficient algorithms for weight optimization is therefore an essential tool for the construction of neural networks. The problem can be solved with an iterative numerical technique called gradient descent (GD).

This technique works according to the following algorithm:

1. Choose some initial values for the parameters of the model randomly.
2. Compute the gradient G of the error function with respect to each parameter of the model.
3. Change the model's parameters so that they move in the direction of decreasing error, that is, in the direction of -G.
4. Repeat steps 2 and 3 until the value of G approaches zero.
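The four steps can be sketched as follows. The error function here is a hypothetical two-parameter quadratic with its minimum at (3, -1), and the learning rate and stopping tolerance are assumptions made for the example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def error(w):
    """Hypothetical error function with its minimum at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def gradient(w):
    """Analytic gradient G of the error function above."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


def gradient_descent(start, learning_rate=0.1, tolerance=1e-6, max_steps=10000):
    w = list(start)
    steps = 0
    while steps < max_steps:
        g = gradient(w)                          # step 2: compute G
        if max(abs(gi) for gi in g) < tolerance:
            break                                # step 4: G is close to zero
        # Step 3: move the parameters a small step in the direction of -G.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
        steps += 1
    return w, steps


# Step 1: choose the initial parameter values randomly.
start = [random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)]
w_min, steps_taken = gradient_descent(start)
```

With these settings each step shrinks the distance to the minimum by a constant factor, so the loop stops well before `max_steps` with `w_min` close to (3, -1).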

In mathematics, a scalar field is a real-valued function of several variables, defined in a region of a space of two, three, or more dimensions. The gradient of such a function is defined as the vector whose Cartesian components are the partial derivatives of the function. The gradient points in the direction of maximum increase of a function of n variables: f(x1, x2, ..., xn). The gradient is thus a vector quantity that indicates how the function changes with respect to each of its parameters.
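To make the definition concrete, the partial derivatives can be approximated numerically, one variable at a time. This is a small sketch using central differences; the function `f` and the step size `h` are assumptions chosen for the example.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at point with central differences."""
    grad = []
    for i in range(len(point)):
        shifted_up = list(point)
        shifted_down = list(point)
        shifted_up[i] += h     # vary only the i-th variable
        shifted_down[i] -= h
        grad.append((f(shifted_up) - f(shifted_down)) / (2.0 * h))
    return grad


def f(x):
    # Hypothetical function of two variables: f(x1, x2) = x1^2 + 3 * x2
    return x[0] ** 2 + 3.0 * x[1]


# The analytic gradient is (2 * x1, 3), so at (2, 5) it is (4, 3).
g = numerical_gradient(f, [2.0, 5.0])
```

Each component of `g` is the partial derivative with respect to one variable, which is exactly how the Cartesian components of the gradient are defined.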

The gradient G of the error function E gives the direction in which the error function, at the current parameter values, has the steepest slope. To decrease E, we therefore have to take small steps in the opposite direction, -G (see the following figure).

By repeating this operation several times in an iterative manner, we move down toward a minimum of the function E, where the gradient G approaches zero (see the following figure):

Figure 6: Gradient descent procedure

As you can see, each step moves toward a minimum of E, where the gradient G is close to zero.

Stochastic gradient descent

In GD optimization, we compute the cost gradient based on the complete training set; hence, we sometimes also call it batch GD. In the case of very large datasets, using GD can be quite costly, since we are only taking a single step for one pass over the training set. Thus, the larger the training set, the slower our algorithm updates the weights and the longer it may take until it converges to the global cost minimum.
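A minimal sketch of batch GD on a hypothetical one-weight model y = w * x shows the cost described here: every pass over the whole training set produces exactly one weight update. The data, learning rate, and number of epochs are assumptions made for the example.

```python
# Hypothetical training set generated by y = 2 * x, so the true weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def batch_gradient(w):
    """Gradient of the mean squared error over the complete training set."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0
learning_rate = 0.01
updates = 0
for epoch in range(200):
    # One full pass over the data yields a single weight update.
    w -= learning_rate * batch_gradient(w)
    updates += 1
```

After 200 passes the weight has been updated only 200 times, no matter how many samples the training set holds.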

An alternative and faster variant of gradient descent, which is for this reason widely used in DNNs, is stochastic gradient descent (SGD).

In SGD, we use only one training sample from the training set to do the update for a parameter in a particular iteration. Here, the term stochastic comes from the fact that the gradient based on a single training sample is a stochastic approximation of the true cost gradient.
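The per-sample update can be sketched on a hypothetical one-weight model y = w * x. The data, learning rate, and number of epochs are assumptions made for the example, and shuffling the samples each epoch is a common convention rather than something the text prescribes.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical training set generated by y = 2 * x, so the true weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
samples = list(zip(xs, ys))

w = 0.0
learning_rate = 0.01
updates = 0
for epoch in range(50):
    random.shuffle(samples)  # visit the samples in a random order
    for x, y in samples:
        # Gradient of the squared error on ONE sample: a stochastic
        # approximation of the gradient over the whole training set.
        grad = 2.0 * (w * x - y) * x
        w -= learning_rate * grad
        updates += 1
```

Here 50 passes over four samples give 200 weight updates, four per pass. Because this toy data is noise-free, every single-sample gradient points toward the same weight; on real data the individual updates disagree with one another.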

Due to its stochastic nature, the path toward the global cost minimum is not direct, as in GD, but may zigzag if we are visualizing the cost surface in a 2D space (see the following figure, (b) Stochastic Gradient Descent - SGD).

We can compare these two optimization procedures using the next figure. Gradient descent (see the following figure, (a) Gradient Descent - GD) ensures that each update of the weights is done in the right direction, the one that minimizes the cost function. With the growth of dataset sizes and more complex computations in each step, SGD came to be preferred in these cases. Here, the weights are updated as each sample is processed and, as such, subsequent calculations already use improved weights. Nonetheless, this very property means that individual updates can point in a misleading direction while minimizing the error function:

Figure 7: GD versus SGD