Book Image

Practical Convolutional Neural Networks

By : Mohit Sewak, Md. Rezaul Karim, Pradeep Pujari
Book Image

Practical Convolutional Neural Networks

By: Mohit Sewak, Md. Rezaul Karim, Pradeep Pujari

Overview of this book

Convolutional Neural Network (CNN) is revolutionizing several application domains such as visual recognition systems, self-driving cars, medical discoveries, innovative eCommerce and more.You will learn to create innovative solutions around image and video analytics to solve complex machine learning and computer vision related problems and implement real-life CNN models. This book starts with an overview of deep neural networkswith the example of image classification and walks you through building your first CNN for human face detector. We will learn to use concepts like transfer learning with CNN, and Auto-Encoders to build very powerful models, even when not much of supervised training data of labeled images is available. Later we build upon the learning achieved to build advanced vision related algorithms for object detection, instance segmentation, generative adversarial networks, image captioning, attention mechanisms for vision, and recurrent models for vision. By the end of this book, you should be ready to implement advanced, effective and efficient CNN models at your professional project or personal initiatives by working on complex image and video datasets.
Table of Contents (11 chapters)

Introduction to TensorFlow

TensorFlow is based on graph-based computation. Consider the following math expression, for example:

c=(a+b)d = b + 5,

 e = c * d 

In TensorFlow, this is represented as a computational graph, as shown here. This is powerful because computations are done in parallel:

Installing TensorFlow

There are two easy ways to install TensorFlow:

  • Using a virtual environment (recommended and described here)
  • With a Docker image

For macOS X/Linux variants

The following code snippet creates a Python virtual environment and installs TensorFlow in that environment. You should have Anaconda installed before you run this code:

#Creates a virtual environment named "tensorflow_env" assuming that python 3.7 version is already installed.
conda create -n tensorflow_env python=3.7
#Activate points to the environment named "tensorflow" source activate tensorflow_env conda install pandas matplotlib jupyter notebook scipy scikit-learn
#installs latest tensorflow version into environment tensorflow_env pip3 install tensorflow

Please check out the latest updates on the official TensorFlow page,

Try running the following code in your Python console to validate your installation. The console should print Hello World! if TensorFlow is installed and working:

import tensorflow as tf

#Creating TensorFlow object 
hello_constant = tf.constant('Hello World!', name = 'hello_constant')
#Creating a session object for execution of the computational graph
with tf.Session() as sess:
    #Implementing the tf.constant operation in the session
    output =

TensorFlow basics

In TensorFlow, data isn't stored as integers, floats, strings, or other primitives. These values are encapsulated in an object called a tensor. It consists of a set of primitive values shaped into an array of any number of dimensions. The number of dimensions in a tensor is called its rank. In the preceding example, hello_constant is a constant string tensor with rank zero. A few more examples of constant tensors are as follows:

# A is an int32 tensor with rank = 0
A = tf.constant(123)
# B is an int32 tensor with dimension of 1 ( rank = 1 )
B = tf.constant([123,456,789])
# C is an int32 2- dimensional tensor
C = tf.constant([ [123,456,789], [222,333,444] ])

TensorFlow's core program is based on the idea of a computational graph. A computational graph is a directed graph consisting of the following two parts: 

  • Building a computational graph
  • Running a computational graph

A computational graph executes within a session. A TensorFlow session is a runtime environment for the computational graph. It allocates the CPU or GPU and maintains the state of the TensorFlow runtime. The following code creates a session instance named sess using tf.Session. Then the function evaluates the tensor and returns the results stored in the output variable. It finally prints as Hello World!:

with tf.Session() as sess:
# Run the tf.constant operation in the session
output =

Using TensorBoard, we can visualize the graph. To run TensorBoard, use the following command:

tensorboard --logdir=path/to/log-directory

Let's create a piece of simple addition code as follows. Create a constant integer x with value 5, set the value of a new variable y after adding 5 to it, and print it:

constant_x = tf.constant(5, name='constant_x')
variable_y = tf.Variable(x + 5, name='variable_y')
print (variable_y)

The difference is that variable_y isn't given the current value of x + 5 as it should in Python code. Instead, it is an equation; that means, when variable_y is computed, take the value of x at that point in time and add 5 to it. The computation of the value of variable_y is never actually performed in the preceding code. This piece of code actually belongs to the computational graph building section of a typical TensorFlow program. After running this, you'll get something like <tensorflow.python.ops.variables.Variable object at 0x7f074bfd9ef0> and not the actual value of variable_y as 10. To fix this, we have to execute the code section of the computational graph, which looks like this:

#initialize all variables
init = tf.global_variables_initializer()
# All variables are now initialized

with tf.Session() as sess:

Here is the execution of some basic math functions, such as addition, subtraction, multiplication, and division with tensors. For more math functions, please refer to the documentation:

For TensorFlow math functions, go to

Basic math with TensorFlow

The tf.add() function takes two numbers, two tensors, or one of each, and it returns their sum as a tensor:

x = tf.add(1, 2, name=None) # 3

Here's an example with subtraction and multiplication:

x = tf.subtract(1, 2,name=None) # -1
y = tf.multiply(2, 5,name=None) # 10

What if we want to use a non-constant? How to feed an input dataset to TensorFlow? For this, TensorFlow provides an API, tf.placeholder(), and uses feed_dict.

A placeholder is a variable that data is assigned to later in the function. With the help of this, our operations can be created and we can build our computational graph without needing the data. Afterwards, this data is fed into the graph through these placeholders with the help of the feed_dict parameter in to set the placeholder tensor. In the following example, the tensor x is set to the string Hello World before the session runs:

x = tf.placeholder(tf.string)

with tf.Session() as sess:
output =, feed_dict={x: 'Hello World'})

It's also possible to set more than one tensor using feed_dict, as follows:

x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32, None)
z = tf.placeholder(tf.float32, None)

with tf.Session() as sess:
output =, feed_dict={x: 'Welcome to CNN', y: 123, z: 123.45})

Placeholders can also allow storage of arrays with the help of multiple dimensions. Please see the following example:

import tensorflow as tf

x = tf.placeholder("float", [None, 3])
y = x * 2

with tf.Session() as session:
input_data = [[1, 2, 3],
[4, 5, 6],]
result =, feed_dict={x: input_data})
This will throw an error as ValueError: invalid literal for... in cases where the data passed to the feed_dict parameter doesn't match the tensor type and can't be cast into the tensor type.

The tf.truncated_normal() function returns a tensor with random values from a normal distribution. This is mostly used for weight initialization in a network:

n_features = 5
n_labels = 2
weights = tf.truncated_normal((n_features, n_labels))
with tf.Session() as sess:

Softmax in TensorFlow

The softmax function converts its inputs, known as logit or logit scores, to be between 0 and 1, and also normalizes the outputs so that they all sum up to 1. In other words, the softmax function turns your logits into probabilities. Mathematically, the softmax function is defined as follows:

In TensorFlow, the softmax function is implemented. It takes logits and returns softmax activations that have the same type and shape as input logits, as shown in the following image:

The following code is used to implement this:

logit_data = [2.0, 1.0, 0.1]
logits = tf.placeholder(tf.float32)
softmax = tf.nn.softmax(logits)

with tf.Session() as sess:
output =, feed_dict={logits: logit_data})
print( output )

The way we represent labels mathematically is often called one-hot encoding. Each label is represented by a vector that has 1.0 for the correct label and 0.0 for everything else. This works well for most problem cases. However, when the problem has millions of labels, one-hot encoding is not efficient, since most of the vector elements are zeros. We measure the similarity distance between two probability vectors, known as cross-entropy and denoted by D.

Cross-entropy is not symmetric. That means: D(S,L) != D(L,S)

In machine learning, we define what it means for a model to be bad usually by a mathematical function. This function is called loss, cost, or objective function. One very common function used to determine the loss of a model is called the cross-entropy loss. This concept came from information theory (for more on this, please refer to Visual Information Theory at Intuitively, the loss will be high if the model does a poor job of classifying on the training data, and it will be low otherwise, as shown here:

Cross-entropy loss function

In TensorFlow, we can write a cross-entropy function using tf.reduce_sum(); it takes an array of numbers and returns its sum as a tensor (see the following code block):

x = tf.constant([[1,1,1], [1,1,1]])
with tf.Session() as sess:
print([1,2,3]))) #returns 6
print(,0))) #sum along x axis, prints [2,2,2]

But in practice, while computing the softmax function, intermediate terms may be very large due to the exponentials. So, dividing large numbers can be numerically unstable. We should use TensorFlow's provided softmax and cross-entropy loss API. The following code snippet manually calculates cross-entropy loss and also prints the same using the TensorFlow API:

import tensorflow as tf

softmax_data = [0.1,0.5,0.4]
onehot_data = [0.0,1.0,0.0]

softmax = tf.placeholder(tf.float32)
onehot_encoding = tf.placeholder(tf.float32)

cross_entropy = - tf.reduce_sum(tf.multiply(onehot_encoding,tf.log(softmax)))

cross_entropy_loss = tf.nn.softmax_cross_entropy_with_logits(logits=tf.log(softmax), labels=onehot_encoding)

with tf.Session() as session:
print(,feed_dict={softmax:softmax_data, onehot_encoding:onehot_data} ))
print(,feed_dict={softmax:softmax_data, onehot_encoding:onehot_data} ))