Modern Computer Vision with PyTorch

By: V Kishore Ayyadevara, Yeshwanth Reddy

Overview of this book

Deep learning is the driving force behind many recent advances in computer vision (CV) applications. This book takes a hands-on approach to help you solve over 50 CV problems using PyTorch 1.x on real-world datasets. You’ll start by building a neural network (NN) from scratch using NumPy and PyTorch and discover best practices for tweaking its hyperparameters. You’ll then perform image classification using convolutional neural networks and transfer learning and understand how they work. As you progress, you’ll implement multiple use cases of 2D and 3D multi-object detection, segmentation, and human pose estimation by learning about the R-CNN family, SSD, YOLO, U-Net architectures, and the Detectron2 platform. The book will also guide you in performing facial expression swapping, generating new faces, and manipulating facial expressions as you explore autoencoders and modern generative adversarial networks. You’ll learn how to combine CV with NLP techniques, such as LSTM and transformer, and RL techniques, such as deep Q-learning, to implement OCR, image captioning, object detection, and a self-driving car agent. Finally, you’ll move your NN model to production on the AWS Cloud. By the end of this book, you’ll be able to leverage modern NN architectures to solve over 50 real-world CV problems confidently.

Putting feedforward propagation and backpropagation together

In this section, we will build a simple neural network with one hidden layer connecting the input to the output, using the same toy dataset that we worked on in the Feedforward propagation in code section. We will also use the update_weights function that we defined in the previous section to perform backpropagation and obtain the optimal weight and bias values.

We define the model as follows:

The input is connected to a hidden layer that has three units (nodes).
The hidden layer is connected to the output layer, which has one unit.

The following code is available as Back_propagation.ipynb in the Chapter01 folder of this book's GitHub repository: https://tinyurl.com/mcvp-packt

We will create the network as follows:

Import the relevant packages and define the dataset:

from copy import deepcopy
import numpy as np 
x = np.array([[1,1]])
y = np.array([[0]])

Initialize the weight and bias values randomly.

The hidden layer has three units in it and each input node is connected to each of the hidden layer units. Hence, there are a total of six weight values and three bias values – one bias and two weights (two weights coming from two input nodes) corresponding to each of the hidden units. Additionally, the final layer has one unit that is connected to the three units of the hidden layer. Hence, a total of three weights and one bias dictate the value of the output layer. The randomly initialized weights are as follows:

W = [
    np.array([[-0.0053, 0.3793], 
              [-0.5820, -0.5204],
              [-0.2723, 0.1896]], dtype=np.float32).T, 
    np.array([-0.0140, 0.5607, -0.0628], dtype=np.float32), 
    np.array([[ 0.1528,-0.1745,-0.1135]],dtype=np.float32).T, 
    np.array([-0.5516], dtype=np.float32)
]

In the preceding code, the first array of parameters corresponds to the 2 x 3 matrix of weights that connects the input layer to the hidden layer. The second array represents the bias values associated with each node of the hidden layer. The third array corresponds to the 3 x 1 matrix of weights joining the hidden layer to the output layer, and the final array represents the bias associated with the output layer.
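
As a quick sanity check of the parameter counts described above, the following minimal sketch prints the shape of each array in W and the total number of parameters. The names shapes and n_params are introduced here purely for illustration and are not part of the book's notebook.

```python
import numpy as np

# Same initial parameters as in the text above.
W = [
    np.array([[-0.0053, 0.3793],
              [-0.5820, -0.5204],
              [-0.2723, 0.1896]], dtype=np.float32).T,
    np.array([-0.0140, 0.5607, -0.0628], dtype=np.float32),
    np.array([[0.1528, -0.1745, -0.1135]], dtype=np.float32).T,
    np.array([-0.5516], dtype=np.float32),
]

# Illustrative names (not from the book): shape of each array and total count.
shapes = [w.shape for w in W]
n_params = sum(w.size for w in W)

print(shapes)    # [(2, 3), (3,), (3, 1), (1,)]
print(n_params)  # 13 = 6 weights + 3 biases + 3 weights + 1 bias
```

The transposes (.T) are what turn the 3 x 2 and 1 x 3 literals into the 2 x 3 and 3 x 1 matrices that the text refers to.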

Run the neural network through 100 epochs of feedforward propagation and backpropagation, using the feed_forward and update_weights functions that were introduced in the previous sections.

Define the feed_forward function:

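def feed_forward(inputs, outputs, weights):
    # Hidden layer pre-activation: weighted sum of the inputs plus bias
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    # Sigmoid activation
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # Output layer: weighted sum of the hidden activations plus bias
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # Mean squared error between the prediction and the target
    mean_squared_error = np.mean(np.square(pred_out - outputs))
    return mean_squared_error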
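
To see what feed_forward returns before any training, the sketch below evaluates it once on the toy dataset with the initial weights. The variable initial_loss is a name introduced here for illustration; its value should be roughly the starting loss of about 0.33 mentioned later in this section.

```python
import numpy as np

x = np.array([[1, 1]])
y = np.array([[0]])
W = [
    np.array([[-0.0053, 0.3793],
              [-0.5820, -0.5204],
              [-0.2723, 0.1896]], dtype=np.float32).T,
    np.array([-0.0140, 0.5607, -0.0628], dtype=np.float32),
    np.array([[0.1528, -0.1745, -0.1135]], dtype=np.float32).T,
    np.array([-0.5516], dtype=np.float32),
]

def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))       # sigmoid activation
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    return np.mean(np.square(pred_out - outputs))

# Illustrative name: loss of the untrained network on the single data point.
initial_loss = feed_forward(x, y, W)
print(round(float(initial_loss), 2))  # roughly 0.33
```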

Define the update_weights function:

def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    # Loss with the current (unperturbed) weights
    original_loss = feed_forward(inputs, outputs, original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            # Nudge one parameter by a small amount and re-measure the loss
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            # Finite-difference estimate of the gradient for this parameter
            grad = (_loss_plus - original_loss) / 0.0001
            # Step against the gradient, scaled by the learning rate
            updated_weights[i][index] -= grad * lr
    return updated_weights, original_loss
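
Note that update_weights estimates each gradient numerically, by nudging one parameter at a time and measuring the change in loss. As a hedged cross-check, the sketch below compares those finite-difference estimates with gradients derived by hand using the chain rule for this particular network. The helpers numerical_grads and analytic_grads are hypothetical names introduced here, and float64 weights are used (an assumption, unlike the float32 arrays above) to keep rounding noise low.

```python
from copy import deepcopy
import numpy as np

x = np.array([[1, 1]])
y = np.array([[0]])
# Same initial values as in the text, stored as float64 for this check.
W = [
    np.array([[-0.0053, 0.3793], [-0.5820, -0.5204], [-0.2723, 0.1896]]).T,
    np.array([-0.0140, 0.5607, -0.0628]),
    np.array([[0.1528, -0.1745, -0.1135]]).T,
    np.array([-0.5516]),
]

def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    return np.mean(np.square(pred_out - outputs))

def numerical_grads(inputs, outputs, weights, eps=1e-4):
    # Same forward-difference scheme as update_weights, without the update.
    base_loss = feed_forward(inputs, outputs, weights)
    grads = [np.zeros_like(w) for w in weights]
    for i, layer in enumerate(weights):
        for index, _ in np.ndenumerate(layer):
            temp = deepcopy(weights)
            temp[i][index] += eps
            grads[i][index] = (feed_forward(inputs, outputs, temp) - base_loss) / eps
    return grads

def analytic_grads(inputs, outputs, weights):
    # Forward pass, keeping the intermediate values.
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # Backward pass via the chain rule.
    d_pred = 2 * (pred_out - outputs) / pred_out.size   # d(MSE)/d(pred_out)
    d_hidden = np.dot(d_pred, weights[2].T)
    d_pre_hidden = d_hidden * hidden * (1 - hidden)     # sigmoid derivative
    return [
        np.dot(inputs.T, d_pre_hidden),  # input-to-hidden weights
        d_pre_hidden.sum(axis=0),        # hidden biases
        np.dot(hidden.T, d_pred),        # hidden-to-output weights
        d_pred.sum(axis=0),              # output bias
    ]

num = numerical_grads(x, y, W)
ana = analytic_grads(x, y, W)
max_diff = max(float(np.max(np.abs(n - a))) for n, a in zip(num, ana))
print(max_diff)  # expected to be small; the two methods should agree closely
```

The numerical approach needs one extra forward pass per parameter, which is why it is only practical for tiny networks like this one.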

Update weights over 100 epochs and fetch the loss value and the updated weight values:

losses = []
for epoch in range(100):
    W, loss = update_weights(x,y,W,0.01)
    losses.append(loss)

Plot the loss values:

import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(losses)
plt.title('Loss over increasing number of epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss value')

The preceding code generates the following plot:

As you can see, the loss started at around 0.33 and steadily dropped to around 0.0001. This indicates that the weights have been adjusted to fit the input-output data: when the input is given, we can expect the network to predict an output close to the target that the loss function compares it against. The updated weights are as follows:

[array([[ 0.01424004, -0.5907864 , -0.27549535],
        [ 0.39883757, -0.52918637, 0.18640439]], dtype=float32),
 array([ 0.00554004, 0.5519136 , -0.06599568], dtype=float32),
 array([[ 0.3475135 ],
        [-0.05529078],
        [ 0.03760847]], dtype=float32),
 array([-0.22443289], dtype=float32)]

The PyTorch version of the same code with the same weights is demonstrated in the GitHub notebook (Auto_gradient_of_tensors.ipynb). Revisit this section after understanding the core PyTorch concepts in the next chapter, and verify for yourself that the input and output are indeed the same whether the network is written in NumPy or PyTorch. Building a network from scratch using NumPy arrays is sub-optimal, but it is done in this chapter to give you a solid foundation in the working details of neural networks.

Once we have the updated weights, make a prediction for the input by passing it through the network and calculating the output value:

pre_hidden = np.dot(x,W[0]) + W[1]
hidden = 1/(1+np.exp(-pre_hidden))
pred_out = np.dot(hidden, W[2]) + W[3]
# -0.017

The output of the preceding code is -0.017, which is very close to the expected output of 0. As we train for more epochs, the pred_out value gets even closer to 0.
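
The effect of training for longer can be checked with a small sketch like the one below, which repeats the procedure from this section for 100 and for 1,000 epochs and compares the two predictions. The helpers predict and train, and the choice of 1,000 epochs, are assumptions made here for illustration and do not come from the book's notebook.

```python
from copy import deepcopy
import numpy as np

x = np.array([[1, 1]])
y = np.array([[0]])
W = [
    np.array([[-0.0053, 0.3793],
              [-0.5820, -0.5204],
              [-0.2723, 0.1896]], dtype=np.float32).T,
    np.array([-0.0140, 0.5607, -0.0628], dtype=np.float32),
    np.array([[0.1528, -0.1745, -0.1135]], dtype=np.float32).T,
    np.array([-0.5516], dtype=np.float32),
]

def predict(inputs, weights):
    # Hypothetical helper: forward pass that returns the raw prediction.
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    return np.dot(hidden, weights[2]) + weights[3]

def feed_forward(inputs, outputs, weights):
    return np.mean(np.square(predict(inputs, weights) - outputs))

def update_weights(inputs, outputs, weights, lr):
    # Same finite-difference update as defined earlier in this section.
    updated_weights = deepcopy(weights)
    original_loss = feed_forward(inputs, outputs, weights)
    for i, layer in enumerate(weights):
        for index, _ in np.ndenumerate(layer):
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            loss_plus = feed_forward(inputs, outputs, temp_weights)
            grad = (loss_plus - original_loss) / 0.0001
            updated_weights[i][index] -= grad * lr
    return updated_weights, original_loss

def train(weights, epochs, lr=0.01):
    # Hypothetical helper: run the update loop without touching the original W.
    weights = deepcopy(weights)
    for _epoch in range(epochs):
        weights, _loss = update_weights(x, y, weights, lr)
    return weights

pred_100 = float(predict(x, train(W, 100))[0, 0])
pred_1000 = float(predict(x, train(W, 1000))[0, 0])
print(pred_100, pred_1000)  # the second value should be closer to 0
```

Because the gradient is only approximated with a step of 0.0001, the prediction is not expected to reach exactly 0, only to get closer to it.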

So far, we have learned about feedforward propagation and backpropagation. A key parameter of the update_weights function that we defined here is the learning rate, which we will learn about in the next section.
