Mastering Computer Vision with TensorFlow 2.x

By: Krishnendu Kar

Overview of this book

Computer vision allows machines to gain human-level understanding to visualize, process, and analyze images and videos. This book focuses on using TensorFlow to help you learn advanced computer vision tasks such as image acquisition, processing, and analysis. You'll start with the key principles of computer vision and deep learning to build a solid foundation, before covering neural network architectures and understanding how they work rather than using them as a black box. Next, you'll explore architectures such as VGG, ResNet, Inception, R-CNN, SSD, YOLO, and MobileNet. As you advance, you'll learn to build visual search methods using transfer learning. You'll also cover advanced computer vision concepts such as semantic segmentation, image inpainting with GANs, object tracking, video segmentation, and action recognition. Later, the book focuses on how machine learning and deep learning concepts can be used to perform tasks such as edge detection and face recognition. You'll then discover how to develop powerful neural network models on your PC and on various cloud platforms. Finally, you'll learn to apply model optimization methods to deploy models on edge devices for real-time inference. By the end of this book, you'll have a solid understanding of computer vision and be able to confidently develop models to automate tasks.

Preface

Who this book is for

What this book covers

To get the most out of this book

Section 1: Introduction to Computer Vision and Neural Networks

Computer Vision and TensorFlow Fundamentals

Technical requirements

Detecting edges using image hashing and filtering

Extracting features from an image

Object detection using Contours and the HOG detector

An overview of TensorFlow, its ecosystem, and installation

Content Recognition Using Local Binary Patterns

Processing images using LBP

Applying LBP to texture recognition

Matching face color with foundation color – LBP and its limitations

Matching face color with foundation color – color matching technique

Facial Detection Using OpenCV and CNN

Applying Viola-Jones AdaBoost learning and the Haar cascade classifier for face recognition

Predicting facial key points using a deep neural network

Predicting facial expressions using a CNN

Overview of 3D face detection

Deep Learning on Images

Understanding CNNs and their parameters

Optimizing CNN parameters

Visualizing the layers of a neural network

Section 2: Advanced Concepts of Computer Vision with TensorFlow

Neural Network Architecture and Models

Overview of AlexNet

Overview of VGG16

Overview of Inception

Overview of ResNet

Overview of R-CNN

Overview of Fast R-CNN

Overview of Faster R-CNN

Overview of GANs

Overview of GNNs

Overview of Reinforcement Learning

Overview of Transfer Learning

Visual Search Using Transfer Learning

Coding deep learning models using TensorFlow

Developing a transfer learning model using TensorFlow

Understanding the architecture and applications of visual search

Working with a visual search input pipeline using tf.data

Object Detection Using YOLO

An overview of YOLO

An introduction to Darknet for object detection

Real-time prediction using Darknet

YOLO versus YOLO v2 versus YOLO v3

When to train a model?

Training your own image set with YOLO v3 to develop a custom model

An overview of the Feature Pyramid Network and RetinaNet

Semantic Segmentation and Neural Style Transfer

Overview of TensorFlow DeepLab for semantic segmentation

Artificial image generation using DCGANs

Image inpainting using OpenCV

Understanding neural style transfer

Section 3: Advanced Implementation of Computer Vision with TensorFlow

Action Recognition Using Multitask Deep Learning

Human pose estimation – OpenPose

Human pose estimation – stacked hourglass model

Human pose estimation – PoseNet

Action recognition using various methods

Object Detection Using R-CNN, SSD, and R-FCN

An overview of SSD

An overview of R-FCN

An overview of the TensorFlow object detection API

Detecting objects using TensorFlow on Google Cloud

Detecting objects using TensorFlow Hub

Training a custom object detector using TensorFlow and Google Colab

An overview of Mask R-CNN and a Google Colab demonstration

Developing an object tracker model to complement the object detector

Section 4: TensorFlow Implementation at the Edge and on the Cloud

Section 4: TensorFlow Implementation at the Edge and on the Cloud

Deep Learning on Edge Devices with CPU/GPU Optimization

Overview of deep learning on edge devices

Techniques used for GPU/CPU optimization

Overview of MobileNet

Image processing with a Raspberry Pi

Model conversion and inference using OpenVINO

Converting a TensorFlow model developed using the TensorFlow Object Detection API

Application of TensorFlow Lite

Object detection on Android phones using TensorFlow Lite

Object detection on Raspberry Pi using TensorFlow Lite

Object detection on iPhone using TensorFlow Lite and Create ML

A summary of various annotation methods

Cloud Computing Platform for Computer Vision

Training an object detector in GCP

Training an object detector in the AWS SageMaker cloud platform

Training an object detector in the Microsoft Azure cloud platform

Training at scale and packaging

The general idea behind cloud-based visual search

Analyzing images and search mechanisms in various cloud platforms

Other Books You May Enjoy

Leave a review - let other readers know what you think

Human pose estimation – stacked hourglass model

The stacked hourglass model was developed in 2016 by Alejandro Newell, Kaiyu Yang, and Jia Deng in their paper titled Stacked Hourglass Networks for Human Pose Estimation. The details of the model can be found at https://arxiv.org/abs/1603.06937.

The architecture of the model is illustrated in the following diagram:

The key features of this model are as follows:

Bottom-up and top-down processing of features is repeated across all scales by stacking multiple hourglass modules end to end. This repetition lets later modules re-evaluate the initial estimates and features across the whole image.
The network uses convolutional and max pooling layers to process features down to a low resolution, and then upsamples to bring the resolution back up.
At each max pooling step, additional convolutional layers are added parallel to the main...
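
The pooling, upsampling, and stacking described in these points can be sketched structurally as follows. This is a minimal illustration rather than the model from the paper or the book: it uses NumPy instead of TensorFlow, the learned convolutional layers are replaced by a hypothetical identity placeholder (conv_placeholder), and the 64 x 64 input size, the depth of 4, and the two stacked modules are assumptions chosen only for the example. The parallel branch is simply added back after upsampling, which is one common way to combine the two paths.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) map; H and W must be even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample_2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in each dimension."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def conv_placeholder(x):
    """Hypothetical stand-in for learned convolutional layers (identity here)."""
    return x


def hourglass(x, depth, trace=None):
    """One hourglass module: pool down `depth` times, then upsample back.

    `trace`, if given, collects the feature-map shape at every scale.
    """
    if trace is not None:
        trace.append(x.shape)
    # Branch that runs parallel to the main path at the current resolution.
    skip = conv_placeholder(x)
    # Main path: reduce the resolution, then process at the lower scale.
    low = conv_placeholder(max_pool_2x2(x))
    if depth > 1:
        low = hourglass(low, depth - 1, trace)
    elif trace is not None:
        trace.append(low.shape)  # lowest resolution reached
    # Bring the resolution back up and merge with the parallel branch.
    return skip + upsample_2x(conv_placeholder(low))


def stacked_hourglass(x, n_stacks, depth):
    """Feed the output of each hourglass module into the next one."""
    for _ in range(n_stacks):
        x = hourglass(x, depth)
    return x


features = np.ones((64, 64))
shapes = []
single = hourglass(features, depth=4, trace=shapes)
stacked = stacked_hourglass(features, n_stacks=2, depth=4)
print(shapes)         # resolutions visited on the way down
print(stacked.shape)  # same spatial size as the input
```

Because every module returns a map with the same spatial size as its input, modules can be chained without any resizing between them, which is what makes stacking straightforward.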