Mastering Computer Vision with TensorFlow 2.x

By: Krishnendu Kar

Overview of this book

Computer vision allows machines to gain human-level understanding to visualize, process, and analyze images and videos. This book focuses on using TensorFlow to help you learn advanced computer vision tasks such as image acquisition, processing, and analysis. You'll start with the key principles of computer vision and deep learning to build a solid foundation, before covering neural network architectures and understanding how they work rather than using them as a black box. Next, you'll explore architectures such as VGG, ResNet, Inception, R-CNN, SSD, YOLO, and MobileNet. As you advance, you'll learn to build visual search methods using transfer learning. You'll also cover advanced computer vision concepts such as semantic segmentation, image inpainting with GANs, object tracking, video segmentation, and action recognition. Later, the book focuses on how machine learning and deep learning concepts can be used to perform tasks such as edge detection and face recognition. You'll then discover how to develop powerful neural network models on your PC and on various cloud platforms. Finally, you'll learn to apply model optimization methods to deploy models on edge devices for real-time inference. By the end of this book, you'll have a solid understanding of computer vision and be able to confidently develop models to automate tasks.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Section 1: Introduction to Computer Vision and Neural Networks

Computer Vision and TensorFlow Fundamentals

Technical requirements

Detecting edges using image hashing and filtering

Extracting features from an image

Object detection using Contours and the HOG detector

An overview of TensorFlow, its ecosystem, and installation

Summary

Content Recognition Using Local Binary Patterns

Processing images using LBP

Applying LBP to texture recognition

Matching face color with foundation color – LBP and its limitations

Matching face color with foundation color – color matching technique

Summary

Facial Detection Using OpenCV and CNN

Applying Viola-Jones AdaBoost learning and the Haar cascade classifier for face recognition

Predicting facial key points using a deep neural network

Predicting facial expressions using a CNN

Overview of 3D face detection

Summary

Deep Learning on Images

Understanding CNNs and their parameters

Optimizing CNN parameters

Visualizing the layers of a neural network

Summary

Section 2: Advanced Concepts of Computer Vision with TensorFlow

Neural Network Architecture and Models

Overview of AlexNet

Overview of VGG16

Overview of Inception

Overview of ResNet

Overview of R-CNN

Overview of Fast R-CNN

Overview of Faster R-CNN

Overview of GANs

Overview of GNNs

Overview of Reinforcement Learning

Overview of Transfer Learning

Summary

Visual Search Using Transfer Learning

Coding deep learning models using TensorFlow

Developing a transfer learning model using TensorFlow

Understanding the architecture and applications of visual search

Working with a visual search input pipeline using tf.data

Summary

Object Detection Using YOLO

An overview of YOLO

An introduction to Darknet for object detection

Real-time prediction using Darknet

YOLO versus YOLO v2 versus YOLO v3

When to train a model?

Training your own image set with YOLO v3 to develop a custom model

An overview of the Feature Pyramid Network and RetinaNet

Summary

Semantic Segmentation and Neural Style Transfer

Overview of TensorFlow DeepLab for semantic segmentation

Artificial image generation using DCGANs

Image inpainting using OpenCV

Understanding neural style transfer

Summary

Section 3: Advanced Implementation of Computer Vision with TensorFlow

Action Recognition Using Multitask Deep Learning

Human pose estimation – OpenPose

Human pose estimation – stacked hourglass model

Human pose estimation – PoseNet

Action recognition using various methods

Summary

Object Detection Using R-CNN, SSD, and R-FCN

An overview of SSD

An overview of R-FCN

An overview of the TensorFlow object detection API

Detecting objects using TensorFlow on Google Cloud

Detecting objects using TensorFlow Hub

Training a custom object detector using TensorFlow and Google Colab

An overview of Mask R-CNN and a Google Colab demonstration

Developing an object tracker model to complement the object detector

Summary

Section 4: TensorFlow Implementation at the Edge and on the Cloud

Deep Learning on Edge Devices with CPU/GPU Optimization

Overview of deep learning on edge devices

Techniques used for GPU/CPU optimization

Overview of MobileNet

Image processing with a Raspberry Pi

Model conversion and inference using OpenVINO

Converting a TensorFlow model developed using the TensorFlow Object Detection API

Application of TensorFlow Lite

Object detection on Android phones using TensorFlow Lite

Object detection on Raspberry Pi using TensorFlow Lite

Object detection on iPhone using TensorFlow Lite and Create ML

A summary of various annotation methods

Summary

Cloud Computing Platform for Computer Vision

Training an object detector in GCP

Training an object detector in the AWS SageMaker cloud platform

Training an object detector in the Microsoft Azure cloud platform

Training at scale and packaging

The general idea behind cloud-based visual search

Analyzing images and search mechanisms in various cloud platforms

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

To get the most out of this book

If you are a beginner in computer vision and TensorFlow and you're trying to master the subject, it is better to go through the book's chapters in sequence rather than jumping around. The book slowly builds on the concepts of computer vision and neural networks and then ends with a code sample. Be sure to get a good grasp of the concepts and architecture presented and then apply the code sample.

We could not upload our image data to GitHub due to size limitations. You can either use images from your own camera or download image datasets from Kaggle:

Food images (for the burger-and-fries sample): Take photos using your cell phone camera.
Kaggle furniture detector: https://www.kaggle.com/akkithetechie/furniture-detector

If you do not understand a concept at first, revisit it and also read any cited papers.

Most of the code is written in Jupyter Notebook environments, so make sure that you have installed Anaconda. You also need to install TensorFlow 2.0 – follow the instructions in Chapter 1, Computer Vision and TensorFlow Fundamentals, for that.
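
Before opening the notebooks, it can help to confirm that the required packages are visible to your Python interpreter. The following is a minimal sketch, not part of the book's code: the function name check_environment and the package list (tensorflow, cv2, notebook) are assumptions chosen for illustration, so adjust them to whatever your chapter actually needs.

```python
import importlib.util
import sys


def check_environment(packages=("tensorflow", "cv2", "notebook")):
    """Return {package_name: bool} telling whether each package can be found.

    The default package names are assumptions for illustration only.
    find_spec() locates a top-level package without importing it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If a package is reported as missing, install it in the same Anaconda environment that runs your notebooks, then run the check again.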

Much of the object detection training is done using Google Colab – Chapter 10, Object Detection Using R-CNN, SSD, and R-FCN, and Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, explain how to use Google Colab.

If you want to deploy your computer vision code to edge devices and you're thinking about what to purchase, visit Chapter 11, Deep Learning on Edge Devices with CPU/GPU Optimization, for a detailed analysis of various devices.

The book relies heavily on terminal usage – make sure you have developed a basic understanding of that before reading anything from Chapter 7, Object Detection Using YOLO, onward.

Chapter 12, Cloud Computing Platform for Computer Vision, deals with cloud computing, so you need an Amazon Web Services, Azure, or Google Cloud Platform account for it. Cloud computing can get expensive if you do not keep track of your hours. Many providers give you free access to services for a limited time, but after that, charges can accrue while your project remains open, even if you are not training. Remember to shut down your project before you close your account to stop accruing charges. If you are stuck on a technical question about cloud computing, read the documentation of the relevant platform. You can also open a technical support ticket for a fee; tickets are typically addressed within 1-2 business days.

The best way to get the most out of this book is to read the theory, get an understanding of why a model is developed the way it is, try the sample exercises, and then update the code to suit your needs.

If you have any questions about any section of the book and get stuck, you can always contact me on LinkedIn (https://www.linkedin.com/in/krish-kar-554739b2/ext).

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
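
If you prefer not to install a separate archive tool, a .zip file can also be extracted with Python's standard library. This is a minimal sketch under stated assumptions: the helper name extract_code_bundle and the file names in the usage note are hypothetical, and it handles only .zip archives.

```python
import zipfile
from pathlib import Path


def extract_code_bundle(archive_path, target_dir):
    """Extract a .zip archive into target_dir and return the archive's member names."""
    target = Path(target_dir)
    # Create the destination folder (and any parents) if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, extract_code_bundle("code_bundle.zip", "book_code") would unpack a downloaded file named code_bundle.zip into a folder named book_code; both names are placeholders for your actual download.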

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Computer-Vision-with-TensorFlow-2.0. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838827069_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Each image that is read is converted to grayscale using the OpenCV BGR2GRAY command."

A block of code is set as follows:

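import cv2

# detected_face, img_size, and img_counter come from the surrounding example.
faceresize = cv2.resize(detected_face, (img_size, img_size))  # square resize
img_name = "dataset/opencv_frame_{}.jpg".format(img_counter)
cv2.imwrite(img_name, faceresize)  # save the resized face as a numbered frame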

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "The convolutional neural network (CNN) is the most widely used tool in computer vision for classifying and detecting objects."

Warnings or important notes appear like this.

Tips and tricks appear like this.
