Building Computer Vision Projects with OpenCV 4 and C++

By: David Millán Escrivá, Prateek Joshi, Vinícius G. Mendonça, Roy Shilkrot

Overview of this book

OpenCV is one of the best open source libraries available and can help you focus on constructing complete projects on image processing, motion detection, and image segmentation. This Learning Path is your guide to understanding OpenCV concepts and algorithms through real-world examples and activities. Through various projects, you'll also discover how to use complex computer vision and machine learning algorithms and face detection to extract the maximum amount of information from images and videos. In later chapters, you'll learn to enhance your videos and images with optical flow analysis and background subtraction. Sections in the Learning Path will help you get to grips with text segmentation and recognition, in addition to guiding you through the basics of the new and improved deep learning modules. By the end of this Learning Path, you will have mastered commonly used computer vision techniques to build OpenCV projects from scratch.

This Learning Path includes content from the following Packt books:

• Mastering OpenCV 4 - Third Edition by Roy Shilkrot and David Millán Escrivá
• Learn OpenCV 4 By Building Projects - Second Edition by David Millán Escrivá, Vinícius G. Mendonça, and Prateek Joshi

How do humans understand image content?


If you look around, you will see a lot of objects. You encounter many different objects every day, and you recognize them almost instantaneously without any effort. When you see a chair, you don't wait for a few minutes before realizing that it is in fact a chair. You just know that it's a chair right away.

Computers, on the other hand, find it very difficult to do this task. Researchers have been working for many years to find out why computers are not as good as we are at this.

To answer that question, we need to understand how humans do it. Visual data processing happens in the ventral visual stream, the pathway in our visual system associated with object recognition. It is essentially a hierarchy of areas in our brain that helps us recognize objects.

Humans can recognize different objects effortlessly, and can cluster similar objects together. We can do this because we have developed some sort of invariance toward objects of the same class. When we look at an object, our brain extracts the salient points in such a way that factors such as orientation, size, perspective, and illumination don't matter.

A chair that is double the normal size and rotated by 45 degrees is still a chair. We can recognize it easily because of the way we process it. Machines cannot do that so easily. Humans tend to remember an object based on its shape and important features. Regardless of how the object is placed, we can still recognize it.

Our visual system builds up these hierarchical invariances with respect to position, scale, and viewpoint, and that is what makes our recognition so robust. If you look deeper into the system, you will find that humans have cells in their visual cortex that respond to simple shapes such as curves and lines.

As we move further along the ventral stream, we find more complex cells that are tuned to respond to more complex objects, such as trees, gates, and so on. Neurons along the ventral stream show an increase in the size of their receptive fields, and the complexity of their preferred stimuli increases with it.

Why is it difficult for machines to understand image content?

We now understand how visual data enters the human visual system and how our system processes it. The issue is that we still don't fully understand how our brain recognizes and organizes this visual data. In machine learning, we just extract some features from images and ask the computer to learn them using algorithms, and those features still have to cope with variations in shape, size, perspective, angle, illumination, occlusion, and so on.
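
To make that feature-extraction step concrete, here is a minimal, hypothetical C++ sketch using OpenCV's ORB detector; the filename and the keypoint count are illustrative placeholders, not code or values from the book.

    // Hypothetical sketch (not the book's code): extract ORB keypoints and
    // descriptors from an image, so that a learning algorithm can work on
    // feature vectors instead of raw pixels.
    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        // "chair.jpg" is a placeholder filename.
        cv::Mat image = cv::imread("chair.jpg", cv::IMREAD_GRAYSCALE);
        if (image.empty()) {
            std::cerr << "Could not load image" << std::endl;
            return -1;
        }

        // Detect keypoints and compute their descriptors in one call.
        cv::Ptr<cv::ORB> orb = cv::ORB::create(500);  // up to 500 keypoints (arbitrary choice)
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        orb->detectAndCompute(image, cv::noArray(), keypoints, descriptors);

        // Each row of 'descriptors' is a binary feature vector that a
        // matcher or classifier could learn from.
        std::cout << "Found " << keypoints.size() << " keypoints and a "
                  << descriptors.rows << " x " << descriptors.cols
                  << " descriptor matrix" << std::endl;
        return 0;
    }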

For example, the same chair looks very different to a machine when it is viewed from the side. Humans can easily recognize that it's a chair regardless of how it's presented to us. So, how do we explain this to our machines?

One way to do this would be to store all the different variations of an object, including sizes, angles, perspectives, and so on. But this process is cumbersome and time-consuming, and it's simply not possible to gather data that encompasses every single variation. Machines would consume a huge amount of memory and a lot of time to build a model that could recognize all these objects.

Even with all this, if an object is partially occluded, computers still won't recognize it, because they treat it as a new object. So when we build a computer vision library, we need to build the underlying functional blocks that can be combined in many different ways to formulate complex algorithms.

OpenCV provides a lot of these functions, and they are highly optimized. So once we understand what OpenCV is capable of, we can use it effectively to build interesting applications.
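
As a small illustration of combining those building blocks, the following sketch chains three optimized OpenCV functions (color conversion, Gaussian blur, and Canny edge detection) into a simple edge-detection pipeline; the filename and the threshold values are placeholders chosen for this example.

    // Minimal sketch: three OpenCV building blocks chained into a simple
    // edge-detection pipeline. Filenames and thresholds are illustrative.
    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <iostream>

    int main() {
        cv::Mat input = cv::imread("scene.jpg");   // placeholder filename
        if (input.empty()) {
            std::cerr << "Could not load image" << std::endl;
            return -1;
        }

        cv::Mat gray, blurred, edges;
        cv::cvtColor(input, gray, cv::COLOR_BGR2GRAY);         // block 1: color conversion
        cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.5);  // block 2: noise reduction
        cv::Canny(blurred, edges, 50, 150);                    // block 3: edge detection

        cv::imwrite("edges.png", edges);                       // save the result
        return 0;
    }

Each call here is a self-contained block, which is why the same handful of functions can be rearranged to build very different applications.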

Let's go ahead and explore that in the next section.