OpenCV By Example

By: Prateek Joshi, David Millán Escrivá, Vinícius G. Mendonça

Overview of this book

OpenCV is a free, cross-platform library used primarily for real-time Computer Vision and image processing. It is considered one of the best open source libraries for helping developers build complete projects involving image processing, motion detection, and image segmentation. Whether you are completely new to Computer Vision or have a basic understanding of it, this book will guide you through the basic OpenCV concepts and algorithms using real-world examples and projects. Starting with installing OpenCV on your system and the basics of image processing, we move swiftly on to optical flow video analysis and text recognition in complex scenes, and take you through the commonly used Computer Vision techniques you need to build your own OpenCV projects from scratch. By the end of this book, you will be familiar with the basics of OpenCV, such as matrix operations, filters, and histograms, as well as more advanced concepts such as segmentation, machine learning, complex video analysis, and text recognition.

How do humans understand image content?

If you look around, you will see a lot of objects. You encounter many different objects every day, and you recognize them almost instantaneously, without any effort. When you see a chair, you don't wait a few minutes before realizing that it is, in fact, a chair. You just know that it's a chair right away! Computers, on the other hand, find this task very difficult. Researchers have been working for many years to find out why computers are not as good at it as we are.

To answer this question, we need to understand how humans do it. Visual data processing happens in the ventral visual stream, the pathway in our visual system that is associated with object recognition. It is basically a hierarchy of areas in our brain that helps us recognize objects. Humans can recognize different objects effortlessly, and we can cluster similar objects together. We can do this because we have developed some sort of invariance toward objects of the same class. When we look at an object, our brain extracts the salient points in such a way that factors such as orientation, size, perspective, and illumination don't matter.

A chair that is double the normal size and rotated by 45 degrees is still a chair. We can easily recognize it because of the way we process it. Machines cannot do this so easily. Humans tend to remember an object based on its shape and important features, so we can still recognize it regardless of how it is placed. In our visual system, we build these hierarchical invariances with respect to position, scale, and viewpoint, and they make our recognition very robust.

If you look deeper into our visual system, you will see that humans have cells in their visual cortex that respond to simple shapes, such as curves and lines. Further along the ventral stream, there are more complex cells that are trained to respond to more complex objects, such as trees, gates, and so on. The neurons along the ventral stream tend to have larger and larger receptive fields, and the complexity of their preferred stimuli increases as well.

Why is it difficult for machines to understand image content?

We now understand how visual data enters the human visual system and how our system processes it. The issue is that we still don't completely understand how our brain recognizes and organizes this visual data. In practice, we just extract some features from images and ask the computer to learn from them using machine learning algorithms. We still have to deal with variations in shape, size, perspective, angle, illumination, occlusion, and so on. For example, the same chair looks very different to a machine when it is viewed from the side. Humans can easily recognize that it's a chair regardless of how it's presented to us. So, how do we explain this to our machines?
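To make the point about variations concrete, here is a minimal sketch in plain Python (not OpenCV). It compares a tiny binary image with a rotated copy of itself. The 5x5 grid, the L-shaped blob, and all function names are hypothetical and chosen only for illustration. A naive pixel-by-pixel comparison reports the two images as different, whereas a very simple feature, the object's area, is unchanged by the rotation.

```python
# Toy 5x5 binary "image": 1 = object pixel, 0 = background.
# The L-shaped blob is a hypothetical stand-in for an object such as a chair.
IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def rotate_90_clockwise(image):
    """Return a copy of the image rotated by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def count_differing_pixels(a, b):
    """Naive comparison: number of positions where the two images disagree."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def object_area(image):
    """A very simple feature: the number of object pixels."""
    return sum(sum(row) for row in image)


rotated = rotate_90_clockwise(IMAGE)
pixel_difference = count_differing_pixels(IMAGE, rotated)
area_original = object_area(IMAGE)
area_rotated = object_area(rotated)

print("differing pixels:", pixel_difference)
print("area before/after rotation:", area_original, area_rotated)
```

Area alone is far too weak to tell a chair from a table. The sketch only shows why we describe objects with features that survive a change of orientation rather than comparing raw pixels.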

One way to do this would be to store all the different variations of an object, including sizes, angles, perspectives, and so on. But this process is cumbersome and time-consuming! It's also not actually possible to gather data that encompasses every single variation. The machine would consume a huge amount of memory and a lot of time to build a model that can recognize these objects. Even with all this, if an object is partially occluded, the computer still won't recognize it, because it treats the occluded object as a new one.

So, when we build a Computer Vision library, we need to build the underlying functional blocks that can be combined in many different ways to formulate complex algorithms. OpenCV provides a lot of these functions, and they are highly optimized. Once we understand what OpenCV provides out of the box, we can use it effectively to build interesting applications. Let's go ahead and explore this in the next section.
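As a rough illustration of the storage argument made above, the following sketch counts how many templates a single object would need if we sampled each kind of variation only coarsely. Every number here (the sampling counts and the 640x480 image size) is a hypothetical assumption, not a figure from the book.

```python
# Hypothetical, coarse sampling of each kind of variation for ONE object.
VARIATIONS = {
    "size": 10,
    "rotation_angle": 36,  # one template every 10 degrees
    "perspective": 20,
    "illumination": 10,
}

# One uncompressed 640x480 color image with 3 bytes per pixel (assumed size).
IMAGE_BYTES = 640 * 480 * 3


def templates_per_object(variations):
    """Multiply the sample counts: every combination needs its own template."""
    total = 1
    for count in variations.values():
        total *= count
    return total


templates = templates_per_object(VARIATIONS)
storage_gb = templates * IMAGE_BYTES / 1e9

print("templates for one object:", templates)
print("approximate storage in GB:", round(storage_gb, 1))
```

Because the counts multiply, adding one more kind of variation, such as occlusion, multiplies the total again. This is why storing every variation does not scale.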
