Book Image

Learning OpenCV 4 Computer Vision with Python 3 - Third Edition

By : Joseph Howse, Joe Minichino
Book Image

Learning OpenCV 4 Computer Vision with Python 3 - Third Edition

By: Joseph Howse, Joe Minichino

Overview of this book

Computer vision is a rapidly evolving science, encompassing diverse applications and techniques. This book will not only help those who are getting started with computer vision but also experts in the domain. You’ll be able to put theory into practice by building apps with OpenCV 4 and Python 3. You’ll start by understanding OpenCV 4 and how to set it up with Python 3 on various platforms. Next, you’ll learn how to perform basic operations such as reading, writing, manipulating, and displaying still images, videos, and camera feeds. From taking you through image processing, video analysis, and depth estimation and segmentation, to helping you gain practice by building a GUI app, this book ensures you’ll have opportunities for hands-on activities. Next, you’ll tackle two popular challenges: face detection and face recognition. You’ll also learn about object classification and machine learning concepts, which will enable you to create and use object detectors and classifiers, and even track objects in movies or video camera feed. Later, you’ll develop your skills in 3D tracking and augmented reality. Finally, you’ll cover ANNs and DNNs, learning how to develop apps for recognizing handwritten digits and classifying a person's gender and age. By the end of this book, you’ll have the skills you need to execute real-world computer vision projects.
Table of Contents (13 chapters)

Cameo – an object-oriented design

Python applications can be written in a purely procedural style. This is often done with small applications, such as our basic I/O scripts, discussed previously. However, from now on, we will often use an object-oriented style because it promotes modularity and extensibility.

From our overview of OpenCV's I/O functionality, we know that all images are similar, regardless of their source or destination. No matter how we obtain a stream of images or where we send it as output, we can apply the same application-specific logic to each frame in this stream. Separation of I/O code and application code becomes especially convenient in an application, such as Cameo, which uses multiple I/O streams.

We will create classes called CaptureManager and WindowManager as high-level interfaces to I/O streams. Our application code may use CaptureManager to read new frames and, optionally, to dispatch each frame to one or more outputs, including a still image file, a video file, and a window (via a WindowManager class). A WindowManager class lets our application code handle a window and events in an object-oriented style.

Both CaptureManager and WindowManager are extensible. We could make implementations that do not rely on OpenCV for I/O.

Abstracting a video stream with managers.CaptureManager

As we have seen, OpenCV can capture, show, and record a stream of images from either a video file or camera, but there are some special considerations in each case. Our CaptureManager class abstracts some of the differences and provides a higher-level interface to dispatch images from the capture stream to one or more outputs—a still image file, video file, or window.

A CaptureManager object is initialized with a VideoCapture object and has enterFrame and exitFrame methods that should typically be called on every iteration of an application's main loop. Between a call to enterFrame and exitFrame, the application may (any number of times) set a channel property and get a frame property. The channel property is initially 0 and only multihead cameras use other values. The frame property is an image corresponding to the current channel's state when enterFrame was called.

A CaptureManager class also has the writeImage, startWritingVideo, and stopWritingVideo methods that may be called at any time. Actual file writing is postponed until exitFrame. Also, during the exitFrame method, frame may be shown in a window, depending on whether the application code provides a WindowManager class either as an argument to the constructor of CaptureManager or by setting the previewWindowManager property.

If the application code manipulates frame, the manipulations are reflected in recorded files and in the window. A CaptureManager class has a constructor argument and property called shouldMirrorPreview, which should be True if we want frame to be mirrored (horizontally flipped) in the window but not in recorded files. Typically, when facing a camera, users prefer a live camera feed to be mirrored.

Recall that a VideoWriter object needs a frame rate, but OpenCV does not provide any reliable way to get an accurate frame rate for a camera. The CaptureManager class works around this limitation by using a frame counter and Python's standard time.time function to estimate the frame rate if necessary. This approach is not foolproof. Depending on frame rate fluctuations and the system-dependent implementation of time.time, the accuracy of the estimate might still be poor in some cases. However, if we deploy to unknown hardware, it is better than just assuming that the user's camera has a particular frame rate.

Let's create a file called, which will contain our implementation of CaptureManager. This implementation turns out to be quite long, so we will look at it in several pieces:

  1. First, let's add imports and a constructor, as follows:
import cv2
import numpy
import time

class CaptureManager(object):

def __init__(self, capture, previewWindowManager = None,
shouldMirrorPreview = False):

self.previewWindowManager = previewWindowManager
self.shouldMirrorPreview = shouldMirrorPreview

self._capture = capture
self._channel = 0
self._enteredFrame = False
self._frame = None
self._imageFilename = None
self._videoFilename = None
self._videoEncoding = None
self._videoWriter = None

self._startTime = None
self._framesElapsed = 0
self._fpsEstimate = None
  1. Next, let's add the following getter and setter methods for the properties of CaptureManager:
def channel(self):
return self._channel

def channel(self, value):
if self._channel != value:
self._channel = value
self._frame = None

def frame(self):
if self._enteredFrame and self._frame is None:
_, self._frame = self._capture.retrieve(
return self._frame

def isWritingImage(self):
return self._imageFilename is not None

def isWritingVideo(self):
return self._videoFilename is not None

Note that most of the member variables are nonpublic, as denoted by the underscore prefix in variable names, such as self._enteredFrame. These nonpublic variables relate to the state of the current frame and any file-writing operations. As discussed previously, the application code only needs to configure a few things, which are implemented as constructor arguments and settable public properties: the camera channel, the window manager, and the option to mirror the camera preview.

This book assumes a certain level of familiarity with Python; however, if you are getting confused by those @ annotations (for example, @property), refer to the Python documentation about decorators, a built-in feature of the language that allows the wrapping of a function by another function, normally used to apply user-defined behavior in several places of an application. Specifically, you can find relevant documentation at

Python does not enforce the concept of nonpublic member variables, but in cases where the developer intends a variable to be treated as nonpublic, you will often see the single-underscore prefix (_) or double-underscore prefix (__). The single-underscore prefix is just a convention, indicating that the variable should be treated as protected (accessed only within the class and its subclasses). The double-underscore prefix actually causes the Python interpreter to rename the variable, such that MyClass.__myVariable becomes MyClass._MyClass__myVariable. This is called name mangling (quite appropriately). By convention, such a variable should be treated as private (accessed only within the class, and not its subclasses). The same prefixes, with the same significance, can be applied to methods as well as variables.
  1. Continuing with our implementation, let's add the enterFrame method to
    def enterFrame(self):
"""Capture the next frame, if any."""

# But first, check that any previous frame was exited.
assert not self._enteredFrame, \
'previous enterFrame() had no matching exitFrame()'

if self._capture is not None:
self._enteredFrame = self._capture.grab()

Note that the implementation of enterFrame only grabs (synchronizes) a frame, whereas actual retrieval from a channel is postponed to a subsequent reading of the frame variable.

  1. Next, let's add the exitFrame method to
    def exitFrame(self):
"""Draw to the window. Write to files. Release the

# Check whether any grabbed frame is retrievable.
# The getter may retrieve and cache the frame.
if self.frame is None:
self._enteredFrame = False

# Update the FPS estimate and related variables.
if self._framesElapsed == 0:
self._startTime = time.time()
timeElapsed = time.time() - self._startTime
self._fpsEstimate = self._framesElapsed / timeElapsed
self._framesElapsed += 1

# Draw to the window, if any.
if self.previewWindowManager is not None:
if self.shouldMirrorPreview:
mirroredFrame = numpy.fliplr(self._frame)

# Write to the image file, if any.
if self.isWritingImage:
cv2.imwrite(self._imageFilename, self._frame)
self._imageFilename = None

# Write to the video file, if any.

# Release the frame.
self._frame = None
self._enteredFrame = False

The implementation of exitFrame takes the image from the current channel, estimates a frame rate, shows the image via the window manager (if any), and fulfills any pending requests to write the image to files.

  1. Several other methods also pertain to file writing. Let's add the following implementations of public methods named writeImage, startWritingVideo, and stopWritingVideo to
    def writeImage(self, filename):
"""Write the next exited frame to an image file."""
self._imageFilename = filename

def startWritingVideo(
self, filename,
encoding = cv2.VideoWriter_fourcc('M','J','P','G')):
"""Start writing exited frames to a video file."""
self._videoFilename = filename
self._videoEncoding = encoding

def stopWritingVideo(self):
"""Stop writing exited frames to a video file."""
self._videoFilename = None
self._videoEncoding = None
self._videoWriter = None

The preceding methods simply update the parameters for file-writing operations, whereas the actual writing operations are postponed to the next call of exitFrame.

  1. Earlier in this section, we saw that exitFrame calls a helper method named _writeVideoFrame. Let's add the following implementation of _writeVideoFrame to
    def _writeVideoFrame(self):

if not self.isWritingVideo:

if self._videoWriter is None:
fps = self._capture.get(cv2.CAP_PROP_FPS)
if fps <= 0.0:
# The capture's FPS is unknown so use an estimate.
if self._framesElapsed < 20:
# Wait until more frames elapse so that the
# estimate is more stable.
fps = self._fpsEstimate
size = (int(self._capture.get(
self._videoWriter = cv2.VideoWriter(
self._videoFilename, self._videoEncoding,
fps, size)


The preceding method creates or appends to a video file in a manner that should be familiar from our earlier scripts (refer to the Reading/writing a video file section, earlier in this chapter). However, in situations where the frame rate is unknown, we skip some frames at the start of the capture session so that we have time to build up an estimate of the frame rate.

This concludes our implementation of CaptureManager. Although it relies on VideoCapture, we could make other implementations that do not use OpenCV for input. For example, we could make a subclass that is instantiated with a socket connection, whose byte stream could be parsed as a stream of images. Also, we could make a subclass that uses a third-party camera library with different hardware support than what OpenCV provides. However, for Cameo, our current implementation is sufficient.

Abstracting a window and keyboard with managers.WindowManager

As we have seen, OpenCV provides functions that cause a window to be created, be destroyed, show an image, and process events. Rather than being methods of a window class, these functions require a window's name to pass as an argument. Since this interface is not object-oriented, it is arguably inconsistent with OpenCV's general style. Also, it is unlikely to be compatible with other window-or event-handling interfaces that we might eventually want to use instead of OpenCV's.

For the sake of object orientation and adaptability, we abstract this functionality into a WindowManager class with the createWindow, destroyWindow, show, and processEvents methods. As a property, WindowManager has a function object called keypressCallback, which (if it is not None) is called from processEvents in response to any keypress. The keypressCallback object must be a function that takes a single argument, specifically an ASCII keycode.

Let's add an implementation of WindowManager to The implementation begins with the following class declaration and __init__ method:

class WindowManager(object):

def __init__(self, windowName, keypressCallback = None):
self.keypressCallback = keypressCallback

self._windowName = windowName
self._isWindowCreated = False

The implementation continues with the following methods to manage the life cycle of the window and its events:

def isWindowCreated(self):
return self._isWindowCreated

def createWindow(self):
self._isWindowCreated = True

def show(self, frame):
cv2.imshow(self._windowName, frame)

def destroyWindow(self):
self._isWindowCreated = False

def processEvents(self):
keycode = cv2.waitKey(1)
if self.keypressCallback is not None and keycode != -1:

Our current implementation only supports keyboard events, which will be sufficient for Cameo. However, we could modify WindowManager to support mouse events, too. For example, the class interface could be expanded to include a mouseCallback property (and optional constructor argument,) but could otherwise remain the same. With an event framework other than OpenCV's, we could support additional event types in the same way by adding callback properties.

Applying everything with cameo.Cameo

Our application is represented by the Cameo class with two methods: run and onKeypress. On initialization, a Cameo object creates a WindowManager object with onKeypress as a callback, as well as a CaptureManager object using a camera (specifically, a cv2.VideoCapture object) and the same WindowManager object. When run is called, the application executes a main loop in which frames and events are processed.

As a result of event processing, onKeypress may be called. The spacebar causes a screenshot to be taken, Tab causes a screencast (a video recording) to start/stop, and Esc causes the application to quit.

In the same directory as, let's create a file called, where we will implement the Cameo class:

  1. The implementation begins with the following import statements and __init__ method:
import cv2
from managers import WindowManager, CaptureManager

class Cameo(object):

def __init__(self):
self._windowManager = WindowManager('Cameo',
self._captureManager = CaptureManager(
cv2.VideoCapture(0), self._windowManager, True)
  1. Next, let's add the following implementation of the run() method:
    def run(self):
"""Run the main loop."""
while self._windowManager.isWindowCreated:
frame = self._captureManager.frame

if frame is not None:
# TODO: Filter the frame (Chapter 3).

  1. To complete the Cameo class implementation, here is the onKeypress() method:
    def onKeypress(self, keycode):
"""Handle a keypress.

space -> Take a screenshot.
tab -> Start/stop recording a screencast.
escape -> Quit.

if keycode == 32: # space
elif keycode == 9: # tab
if not self._captureManager.isWritingVideo:
elif keycode == 27: # escape
  1. Finally, let's add a __main__ block that instantiates and runs Cameo, as follows:
if __name__=="__main__":

When running the application, note that the live camera feed is mirrored, while screenshots and screencasts are not. This is the intended behavior, as we pass True for shouldMirrorPreview when initializing the CaptureManager class.

Here is a screenshot of Cameo, showing a window (with the title Cameo) and the current frame from a camera:

So far, we do not manipulate the frames in any way except to mirror them for preview. We will start to add more interesting effects in Chapter 3, Processing Images with OpenCV.