Book Image

Mastering OpenCV 4 - Third Edition

By : Roy Shilkrot, David Millán Escrivá
Book Image

Mastering OpenCV 4 - Third Edition

By: Roy Shilkrot, David Millán Escrivá

Overview of this book

Mastering OpenCV, now in its third edition, targets computer vision engineers taking their first steps toward mastering OpenCV. Keeping the mathematical formulations to a solid but bare minimum, the book delivers complete projects from ideation to running code, targeting current hot topics in computer vision such as face recognition, landmark detection and pose estimation, and number recognition with deep convolutional networks. You’ll learn from experienced OpenCV experts how to implement computer vision products and projects both in academia and industry in a comfortable package. You’ll get acquainted with API functionality and gain insights into design choices in a complete computer vision project. You’ll also go beyond the basics of computer vision to implement solutions for complex image processing projects. By the end of the book, you will have created various working prototypes with the help of projects in the book and be well versed with the new features of OpenCV4.
Table of Contents (12 chapters)

Main camera processing loop for a desktop app

If you want to display a GUI window on the screen using OpenCV, you call the cv::namedWindow() function and then the cv::imshow() function for each image, but you must also call cv::waitKey() once per frame, otherwise your windows will not update at all! Calling cv::waitKey(0) waits forever until the user hits a key in the window, but a positive number such as waitKey(20) or higher will wait for at least that many milliseconds.

Put this main loop in the main.cpp file, as the basis of your real-time camera app:

while (true) { 
// Grab the next camera frame.
cv::Mat cameraFrame;
camera >> cameraFrame;
if (cameraFrame.empty()) {
std::cerr<<"ERROR: Couldn't grab a camera frame."<<
// Create a blank output image, that we will draw onto.
cv::Mat displayedFrame(cameraFrame.size(), cv::CV_8UC3);

// Run the cartoonifier filter on the camera frame.
cartoonifyImage(cameraFrame, displayedFrame);

// Display the processed image onto the screen.
imshow("Cartoonifier", displayedFrame);

// IMPORTANT: Wait for atleast 20 milliseconds,
// so that the image can be displayed on the screen!
// Also checks if a key was pressed in the GUI window.
// Note that it should be a "char" to support Linux.
auto keypress = cv::waitKey(20); // Needed to see anything!
if (keypress == 27) { // Escape Key
// Quit the program!
}//end while

Generating a black and white sketch

To obtain a sketch (black and white drawing) of the camera frame, we will use an edge detection filter, whereas to obtain a color painting, we will use an edge preserving filter (bilateral filter) to further smooth the flat regions while keeping edges intact. By overlaying the sketch drawing on top of the color painting, we obtain a cartoon effect, as shown earlier in the screenshot of the final app.

There are many different edge detection filters, such as Sobel, Scharr, and Laplacian filters, or a Canny edge detector. We will use a Laplacian edge filter since it produces edges that look most similar to hand sketches compared to Sobel or Scharr, and is quite consistent compared to a Canny edge detector, which produces very clean line drawings but is affected more by random noise in the camera frames, and therefore the line drawings would often change drastically between frames.

Nevertheless, we still need to reduce the noise in the image before we use a Laplacian edge filter. We will use a median filter because it is good at removing noise while keeping edges sharp, but is not as slow as a bilateral filter. Since Laplacian filters use grayscale images, we must convert from OpenCV's default BGR format to grayscale. In your empty cartoon.cpp file, put this code at the top so you can access OpenCV and STD C++ templates without typing cv:: and std:: everywhere:

// Include OpenCV's C++ Interface 
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

Put this and all remaining code in a cartoonifyImage() function in your cartoon.cpp file:

Mat gray; 
cvtColor(srcColor, gray, CV_BGR2GRAY);
medianBlur(gray, gray, MEDIAN_BLUR_FILTER_SIZE);
Mat edges;
Laplacian(gray, edges, CV_8U, LAPLACIAN_FILTER_SIZE);

The Laplacian filter produces edges with varying brightness, so to make the edges look more like a sketch, we apply a binary threshold to make the edges either white or black:

Mat mask; 
const int EDGES_THRESHOLD = 80;
threshold(edges, mask, EDGES_THRESHOLD, 255, THRESH_BINARY_INV);

In the following diagram, you see the original image (to the left) and the generated edge mask (to the right), which looks similar to a sketch drawing. After we generate a color painting (explained later), we also put this edge mask on top to have black line drawings:

Generating a color painting and a cartoon

A strong bilateral filter smooths flat regions while keeping edges sharp, and therefore is great as an automatic cartoonifier or painting filter, except that it is extremely slow (that is, measured in seconds or even minutes, rather than milliseconds!). Therefore, we will use some tricks to obtain a nice cartoonifier, while still running at an acceptable speed. The most important trick we can use is that we can perform bilateral filtering at a lower resolution and it will still have a similar effect as a full resolution, but run much faster. Let's reduce the total number of pixels by four (for example, half width and half height):

Size size = srcColor.size(); 
Size smallSize;
smallSize.width = size.width/2;
smallSize.height = size.height/2;
Mat smallImg = Mat(smallSize, CV_8UC3);
resize(srcColor, smallImg, smallSize, 0,0, INTER_LINEAR);

Rather than applying a large bilateral filter, we will apply many small bilateral filters, to produce a strong cartoon effect in less time. We will truncate the filter (refer to the following diagram) so that instead of performing a whole filter (for example, a filter size of 21 x 21, when the bell curve is 21 pixels wide), it just uses the minimum filter size needed for a convincing result (for example, with a filter size of just 9 x 9 even if the bell curve is 21 pixels wide). This truncated filter will apply the major part of the filter (gray area) without wasting time on the minor part of the filter (white area under the curve), so it will run several times faster:

Therefore, we have four parameters that control the bilateral filter: color strength, positional strength, size, and repetition count. We need a temp Mat since the bilateralFilter() function can't overwrite its input (referred to as in-place processing), but we can apply one filter storing a temp Mat and another filter storing back the input:

Mat tmp = Mat(smallSize, CV_8UC3); 
auto repetitions = 7; // Repetitions for strong cartoon effect.
for (auto i=0; i<repetitions; i++) {
auto ksize = 9; // Filter size. Has large effect on speed.
double sigmaColor = 9; // Filter color strength.
double sigmaSpace = 7; // Spatial strength. Affects speed.
bilateralFilter(smallImg, tmp, ksize, sigmaColor, sigmaSpace);
bilateralFilter(tmp, smallImg, ksize, sigmaColor, sigmaSpace);


Remember that this was applied to the shrunken image, so we need to expand the image back to the original size. Then, we can overlay the edge mask that we found earlier. To overlay the edge mask sketch onto the bilateral filter painting (left-hand side of the following image), we can start with a black background and copy the painting pixels that aren't edges in the sketch mask:

Mat bigImg; 
resize(smallImg, bigImg, size, 0,0, INTER_LINEAR);
bigImg.copyTo(dst, mask);

The result is a cartoon version of the original photo, as shown on the right-hand side of the following image, where the sketch mask is overlaid on the painting:

Generating an evil mode using edge filters

Cartoons and comics always have both good and bad characters. With the right combination of edge filters, a scary image can be generated from the most innocent looking people! The trick is to use a small edge filter that will find many edges all over the image, then merge the edges using a small median filter.

We will perform this on a grayscale image with some noise reduction, so the preceding code for converting the original image to grayscale and applying a 7 x 7 median filter should still be used (the first image in the following diagram shows the output of the grayscale median blur). Instead of following it with a Laplacian filter and Binary threshold, we can get a scarier look if we apply a 3 x 3 Scharr gradient filter along x and y (second image in the diagram), then a binary threshold with a very low cutoff (third image in the diagram), and a 3 x 3 median blur, producing the final evil mask (fourth image in the diagram):

Mat gray;
cvtColor(srcColor, gray, CV_BGR2GRAY);
medianBlur(gray, gray, MEDIAN_BLUR_FILTER_SIZE);
Mat edges, edges2;
Scharr(srcGray, edges, CV_8U, 1, 0);
Scharr(srcGray, edges2, CV_8U, 1, 0, -1);
edges += edges2;
// Combine the x & y edges together.
const int EVIL_EDGE_THRESHOLD = 12
threshold(edges, mask, EVIL_EDGE_THRESHOLD, 255,
medianBlur(mask, mask, 3)

The following diagram shows the evil effect applied in the fourth image:

Now that we have an evil mask, we can overlay this mask onto the cartoonified painting image as we did with the regular sketch edge mask. The final result is shown on the right-hand side of the following diagram:

Generating an alien mode using skin detection

Now that we have a sketch mode, a cartoon mode (painting + sketch mask), and an evil mode (painting + evil mask), for fun, let's try something more complex: an alien mode, by detecting the skin regions of the face and then changing the skin color to green.

Skin detection algorithm

There are many different techniques used for detecting skin regions, from simple color thresholds using RGB (short for Red-Green-Blue) or HSV (short for Hue-Saturation-Brightness) values, or color histogram calculation and re-projection, to complex machine learning algorithms of mixture models that need camera calibration in the CIELab color space, offline training with many sample faces, and so on. But even the complex methods don't necessarily work robustly across various camera and lighting conditions and skin types. Since we want our skin detection to run on an embedded device, without any calibration or training, and we are just using skin detection for a fun image filter; it is sufficient for us to use a simple skin detection method. However, the color responses from the tiny camera sensor in the Raspberry Pi Camera Module tend to vary significantly, and we want to support skin detection for people of any skin color but without any calibration, so we need something more robust than simple color thresholds.

For example, a simple HSV skin detector can treat any pixel as skin if its hue color is fairly red, saturation is fairly high but not extremely high, and its brightness is not too dark or extremely bright. But cameras in mobile phones or Raspberry Pi Camera Modules often have bad white balancing; therefore, a person's skin might look slightly blue instead of red, for instance, and this would be a major problem for simple HSV thresholding.

A more robust solution is to perform face detection with a Haar or LBP cascade classifier (shown in Chapter 5, Face Detection and Recognition with the DNN Module), then look at the range of colors for the pixels in the middle of the detected face, since you know that those pixels should be skin pixels of the actual person. You could then scan the whole image or nearby region for pixels of a similar color as the center of the face. This has the advantage that it is very likely to find at least some of the true skin region of any detected person, no matter what their skin color is or even if their skin appears somewhat blueish or reddish in the camera image.

Unfortunately, face detection using cascade classifiers is quite slow on current embedded devices, so that method might be less ideal for some real-time embedded applications. On the other hand, we can take advantage of the fact that for mobile apps and some embedded systems, it can be expected that the user will be facing the camera directly from a very close distance, so it can be reasonable to ask the user to place their face at a specific location and distance, rather than try to detect the location and size of their face. This is the basis of many mobile phone apps, where the app asks the user to place their face at a certain position or perhaps to manually drag points on the screen to show where the corners of their face are in a photo. So, let's simply draw the outline of a face in the center of the screen, and ask the user to move their face to the position and size shown.

Showing the user where to put their face

When the alien mode is first started, we will draw the face outline on top of the camera frame so the user knows where to put their face. We will draw a big ellipse covering 70% of the image height, with a fixed aspect ratio of 0.72, so that the face will not become too skinny or fat depending on the aspect ratio of the camera:

// Draw the color face onto a black background.
Mat faceOutline = Mat::zeros(size, CV_8UC3);
Scalar color = CV_RGB(255,255,0); // Yellow.
auto thickness = 4;

// Use 70% of the screen height as the face height.
auto sw = size.width;
auto sh = size.height;
int faceH = sh/2 * 70/100; // "faceH" is radius of the ellipse.

// Scale the width to be the same nice shape for any screen width.
int faceW = faceH * 72/100;
// Draw the face outline.
ellipse(faceOutline, Point(sw/2, sh/2), Size(faceW, faceH),
0, 0, 360, color, thickness, CV_AA);

To make it more obvious that it is a face, let's also draw two eye outlines. Rather than drawing an eye as an ellipse, we can give it a bit more realism (refer to the following image) by drawing a truncated ellipse for the top of the eye and a truncated ellipse for the bottom of the eye, because we can specify the start and end angles when drawing with the ellipse() function:

// Draw the eye outlines, as 2 arcs per eye.
int eyeW = faceW * 23/100;
int eyeH = faceH * 11/100;
int eyeX = faceW * 48/100;
int eyeY = faceH * 13/100;
Size eyeSize = Size(eyeW, eyeH);

// Set the angle and shift for the eye half ellipses.
auto eyeA = 15; // angle in degrees.
auto eyeYshift = 11;

// Draw the top of the right eye.
ellipse(faceOutline, Point(sw/2 - eyeX, sh/2 -eyeY),
eyeSize, 0, 180+eyeA, 360-eyeA, color, thickness, CV_AA);

// Draw the bottom of the right eye.
ellipse(faceOutline, Point(sw/2 - eyeX, sh/2 - eyeY-eyeYshift),
eyeSize, 0, 0+eyeA, 180-eyeA, color, thickness, CV_AA);

// Draw the top of the left eye.
ellipse(faceOutline, Point(sw/2 + eyeX, sh/2 - eyeY),
eyeSize, 0, 180+eyeA, 360-eyeA, color, thickness, CV_AA);

// Draw the bottom of the left eye.
ellipse(faceOutline, Point(sw/2 + eyeX, sh/2 - eyeY-eyeYshift),
eyeSize, 0, 0+eyeA, 180-eyeA, color, thickness, CV_AA);

We can do the same to draw the bottom lip of the mouth:

// Draw the bottom lip of the mouth.
int mouthY = faceH * 48/100;
int mouthW = faceW * 45/100;
int mouthH = faceH * 6/100;
ellipse(faceOutline, Point(sw/2, sh/2 + mouthY), Size(mouthW,
mouthH), 0, 0, 180, color, thickness, CV_AA);

To make it even more obvious that the user should put their face where shown, let's write a message on the screen!

// Draw anti-aliased text.
float fontScale = 1.0f;
int fontThickness = 2;
char *szMsg = "Put your face here";
putText(faceOutline, szMsg, Point(sw * 23/100, sh * 10/100),
fontFace, fontScale, color, fontThickness, CV_AA);

Now that we have the face outline drawn, we can overlay it onto the displayed image by using alpha blending to combine the cartoonified image with this drawn outline:

addWeighted(dst, 1.0, faceOutline, 0.7, 0, dst, CV_8UC3);

This results in the outline in the following image, showing the user where to put their face, so we don't have to detect the face location: