
Learning OpenCV 3 Application Development

By : Samyak Datta
Overview of this book

Computer vision and machine learning concepts are frequently used in practical computer vision-based projects. If you’re a novice, this book provides the steps to build and deploy an end-to-end application in the domain of computer vision using OpenCV/C++. At the outset, we explain how to install OpenCV and demonstrate how to run some simple programs. You will start with images (the building blocks of image processing applications), and see how they are stored and processed by OpenCV. You’ll get comfortable with OpenCV-specific jargon (Mat, Point, Scalar, and more), and get to know how to traverse images and perform basic pixel-wise operations. Building upon this, we introduce slightly more advanced image processing concepts such as filtering, thresholding, and edge detection. In the latter parts, the book touches upon more complex and ubiquitous concepts such as face detection (using Haar cascade classifiers), interest point detection algorithms, and feature descriptors. You will now begin to appreciate the true power of the library in how it reduces mathematically non-trivial algorithms to a single line of code! The concluding sections touch upon OpenCV’s Machine Learning module. You will witness not only how OpenCV helps you pre-process and extract features from images that are relevant to the problems you are trying to solve, but also how to use Machine Learning algorithms that work on these features to make intelligent predictions from visual data!

Traversing Mat objects


So far, you have learnt in detail about the Mat class: what it represents, how to initialize instances of the Mat class, and the different ways to create Mat objects. Along the way, we have also looked at some other OpenCV classes, such as Size, Scalar, and Rect. We have also successfully run our very first OpenCV Hello World program. Our sample program was fairly simplistic: it read an image from disk and loaded its contents into a Mat object. The real fun begins after this. In any application that you develop, you would typically read an image (or images) from disk into your code and then apply image processing or computer vision algorithms to them. In this section, we will take our first steps towards the processing aspect of things.

As we stated at the outset, an image is the sum total of its pixels. So, to understand any sort of processing that gets applied to images, we need to know how the pixel values would be modified as a result of the operations. This gives rise to the necessity of iterating over each and every pixel of a digital image. Now, since images are synonymous with Mat objects within the realm of OpenCV, we need a mechanism that allows us to iterate over all the values stored in the data matrix of a Mat. This section will discuss some techniques to do the same. We will present a couple of different ways to achieve such a traversal along with the pros and cons of using each approach. Once again, you will come to appreciate the utility of the Mat class when you encounter some more Mat member functions that have been made available to aid the programmer with this task.

Continuity of the Mat data matrix

Before we start with the code for traversing Mat objects, we need to understand how (more precisely, in what order) the data matrix stores the pixel values in memory. To do that, we need to introduce the concept of continuity. A data matrix is said to be continuous if all its rows are stored at adjacent memory locations, without any gap between the contents of two successive rows. If a matrix is not continuous, it is said to be non-continuous. Now, why do we care whether our Mat object's underlying data matrix is continuous or not? Well, as it turns out, iterating over a continuous data matrix is much faster than going over a non-continuous one, because the former requires a smaller number of memory accesses. Having learnt about the benefits offered by the continuity property of a Mat object's data matrix, how do we take advantage of this feature in our applications? The answer to that can be found in the following code snippet:

int channels = image.channels(); 
int num_rows = image.rows; 
int num_cols = (image.cols * channels); 
 
if (image.isContinuous()) { 
    num_cols = num_cols * num_rows; 
    num_rows = 1; 
} 

This piece of code achieves what I like to call flattening of the data matrix, and it is typically performed as a precursor to the actual image traversal. If the rows of the data matrix are indeed saved in contiguous memory locations, we can treat the entire matrix as a single one-dimensional array. This array will have one row, and its number of columns will be equal to (num_rows * num_cols * channels), which is the total number of channel values in the image (for a single-channel image, this is simply the number of pixels). The code snippet assumes that the image is an 8-bit Mat object. Also note that the flattening is performed only if the image is continuous.

In the case of non-continuous images, the values of num_rows and num_cols remain as they were read from the Mat object.
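The flattening arithmetic can be sanity-checked without OpenCV at all. The following plain C++ sketch mirrors the computation from the snippet above; the names ScanDims and flatten_dims, and the image dimensions used, are made up purely for illustration:

```cpp
#include <cassert>

// Models the flattening step: for a continuous image, collapse the
// (rows x cols x channels) layout into a single logical row.
struct ScanDims {
    int num_rows;
    long long num_cols;  // counted in individual channel values, not pixels
};

ScanDims flatten_dims(int rows, int cols, int channels, bool is_continuous) {
    ScanDims d;
    d.num_rows = rows;
    d.num_cols = static_cast<long long>(cols) * channels;  // values per row
    if (is_continuous) {
        d.num_cols *= d.num_rows;  // one long row holding every value
        d.num_rows = 1;
    }
    return d;
}
```

For example, a continuous 480x640 three-channel image flattens to a single row of 480 * 640 * 3 = 921,600 values, while a non-continuous one keeps its 480 rows of 1,920 values each.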

Matrices created by imread(), clone(), or a constructor will always be continuous. In fact, the only time a matrix will not be continuous is when it borrows data from an existing matrix. By borrowing data, I mean a case where a new matrix is created out of an ROI (region of interest) of a bigger matrix, for example:

Mat big (200, 300, CV_8UC1); 
Mat roi (big, Rect(10, 10, 100, 100)); 
Mat col = big.col(0); 

Both matrices, roi and col, will be non-continuous as they borrow data from big.
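To see why such borrowed matrices are non-continuous, consider how an ROI's rows sit inside the parent's buffer: consecutive ROI rows are separated by the parent's full row width, so they are not adjacent in memory. The following plain C++ sketch (roi_is_continuous is a made-up helper that models this rule, not an OpenCV function) captures the two exceptions where an ROI does remain continuous:

```cpp
#include <cassert>

// Models when a sub-matrix borrowed from a parent is continuous:
// either it spans the parent's full row width (so no gaps appear
// between its rows), or it has a single row (so there is nothing
// to be non-adjacent to).
bool roi_is_continuous(int parent_cols, int roi_rows, int roi_cols) {
    return roi_cols == parent_cols || roi_rows == 1;
}
```

Under this model, the 100x100 roi and the single-column col from the snippet above are both non-continuous, since neither spans the 300-column width of big.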

Image traversals

Now, we are ready for the actual traversal. As stated earlier, we will discuss a couple of different ways to go about this. The first technique uses the ptr() method of the Mat class. According to the documentation of the Mat::ptr() method, it returns a pointer to the specified matrix row. We specify the row by its 0-based index, which is passed to the function as an argument. So, let's check out the Mat::ptr() method in action:

for (int i = 0; i < num_rows; ++i) { 
    uchar* row_ptr = image.ptr<uchar>(i); 
    for (int j = 0; j < num_cols; ++j) { 
        // row_ptr[j] will give you access to the pixel value 
        // any sort of computation/transformation is to be performed here 
    } 
} 

What this technique essentially does is acquire the pointer to the start of each row with the statement image.ptr<uchar>(i) and save it in a pointer variable named row_ptr (the outer for loop); the loop variable i is used to index the rows of the matrix. Once we have acquired the pointer to an image row, we iterate through the row to access the value of each and every pixel. This is precisely what the inner for loop, with its j loop variable, accomplishes. What is elegant about this code is that it works in both cases, whether our data matrix is continuous (and flattened) or not. Just think about it: if our matrix were continuous and had been flattened using the code that we discussed a while back, then it would have a single row (num_rows = 1) and the number of columns would equal the total number of values in the image (num_rows * num_cols * channels). This would mean that the outer loop runs only once, and we call the Mat::ptr() method just once to fetch all the pixels of the image in a single call. And if our matrix hasn't been flattened, then image.ptr<uchar>(i) will be called for each row, making it a total of num_rows calls. This is also the reason that flattening a matrix is more efficient in terms of time taken.
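The trade-off can be made concrete with a little bookkeeping. The following plain C++ sketch (scan_cost is a made-up name; the 480x640 image in the test is hypothetical) counts the row-pointer fetches a full traversal would make, and shows that flattening changes how the work is batched, not how much data is read:

```cpp
#include <utility>
#include <cassert>

// Returns (row-pointer fetches, values read per fetch) for a full scan.
// Flattening trades many small fetches for a single large one; the total
// number of values read is identical either way.
std::pair<long long, long long> scan_cost(int rows, int cols, int channels,
                                          bool flattened) {
    long long values_per_row = static_cast<long long>(cols) * channels;
    if (flattened)
        return {1, values_per_row * rows};
    return {rows, values_per_row};
}
```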

Let's put together the code for the flattening and traversal of the image to get a complete picture of using the pointer method for Mat object traversal:

void scanImage(Mat& image) { 
    int channels = image.channels(); 
    int num_rows = image.rows; 
    int num_cols = (image.cols * channels); 
 
    if (image.isContinuous()) { 
        num_cols *= num_rows; 
        num_rows = 1; 
    } 
 
    for (int i = 0; i < num_rows; ++i) { 
        uchar* row_ptr = image.ptr<uchar>(i); 
        for (int j = 0; j < num_cols; ++j) { 
            // Perform operations on pixel value row_ptr[j] 
        } 
    } 
} 
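To see this pattern doing real work rather than hiding behind comments, here is a standalone sketch in plain C++. FakeMat is a made-up stand-in for a single-channel, 8-bit Mat (just a row-major buffer with a ptr()-like accessor), and the transformation shown, inverting every value, is one of the simplest pixel-wise operations imaginable:

```cpp
#include <vector>
#include <cstdint>
#include <cstddef>
#include <cassert>

// A toy model of a continuous, single-channel, 8-bit data matrix.
class FakeMat {
public:
    FakeMat(int rows, int cols)
        : rows_(rows), cols_(cols),
          data_(static_cast<std::size_t>(rows) * cols, 0) {}

    // Mimics Mat::ptr(): returns a pointer to the start of row i.
    std::uint8_t* ptr(int i) {
        return data_.data() + static_cast<std::size_t>(i) * cols_;
    }

    int rows_, cols_;

private:
    std::vector<std::uint8_t> data_;
};

// The same two-loop, pointer-based pattern as scanImage() above,
// with an actual operation in the inner loop: value inversion.
void invert(FakeMat& image) {
    for (int i = 0; i < image.rows_; ++i) {
        std::uint8_t* row_ptr = image.ptr(i);
        for (int j = 0; j < image.cols_; ++j) {
            row_ptr[j] = static_cast<std::uint8_t>(255 - row_ptr[j]);
        }
    }
}
```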

So, in summary, the Mat::ptr() method essentially works by fetching the data one row at a time. In that sense, the access method here is sequential: when the data of one of the rows is fetched, we can go over the contents of only that particular row. Accessing a new row necessitates a new fetch call. Flattening the data matrix is just a way to speed up computation, which works by bringing in all the data in a single fetch. This might not be the most aesthetic way of doing things. Your code may sometimes be difficult to understand and/or debug, especially when it comes to handling multi-channel images (you need to know exactly how many columns to skip per pixel while traversing a row). Now, this is where our second approach comes in.

This method relies on the Mat::at() method. As per the OpenCV documentation, the at() method returns a reference to the specified array element. The pixel whose value we are interested in is specified via its row and column index. This approach provides us with random access to the data matrix. Let's look at some example code that uses the at() method to access pixel values. In the following code snippet, assume that I is a single-channel, grayscale image:

for( int i = 0; i < I.rows; ++i) { 
    for( int j = 0; j < I.cols; ++j) { 
        // Matrix elements can be accessed via : I.at<uchar>(i,j) 
    } 
} 

The code looks much simpler, more compact, and easier to read than the earlier approach. We have a couple of for loops: the outer loop (with index variable i) which iterates over the rows and the inner loop (with index variable j) that goes over the columns. As we move over each pixel, we can access its value by calling I.at<uchar>(i,j).

But what about the case when our image is multi-channel? Let's say that we have a three-channel RGB image that we need to traverse. The code would have a very similar structure, but with minor differences. Since our image now has three channels, the uchar data type will no longer be appropriate for the pixel values. The solution is presented in the following code snippet:

for( int i = 0; i < I.rows; ++i) { 
    for( int j = 0; j < I.cols; ++j) { 
        /**  
        * The B, G and R components for the (i, j)-th pixel can be accessed by: 
        * I.at<Vec3b>(i, j)[0] 
        * I.at<Vec3b>(i, j)[1] 
        * I.at<Vec3b>(i, j)[2] 
        **/ 
    } 
} 

The first thing you notice about the code is the use of what seems like a new OpenCV type named Vec3b. All you need to know about Vec3b at this point is that it stands for a vector of three byte values, that is, a vector of three numbers between 0 and 255 (inclusive). And that is the perfect data type for representing a pixel in a three-channel RGB image (OpenCV always has the right tools made available to its users!). Now that we have established that the type of each value in the data matrix is Vec3b, which means that the at() method returns a reference to a Vec3b, we can access the individual elements within the Vec3b using the [] operator, just like a C++ array or vector. Now, recall that when we discussed image channels, we said that OpenCV stores the R, G, and B components in reverse order. This means that the zeroth, first, and second elements of a Vec3b refer to the blue, green, and red components of the pixel, respectively. You should be extra careful about this fact, as it can be a potential source of errors in your code.
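Since the reversed channel order is such a common source of bugs, it is worth pinning down with a tiny sketch. Here, Pixel is a made-up stand-in for Vec3b (three bytes stored in B, G, R order), and the enum names are ours, not OpenCV's:

```cpp
#include <array>
#include <cstdint>
#include <cassert>

// A stand-in for Vec3b: three bytes per pixel, stored blue-first.
using Pixel = std::array<std::uint8_t, 3>;

// Channel indices under the BGR storage order described above.
enum Channel { BLUE = 0, GREEN = 1, RED = 2 };

// Builds a pixel from conventional (R, G, B) values, storing them
// in the reversed order.
Pixel make_bgr(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    Pixel p = {b, g, r};
    return p;
}
```

A pure-red pixel therefore has its 255 at index 2, not index 0; forgetting this swaps red and blue in your output.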

Now, the library has gone a step further to provide another level of convenience for its users. Using the previously mentioned approach, we have to write the name of the data type, Vec3b, every time we want to access the value of a particular channel of a particular pixel. In order to avoid that, OpenCV provides us with a template class named Mat_. As always, we demonstrate its use via an example code snippet:

Mat_<Vec3b> _I = I; 
for( int i = 0; i < I.rows; ++i) { 
    for( int j = 0; j < I.cols; ++j ) { 
        /**  
        * The B, G and R components for the (i, j)-th pixel can be accessed by: 
        * _I(i, j)[0] 
        * _I(i, j)[1] 
        * _I(i, j)[2] 
        **/ 
    } 
} 

The first thing we do is declare an object of the Mat_ class and initialize it with our original Mat object. Mat_ is a thin template wrapper over the Mat class. It doesn't contain any extra data fields beyond what is available in a Mat object. In fact, references to the two classes (Mat and Mat_) can be converted to each other. The only advantage Mat_ offers is the notational convenience of not having to write the data type every time we access a pixel (because the data type was already specified when the Mat_ object was declared).
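The idea behind such a wrapper is easy to reproduce in miniature. The following plain C++ sketch (TypedView is a made-up name, and this is a drastically simplified model of what Mat_ does, not its real implementation) shows how fixing the element type once lets every subsequent access use a bare operator():

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

// A toy illustration of the Mat_ idea: a thin, typed view over an
// untyped buffer. It adds no data of its own beyond the pointer and
// dimensions; it only fixes the element type T once, at declaration.
template <typename T>
class TypedView {
public:
    TypedView(void* data, int rows, int cols)
        : data_(static_cast<T*>(data)), rows_(rows), cols_(cols) {}

    // Plays the role of Mat_'s element access: no type name needed here.
    T& operator()(int i, int j) {
        return data_[static_cast<std::size_t>(i) * cols_ + j];
    }

private:
    T* data_;
    int rows_, cols_;
};
```

Writes through the view land directly in the underlying buffer, just as writes through a Mat_ modify the original Mat's data.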

As stated earlier, the Mat::at() method is suited for random access (it requires both the row and column index). The resulting code is more readable and clean, but it is slower than the pointer-based approach, because the at() method performs range checks each time it is called.
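To make that trade-off tangible, here is a sketch of the kind of range check involved. Note the hedge: the real Mat::at() performs its checks via debug-build assertions, whereas this made-up checked_at helper always checks and throws, purely to make both the cost and the safety visible:

```cpp
#include <stdexcept>
#include <vector>
#include <cstddef>
#include <cassert>

// Illustrates the per-call bounds check that makes at()-style access
// safer but slower than raw pointer arithmetic.
template <typename T>
T& checked_at(std::vector<T>& data, int rows, int cols, int i, int j) {
    if (i < 0 || i >= rows || j < 0 || j >= cols)
        throw std::out_of_range("index outside the data matrix");
    return data[static_cast<std::size_t>(i) * cols + j];
}
```

A stray index that a pointer-based traversal would silently read past is caught immediately here, which is exactly the safety the extra comparisons buy.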

We combine both the code snippets for single as well as multi-channel traversal using Mat::at() and encapsulate that within a single C++ function:

void scanImage(Mat& image) { 
  int channels = image.channels(); 
 
  if (channels == 1) { 
    for (int i = 0; i < image.rows; ++i) { 
      for (int j = 0; j < image.cols; ++j) { 
        // Matrix elements can be accessed via: image.at<uchar>(i, j) 
      } 
    } 
  } 
  else if (channels == 3) { 
    for (int i = 0; i < image.rows; ++i) { 
      for (int j = 0; j < image.cols; ++j) { 
        // The B, G and R components for the (i, j)-th pixel can be 
        // accessed by: 
        //   image.at<Vec3b>(i, j)[0] 
        //   image.at<Vec3b>(i, j)[1] 
        //   image.at<Vec3b>(i, j)[2] 
      } 
    } 
  } 
} 

This concludes our section on image traversals. But before we move on to the next topic, a few final words on Mat object traversals. We have gone over a number of different methods for achieving what seems like a very basic task. We have seen the sequential pointer-based approach and the random-access technique using the Mat::at() method. Personally, I tend to lean towards the latter, due to its aesthetic appeal and the clear distinction it makes between single- and multi-channel images, which leaves no room for confusion. It's usually also safer, due to the range checks that we've mentioned before, and it makes it easier to access the surrounding pixels if you need them for processing (something that we will be doing quite a lot from Chapter 2, Image Filtering, onwards).

Most of the example programs in the remainder of this book will stick to this too. However, you are encouraged to try out the former approach too, if and when you feel like, and compare the results with the ones shown in the text.

We have been traversing Mat objects and images for quite some time now, but we haven't really done any tangible processing with them. You will have noticed that whenever we reached the part of the code where we had the chance to actually access and/or modify the pixel values, we stopped and hid behind comment blocks that merely described what could be done there. Very soon, in the next few sections, we are going to remove those comments and fill up that space with some actual code that performs some simple, yet cool, transformations on our images!