Claude Monet, one of the founders of French Impressionist painting, taught his students to paint only what they saw, not what they knew. He even went as far as to say:
"I wish I had been born blind and then suddenly gained my sight so that I could begin to paint without knowing what the objects were that I could see before me."
Monet rejected traditional artistic subjects, which tended to be mystical, heroic, militaristic, or revolutionary. Instead, he relied on his own observations of middle-class life: of social excursions; of sunny gardens, lily ponds, rivers, and the seaside; of foggy boulevards and train stations; and of private loss. With deep sadness, he told his friend, Georges Clemenceau (the future President of France):
"I one day found myself looking at my beloved wife's dead face and just systematically noting the colors according to an automatic reflex!"
Monet painted everything according to his personal impressions. Late in life, he even painted the symptoms of his own deteriorating eyesight. He adopted a reddish palette while he suffered from cataracts and a brilliant bluish palette after cataract surgery left his eyes more sensitive, possibly to the near ultraviolet range.
Like Monet's students, we as scholars of computer vision must confront a distinction between seeing and knowing and likewise between input and processing. Light, a lens, a camera, and a digital imaging pipeline can grant a computer a sense of sight. From the resulting image data, machine-learning (ML) algorithms can extract knowledge or at least a set of meta-senses such as detection, recognition, and reconstruction (scanning). Without proper senses or data, a system's learning potential is limited, perhaps even nil. Thus, when designing any computer vision system, we must consider the foundational requirements in terms of lighting conditions, lenses, cameras, and imaging pipelines.
What do we require in order to clearly see a given subject? This is the central question of our first chapter. Along the way, we will address five subquestions:
What do we require to see fast motion or fast changes in light?
What do we require to see distant objects?
What do we require to see with depth perception?
What do we require to see in the dark?
How do we obtain good value-for-money when purchasing lenses and cameras?
For many practical applications of computer vision, the environment is not a well-lit, white room, and the subject is not a human face at a distance of 0.6m (2')!
The choice of hardware is crucial to these problems. Different cameras and lenses are optimized for different imaging scenarios. However, software can also make or break a solution. On the software side, we will focus on the efficient use of OpenCV. Fortunately, OpenCV's videoio module supports many classes of camera systems, including the following:
Webcams in Windows, Mac, and Linux via the following frameworks, which come standard with most versions of the operating system:
Windows: Microsoft Media Foundation (MSMF), DirectShow, or Video for Windows (VfW)
Linux: Video4Linux (V4L), Video4Linux2 (V4L2), or libv4l
Built-in cameras in iOS and Android devices
OpenNI-compliant depth cameras via OpenNI or OpenNI2, which are open-source under the Apache license
Other depth cameras via the proprietary Intel Perceptual Computing SDK
Photo cameras via libgphoto2, which is open source under the GPL license. For a list of libgphoto2's supported cameras, see http://gphoto.org/proj/libgphoto2/support.php.
IIDC/DCAM-compliant industrial cameras via libdc1394, which is open-source under the LGPLv2 license
For Linux, unicap can be used as an alternative interface for IIDC/DCAM-compliant cameras, but unicap is GPL-licensed and thus not appropriate for use in closed-source software.
Other industrial cameras via the following proprietary frameworks:
Allied Vision Technologies (AVT) PvAPI for GigE Vision cameras
Smartek Vision Giganetix SDK for GigE Vision cameras
The videoio module is new in OpenCV 3. Previously, in OpenCV 2, video capture and recording were part of the highgui module, but in OpenCV 3, the highgui module is only responsible for GUI functionality. For a complete index of OpenCV's modules, see the official documentation at http://docs.opencv.org/3.0.0/.
However, we are not limited to the features of the videoio module; we can use other APIs to configure cameras and capture images. If an API can capture an array of image data, OpenCV can readily use the data, often without any copy operation or conversion. As an example, we will capture and use images from depth cameras via OpenNI2 (without the videoio module) and from industrial cameras via the FlyCapture SDK by Point Grey Research (PGR).
An industrial camera or machine vision camera typically has interchangeable lenses, a high-speed hardware interface (such as FireWire, Gigabit Ethernet, USB 3.0, or Camera Link), and a complete programming interface for all camera settings.
Most industrial cameras have SDKs for Windows and Linux. PGR's FlyCapture SDK supports single-camera and multi-camera setups on Windows as well as single-camera setups on Linux. Some of PGR's competitors, such as Allied Vision Technologies (AVT), offer better support for multi-camera setups on Linux.
We will learn about the differences among categories of cameras, and we will test the capabilities of several specific lenses, cameras, and configurations. By the end of the chapter, you will be better qualified to design either consumer-grade or industrial-grade vision systems for yourself, your lab, your company, or your clients. I hope to surprise you with the results that are possible at each price point!