"If your pictures aren't good enough, you're not close enough."
Like a computer vision program, a photographer is the intelligence behind the lens. (Some would say the photographer is the soul behind the lens.) A good photographer continuously performs detection and tracking tasks—scanning the environment, choosing the subject, predicting actions and expressions that will create the right moment for the photo, and choosing the lens, settings, and viewpoint that will most effectively frame the subject.
By getting "close enough" to the subject and the action, the photographer can observe details quickly with the naked eye and can move to other viewpoints quickly because the distances are short and because the equipment is typically light (compared to a long lens on a tripod for a distant shot). Moreover, a close-up, wide-angle shot pulls the photographer, and viewer, into a first-person perspective of events, as if we become the subject or the subject's comrade for a single moment.
Photographic aesthetics concern us further in Chapter 2, Photographing Nature and Wildlife with an Automated Camera. For now, let's just establish two cardinal rules: don't miss the subject and don't miss the moment! Poor visibility and unfortunate timing are the worst excuses a photographer or a practitioner of computer vision can give. To hold ourselves to account, let us define some measurements that are relevant to these cardinal rules.
Resolution is the finest level of detail that the lens and camera can see. For many computer vision applications, recognizable details are the subject of the work, and if the system's resolution is poor, we might miss this subject completely. Resolution is often expressed in terms of the sensor's photosite counts or the captured image's pixel counts, but at best these measurements only tell us about one limiting factor. A better, empirical measurement, which reflects all characteristics of the lens, sensor, and setup, is called line pairs per millimeter (lp/mm). This means the maximum density of black-on-white lines that the lens and camera can resolve, in a given setup. At any higher density than this, the lines in the captured image blur together. Note that lp/mm varies with the subject's distance and the lens's settings, including the focal length (optical zoom) of a zoom lens. When you approach the subject, zoom in, or swap out a short lens for a long lens, the system should of course capture more detail! However, lp/mm does not vary when you crop (digitally zoom) a captured image.
Lighting conditions and the camera's ISO speed setting also have an effect on lp/mm. High ISO speeds are used in low light and they boost both the signal (which is weak in low light) and the noise (which is as strong as ever). Thus, at high ISO speeds, some details are blotted out by the boosted noise.
To achieve anything near its potential resolution, the lens must be properly focused. Dante Stella, a contemporary photographer, describes a problem with modern camera technology:
"For starters, it lacks … thought-controlled predictive autofocus."
That is to say, autofocus can fail miserably when its algorithm is mismatched to a particular, intelligent use or a particular pattern of evolving conditions in the scene. Long lenses are especially unforgiving with respect to improper focus. The depth of field (the distance between the nearest and farthest points in focus) is shallower in longer lenses. For some computer vision setups—for example, a camera hanging over an assembly line—the distance to the subject is highly predictable and in such cases manual focus is an acceptable solution.
Field of view (FOV) is the extent of the lens's vision. Typically, FOV is measured as an angle, but it can be measured as the distance between two peripherally observable points at a given depth from the lens. For example, a FOV of 90 degrees may also be expressed as a FOV of 2m at a depth of 1m or a FOV of 4m at a depth of 2m. Where not otherwise specified, FOV usually means diagonal FOV (the diagonal of the lens's vision), as opposed to horizontal FOV or vertical FOV. A longer lens has a narrower FOV. Typically, a longer lens also has higher resolution and less distortion. If our subject falls outside the FOV, we miss the subject completely! Toward the edges of the FOV, resolution tends to decrease and distortion tends to increase, so preferably the FOV should be wide enough to leave a margin around the subject.
The camera's throughput is the rate at which it captures image data. For many computer vision applications, a visual event might start and end in a fleeting moment and if the throughput is low, we might miss the moment completely or our image of it might suffer from motion blur. Typically, throughput is measured in frames per second (FPS), though measuring it as a bitrate can be useful, too. Throughput is limited by the following factors:
Shutter speed (exposure time): For a well-exposed image, the shutter speed is limited by lighting conditions, the lens's aperture setting, and the camera's ISO speed setting. (Conversely, a slower shutter speed allows for a narrower aperture setting or slower ISO speed.) We will discuss aperture settings after this list.
The type of shutter: A global shutter synchronizes the capture across all photosites. A rolling shutter does not; rather, the capture is sequential such that photosites at the bottom of the sensor register their signals later than photosites at the top. A rolling shutter is inferior because it can make an object appear skewed when the object or the camera moves rapidly. (This is sometimes called the "Jell-O effect" because of the video's resemblance to a wobbling mound of gelatin.) Also, under rapidly flickering lighting, a rolling shutter creates light and dark bands in the image. If the start of the capture is synchronized but the end is not, the shutter is referred to as a rolling shutter with global reset.
The camera's onboard image processing routines, such as conversion of raw photosite signals to a given number of pixels in a given format. As the number of pixels and bytes per pixel increase, the throughput decreases.
The interface between the camera and host computer: Common camera interfaces, in order of decreasing bit rates, include CoaXPress full, Camera Link full, USB 3.0, CoaXPress base, Camera Link base, Gigabit Ethernet, IEEE 1394b (FireWire full), USB 2.0, and IEEE 1394 (FireWire base).
A wide aperture setting lets in more light to allow for a faster exposure, a lower ISO speed, or a brighter image. However, a narrower aperture has the advantage of offering a greater depth of field. A lens supports a limited range of aperture settings. Depending on the lens, some aperture settings exhibit higher resolution than others. Long lenses tend to exhibit more stable resolution across aperture settings.
A lens's aperture size is expressed as an f-number or f-stop, which is the ratio of the lens's focal length to the diameter of its aperture. Roughly speaking, focal length is related to the length of the lens. More precisely, it is the distance between the camera's sensor and the lens system's optical center when the lens is focused at an infinitely distant target. The focal length should not be confused with the focus distance—the distance to objects that are in focus. The following diagram illustrates the meanings of focal length and focal distance as well as FOV:
With a higher f-number (a proportionally narrower aperture), a lens transmits a smaller proportion of incoming light. Specifically, the intensity of the transmitted light is inversely proportional to the square of the f-number. For example, when comparing the maximum apertures of two lenses, a photographer might write, "The f/2 lens is twice as fast as the f/2.8 lens." This means that the former lens can transmit twice the intensity of light, allowing an equivalent exposure to be taken in half the time.
A lens's efficiency or transmittance (the proportion of light transmitted) depends on not only the f-number but also non-ideal factors. For example, some light is reflected off the lens elements instead of being transmitted. The T-number or T-stop is an adjustment to the f-number based on empirical findings about a given lens's transmittance. For example, regardless of its f-number, a T/2.4 lens has the same transmittance as an ideal f/2.4 lens. For cinema lenses, manufacturers often provide T-number specifications but for other lenses, it is much more common to see only f-number specifications.
The sensor's efficiency is the proportion of the lens-transmitted light that reaches photosites and gets converted to a signal. If the efficiency is poor, the sensor misses much of the light! A more efficient sensor will tend to take well-exposed images for a broader range of camera settings, lens settings, and lighting conditions. Thus, efficiency gives the system more freedom to auto-select settings that are optimal for resolution and throughput. For the common type of sensor described in the previous section, Coloring the light, the choice of color filters has a big effect on efficiency. A camera designed to capture visible light in grayscale has high efficiency because it can receive all visible wavelengths at each photosite. A camera designed to capture visible light in multiple color channels typically has much lower efficiency because some wavelengths are filtered out at each photosite. A camera designed to capture NIR alone, by filtering out all visible light, typically has even lower efficiency.
Efficiency is a good indication of the system's ability to form some kind of image under diverse lighting (or radiation) conditions. However, depending on the subject and the real lighting, a relatively inefficient system could have higher contrast and better resolution. The advantages of selectively filtering wavelengths are not necessarily reflected in lp/mm, which measures black-on-white resolution.
By now, we have seen many quantifiable tradeoffs that complicate our efforts to capture a subject in a moment. As Robert Capa's advice implies, getting close with a short lens is a relatively robust recipe. It allows for good resolution with minimal risk of completely missing the framing or the focus. On the other hand, such a setup suffers from high distortion and, by definition, a short working distance. Moving beyond the capabilities of cameras in Capa's day, we have also considered the features and configurations that allow for high-throughput and high-efficiency video capture.
Having primed ourselves on wavelengths, image formats, cameras, lenses, capture settings, and photographers' common sense, let us now select several systems to study.