Learning Geospatial Analysis with Python

By : Joel Lawhead

Overview of this book

Geospatial analysis is used in almost every field you can think of, from medicine to defense to farming. It is an approach that applies statistical analysis and other information engineering techniques to data that has a geographical or geospatial aspect. This typically involves applications capable of geospatial display and processing to turn that data into compiled, useful information. "Learning Geospatial Analysis with Python" uses the expressive and powerful Python programming language to guide you through geographic information systems, remote sensing, topography, and more. It explains how to use a framework to approach geospatial analysis effectively, but on your own terms. "Learning Geospatial Analysis with Python" starts with a background of the field and a survey of the techniques and technology used, and then splits the field into its component specialty areas: GIS, remote sensing, elevation data, advanced modelling, and real-time data. This book covers everything from using a particular software package or API to using generic algorithms that can be applied to geospatial analysis. It focuses on pure Python whenever possible to minimize compiling platform-dependent binaries, so that you don't become bogged down in just getting ready to do analysis. "Learning Geospatial Analysis with Python" will round out your technical library with handy recipes and a good understanding of a field that supplements many modern-day human endeavors.

History of geospatial analysis


Geospatial analysis can be traced as far back as 15,000 years ago, to the Lascaux Cave in southwestern France. In that cave, Paleolithic artists painted commonly hunted animals and what many experts believe are astronomical star maps, used either for religious ceremonies or potentially even to track the migration patterns of prey. Though crude, these paintings demonstrate an ancient example of humans creating abstract models of the world around them and correlating spatiotemporal features to find relationships. The following image shows one of the paintings with an overlay illustrating the star maps:

Over the centuries, the art of cartography and the science of land surveying developed, but it wasn't until the 1800s that significant advances in geographic analysis emerged. Deadly cholera outbreaks in Europe between 1830 and 1860 led geographers in Paris and London to use geographic analysis for epidemiological studies.

In 1832, Charles Picquet used different half-toned shades of gray to represent deaths per thousand citizens in the 48 districts of Paris, as part of a report on the cholera outbreak. In 1854, John Snow expanded on this method by tracking a cholera outbreak in London as it occurred. By placing a point on a map of the city each time a case was diagnosed, he was able to analyze the clustering of cholera cases. Snow traced the disease to a single water pump and prevented further cases. The map has three layers: streets, an X for each pump, and dots for each cholera case:
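Snow's reasoning, that cases cluster around the pump they drew water from, can be illustrated with a minimal Python sketch that assigns each case to its nearest pump and counts the cases per pump. The pump names and all coordinates below are hypothetical, not Snow's data, and straight-line distance is a simplifying assumption; a real analysis might instead measure walking distance along the street layer.

```python
import math
from collections import Counter

# Hypothetical pump locations (x, y) on a local map grid; not Snow's real data.
PUMPS = {
    "pump_a": (0.0, 0.0),
    "pump_b": (10.0, 0.0),
    "pump_c": (0.0, 10.0),
}

# Hypothetical case locations: one point per diagnosed case.
CASES = [(1.0, 1.0), (2.0, -1.0), (0.5, 0.5), (9.0, 1.0), (1.0, 8.0), (-1.0, 2.0)]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to point (straight-line distance)."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


def cases_per_pump(cases, pumps):
    """Count how many cases fall closest to each pump."""
    return Counter(nearest_pump(case, pumps) for case in cases)


counts = cases_per_pump(CASES, PUMPS)
print(counts.most_common(1))  # [('pump_a', 4)]
```

With these made-up points, four of the six cases fall nearest `pump_a`, which is the kind of clustering Snow's dot layer made visible by eye.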

A retired French engineer named Charles Minard produced some of the most sophisticated infographics ever drawn between 1850 and 1870. The term infographics is too generic to describe these drawings because they have strong geographic components. The quality and detail of these maps make them fantastic examples of geographic information analysis even by today's standards. In 1869, Minard released his masterpiece, Carte figurative des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813, depicting the decimation of Napoleon's army in the Russian campaign of 1812. The map shows the size and location of the army over time, along with prevailing weather conditions.

The following graphic contains four different series of information on a single theme. It is an excellent example of geographic analysis using pen and paper. The size of the army is represented by the widths of the brown and black swaths at a ratio of one millimeter for every 10,000 men. The numbers are also written along the swaths. The brown path shows the soldiers who entered Russia, while the black path represents those who made it out. The map scale is shown on the center right as one "French league" (2.75 miles or 4.4 kilometers). The chart at the bottom runs from right to left and depicts the brutal freezing temperatures experienced by the soldiers on the return march home from Russia.
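Minard's map encodes quantities with two fixed scales: band width (1 millimeter per 10,000 men) and distance (1 French league, given here as 4.4 kilometers or 2.75 miles). A minimal sketch of that arithmetic in Python follows; the function names are hypothetical helpers, and the example inputs are made up rather than read off the map.

```python
# Scale factors as described for Minard's 1869 map.
MEN_PER_MM = 10_000       # band width: 1 millimeter per 10,000 men
KM_PER_LEAGUE = 4.4       # one "French league" in kilometers
MILES_PER_LEAGUE = 2.75   # one "French league" in miles


def band_width_mm(soldiers):
    """Width of the army band, in millimeters, for a given number of soldiers."""
    return soldiers / MEN_PER_MM


def soldiers_from_width(width_mm):
    """Invert the scale: estimate army size from a measured band width."""
    return width_mm * MEN_PER_MM


def leagues_to_km(leagues):
    """Convert a distance in French leagues to kilometers."""
    return leagues * KM_PER_LEAGUE


def leagues_to_miles(leagues):
    """Convert a distance in French leagues to miles."""
    return leagues * MILES_PER_LEAGUE


# Hypothetical inputs, not figures taken from the map itself.
print(band_width_mm(100_000))   # 10.0
print(leagues_to_miles(4))      # 11.0
```

Because the scale is linear, measuring a swath with a ruler and multiplying by 10,000 recovers the army size, which is why Minard could also write the numbers alongside the bands as a check.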

Minard also released another compelling, though far more mundane, map cataloguing the number of cattle sent to Paris from around France. He used pie charts of varying sizes in the regions of France to show each area's variety and volume of cattle shipped.

In the early 1900s, mass printing drove the development of the concept of map layers, a key feature of geospatial analysis. Cartographers drew different map elements (vegetation, roads, elevation contours) on plates of glass that could then be stacked and photographed for printing as a single image. If the cartographer made a mistake, only one plate of glass had to be changed instead of the entire map. Later, the development of plastic sheets made it even easier to create, edit, and store maps in this manner. However, the layering concept would not come into play as a benefit to analysis until the modern computer age.

Geographic Information Systems

Computer mapping evolved with the computer itself in the 1960s. But the term Geographic Information System (GIS) originated with the Canadian Department of Forestry and Rural Development. Dr. Roger Tomlinson headed a team of 40 developers in an agreement with IBM to build the Canadian Geographic Information System (CGIS). The CGIS tracked the natural resources of Canada and allowed profiling of these features for further analysis. The CGIS stored each type of land cover as a different layer. It also stored data in a coordinate system devised specifically for Canada, which covered the entire country and was optimized for area calculations. While the technology used was primitive by today's standards, the system had phenomenal capability for its time. The CGIS included software features that seem quite modern: map projection switching, rubber sheeting of scanned images, map scale change, line smoothing and generalization to reduce the number of points in a feature, automatic gap closing for polygons, area measurement, dissolving and merging of polygons, geometric buffering, creation of new polygons, scanning, and digitizing of new features from reference data.
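The text does not say which algorithm CGIS used for line generalization. As an illustration only, the following Python sketch uses the Douglas-Peucker algorithm, a widely used generalization technique published in 1973, after CGIS was built. It keeps a line's endpoints, finds the interior vertex farthest from the straight chord between them, and recurses on both halves only if that vertex is farther away than a chosen tolerance; otherwise, all interior points are dropped. The function names and tolerance values are assumptions for this example.

```python
import math


def perpendicular_distance(p, start, end):
    """Distance from point p to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = p, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate chord: fall back to point-to-point distance.
        return math.dist(p, start)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Douglas-Peucker line generalization (a sketch, not CGIS's own method)."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior vertex farthest from the start-end chord.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every interior vertex is within tolerance: keep only the endpoints.
        return [start, end]
    # Otherwise keep that vertex and simplify each half independently.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


print(simplify([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))
# [(0, 0), (2, 0), (2, 2)]
```

Raising the tolerance removes more vertices at the cost of shape fidelity, which is the same trade-off any generalization feature has to make when a map's scale changes.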

Tip

The National Film Board of Canada produced a 1967 documentary on the CGIS which can be seen at the following URL:

http://video.esri.com/watch/128/data-for-decision_comma_-1967-short-version

Tomlinson is often called "The Father of GIS". After launching the CGIS, he earned his doctorate from the University of London with his 1974 dissertation, entitled The application of electronic computing methods and techniques to the storage, compilation, and assessment of mapped data, which describes GIS and geospatial analysis. Tomlinson now runs his own global consulting firm, Tomlinson Associates Ltd., and remains an active participant in the industry. He is often found delivering the keynote address at geospatial conferences.

CGIS is the starting point of geospatial analysis as defined by this book. But this book would not have been written if not for the work of Howard Fisher and the Harvard Laboratory for Computer Graphics and Spatial Analysis at the Harvard Graduate School of Design. His work on the SYMAP GIS software, which output maps to a line printer, started an era of development at the lab that produced two other important packages and, as a whole, permanently defined the geospatial industry. GRID was a raster-based GIS system that used cells to represent geographic features instead of geometry. GRID was written by Carl Steinitz and David Sinton. The system later became IMGRID. Next came ODYSSEY, a team effort led by Nick Chrisman and Denis White. It was a system of programs that included many advanced geospatial data management features typical of modern geodatabase systems. Harvard attempted to commercialize these packages with limited success. However, their impact is still seen today: virtually every existing commercial and open source package owes something to these code bases.
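The difference between GRID's cells and geometry-based features can be sketched by measuring the same hypothetical lake both ways: as a polygon, using the shoelace formula, and as a grid of land-cover cells, by counting cells and multiplying by the cell area. The lake, the 10-meter cell size, and the land-cover codes are assumptions for illustration; this is not how GRID itself was implemented.

```python
# Vector representation: a polygon as a list of (x, y) vertices, in meters.
LAKE_POLYGON = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0)]


def polygon_area(vertices):
    """Shoelace formula for a simple (non-self-intersecting) polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Raster representation: each cell holds a hypothetical land-cover code.
CELL_SIZE = 10.0  # meters per cell side
WATER, LAND = 1, 0
LAKE_GRID = [
    [LAND, LAND, LAND, LAND],
    [WATER, WATER, WATER, LAND],
    [WATER, WATER, WATER, LAND],
]


def raster_area(grid, code, cell_size):
    """Area covered by cells matching code: cell count times cell area."""
    count = sum(row.count(code) for row in grid)
    return count * cell_size * cell_size


print(polygon_area(LAKE_POLYGON))                  # 600.0
print(raster_area(LAKE_GRID, WATER, CELL_SIZE))    # 600.0
```

Here the two answers agree only because the lake's edges happen to line up with the cell boundaries; for irregular shapes, a raster's area estimate depends on the cell size, while the polygon keeps its exact boundary.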

Tip

Howard Fisher produced a 1967 film using output from SYMAP to show the urban expansion of Lansing, Michigan from 1850 to 1965 by hand-coding decades of property information into the system. The analysis took months but would take only a few minutes to create now using modern tools and data. You can see the film at the following URL:

http://youtu.be/xj8DQ7IQ8_o

There are now dozens of geospatial desktop applications with graphical user interfaces available from companies including Esri, ERDAS, Intergraph, and ENVI, to name a few. Esri, which started in the late 1960s, is the oldest continuously operating GIS software company. In the open source realm, packages including Quantum GIS (QGIS) and GRASS are widely used. Beyond comprehensive desktop software packages, thousands of software libraries exist for building new software.

Remote sensing

Remote sensing is the collection of information about an object without making physical contact with that object. In the context of geospatial analysis, the object is usually the Earth. Remote sensing also includes the processing of the collected information. The potential of geographic information systems is limited only by the available geographic data. Land surveying to populate a GIS, even with a modern GPS, has always been resource intensive. The advent of remote sensing not only dramatically reduced the cost of geospatial analysis, but also took the field in entirely new directions. In addition to providing powerful reference data for GIS systems, remote sensing has made possible the automated and semi-automated generation of GIS data by extracting features from images and geographic data.

The eccentric French photographer Gaspard-Félix Tournachon, also known as Nadar, took the first aerial photograph in 1858 from a hot air balloon over Paris. The value of a true bird's eye view of the world was immediately apparent. As early as 1920, books on aerial photo interpretation began to appear.

When America entered the Cold War with the Soviet Union after World War II, aerial photography for monitoring military capability became widespread with the invention of the American U2 spy plane. The U2 could fly at 75,000 feet, putting it out of range of existing anti-aircraft weapons designed to reach only 50,000 feet. The American U2 flights over the Soviet Union ended when the Soviets finally shot down a U2 and captured the pilot.

But aerial photography had little impact on modern geospatial analysis. Planes could only capture small footprints of an area. Photographs were tacked to walls or examined on light tables but not in the context of other information. Though extremely useful, aerial photo interpretation was simply another visual perspective.

The game changer came on October 4, 1957, when the Soviet Union launched the Sputnik 1 satellite. The Soviets had scrapped a much more complex and sophisticated satellite prototype because of manufacturing difficulties; once those were corrected, this prototype would later become Sputnik 3. They opted instead for a simple metal sphere with four antennae and a simple radio transmitter. Other countries, including the United States, were also working on satellites. The satellite initiatives were not entirely secret. They were driven by scientific motives as part of the International Geophysical Year, and advances in rocket technology made artificial satellites a natural evolution for earth science. However, in nearly every case, each country's defense agency was also heavily involved. Like the Soviets, other countries were struggling with complex satellite designs packed with scientific instruments. The Soviets' decision to switch to the simplest possible device, solely to launch a satellite before the Americans, was effective. Sputnik was visible in the sky as it passed over, and its radio pulse could be heard by amateur radio operators. Despite Sputnik's simplicity, it provided valuable scientific information that could be derived from its orbital mechanics and radio frequency physics.

The Sputnik program's biggest impact was on the American space program. America's chief adversary had gained a tremendous advantage in the race to space. The United States ultimately responded with the Apollo moon landings. But before that, the US launched a program that would remain a national secret until 1995. The classified CORONA program resulted in the first pictures from space. The US and the Soviet Union had signed an agreement to end spy plane flights, but satellites were conspicuously absent from the negotiations. The following map shows the CORONA process. Dashed lines are satellite flight paths, the longer white tubes are the satellites, the smaller white cones are the film canisters, and the black blobs are the control stations that triggered the ejection of the film so that a plane could catch it in the sky.

The first CORONA satellite was a four-year effort with many setbacks, but the program ultimately succeeded. A key difficulty of satellite imaging, even today, is retrieving the images from space. The CORONA satellites used canisters of black and white film, which were ejected from the vehicle once exposed. As the film canister parachuted to earth, a US military plane would catch the package in midair. If the plane missed the canister, it would float briefly in the water before sinking into the ocean to protect the sensitive information. The US continued to develop the CORONA satellites until they matched the resolution and photographic quality of the U2 spy plane photos. The primary disadvantages of the CORONA instruments were their lack of reusability and timeliness. Once out of film, a satellite could no longer be of service. Also, film recovery was on a set schedule, making the system unsuitable for monitoring real-time situations. The overall success of the CORONA program, however, paved the way for the next wave of satellites, which ushered in the modern era of remote sensing.

Because of the CORONA program's secret status, its impact on remote sensing was indirect. Photographs of the earth taken on manned US space missions inspired the idea of a civilian-operated remote sensing satellite. The benefits of such a satellite were clear, but the idea was still controversial. Government officials questioned whether a satellite was as cost efficient as aerial photography. The military worried that a public satellite could endanger the secrecy of the CORONA program. Still other officials worried about the political consequences of imaging other countries without permission. But the Department of the Interior finally won permission for NASA to create a satellite to monitor the earth's surface resources.

On July 23, 1972, NASA launched the Earth Resources Technology Satellite (ERTS), which was quickly renamed Landsat-1. The platform contained two sensors. The first was the Return Beam Vidicon (RBV) sensor, which was essentially a video camera. It was even built by the radio and television giant RCA. The RBV immediately had problems, including disabling the satellite's attitude guidance system. The second sensor was the highly experimental Multi-Spectral Scanner (MSS). The MSS performed flawlessly and produced superior results to the RBV. It captured four separate images at four different wavelengths of the light reflected from the earth's surface.

This sensor had several revolutionary capabilities. The first and most important was the first global imaging of the planet, scanning every spot on the earth every 18 days. The following image from the US National Aeronautics and Space Administration (NASA) illustrates this flight and collection pattern:

It also recorded light beyond the visible spectrum. While it did capture green and red light visible to the human eye, it also scanned near-infrared light at two different wavelengths not visible to the human eye. The images were stored and transmitted digitally to three different ground stations in Maryland, California, and Alaska. The multispectral capability and digital format meant the aerial view provided by Landsat wasn't just another photograph from the sky. It was beaming down data. This data could be processed by computers to output derivative information about the earth in the same way a GIS provided derivative information about the earth by analyzing one geographic feature in the context of another. NASA promoted the use of Landsat worldwide and made the data available at very affordable prices to anyone who asked.

This global imaging capability led to many scientific breakthroughs, including the discovery of previously unknown geography as late as 1976. Using Landsat imagery, the government of Canada located a tiny uncharted island inhabited by polar bears. It named the new landmass Landsat Island.

Landsat-1 was followed by six other missions, and the program was turned over to the National Oceanic and Atmospheric Administration (NOAA) as the responsible agency. Landsat-6 failed to achieve orbit due to a ruptured manifold, which disabled its maneuvering engines. During some of those missions, the satellites were managed by the company EOSAT, now called Space Imaging, but management returned to the government with the Landsat-7 mission. The following image from NASA is a sample of a Landsat-7 product:

The Landsat Data Continuity Mission (LDCM) launched on February 11, 2013, and began collecting images on April 27, 2013, as part of its calibration cycle to become Landsat 8. The LDCM is a joint mission between NASA and the United States Geological Survey (USGS).

Elevation data

A Digital Elevation Model (DEM) is a three-dimensional representation of a planet's terrain. Within the context of this book, that planet is Earth. The history of digital elevation models is far less complicated than that of remotely sensed imagery, but no less significant. Before computers, representations of elevation data were limited to topographic maps created through traditional land surveys. Technology existed to create 3D models from stereoscopic images, or physical models from materials such as clay or wood, but these approaches were not widely used for geography.

The concept of digital elevation models began in 1986, when the French space agency, CNES, launched its SPOT-1 satellite, which could capture stereoscopic imagery. This system created the first usable DEM. Several other US and European satellites followed this model with similar missions. In February 2000, the Space Shuttle Endeavour conducted the Shuttle Radar Topography Mission (SRTM), which collected elevation data over 80 percent of the earth's surface using a special radar antenna configuration that allowed a single pass. This model was surpassed in 2009 by the joint US and Japanese mission using the ASTER sensor aboard NASA's TERRA satellite. This system captured 99 percent of the earth's surface but has proven to have minor data issues, so SRTM remains the gold standard. The following image from the US Geological Survey (USGS) shows a colorized DEM known as a hillshade. Greener areas are lower elevations, while yellow and brown areas are mid-range to high elevations:

Recently, more ambitious attempts at a worldwide elevation data set have gotten underway in the form of the TerraSAR-X and TanDEM-X satellites, launched by Germany in 2007 and 2010, respectively. These two radar elevation satellites are working together to produce a global DEM, called WorldDEM, planned for release in 2014. This data set will have a relative accuracy of 2 meters and an absolute accuracy of 10 meters.

Computer-aided drafting

Computer-aided drafting (CAD) is worth mentioning, though it does not directly relate to geospatial analysis. The history of CAD system development parallels and intertwines with the history of geospatial analysis. CAD is an engineering tool used to model two- and three-dimensional objects, usually for engineering and manufacturing. The primary difference between a geospatial model and a CAD model is that a geospatial model is referenced to the earth, whereas a CAD model can exist in abstract space. For example, a 3D blueprint of a building in a CAD system would not have a latitude or longitude, but in a GIS, the same building model would have a location on the earth. However, over the years, CAD systems have taken on many features of GIS systems and are commonly used for smaller GIS projects. Likewise, many GIS programs can import CAD data that has been georeferenced. Traditionally, CAD tools were designed primarily for engineering data that was not geospatial.

However, engineers who became involved with geospatial engineering projects, such as designing a city's electric utility system, would use the CAD tools they were familiar with to create maps. Over time, GIS software evolved to import the geospatial-oriented CAD data produced by engineers, and CAD tools evolved to better support geospatial data creation and compatibility with GIS software. AutoCAD by Autodesk and ArcGIS by Esri were the leading commercial packages to develop this capability, and the developers of the GDAL/OGR library added CAD support as well.