Book Image

Learning Geospatial Analysis with Python

By : Joel Lawhead
Book Image

Learning Geospatial Analysis with Python

By: Joel Lawhead

Overview of this book

Geospatial Analysis is used in almost every field you can think of from medicine, to defense, to farming. This book will guide you gently into this exciting and complex field. It walks you through the building blocks of geospatial analysis and how to apply them to influence decision making using the latest Python software. Learning Geospatial Analysis with Python, 2nd Edition uses the expressive and powerful Python 3 programming language to guide you through geographic information systems, remote sensing, topography, and more, while providing a framework for you to approach geospatial analysis effectively, but on your own terms. We start by giving you a little background on the field, and a survey of the techniques and technology used. We then split the field into its component specialty areas: GIS, remote sensing, elevation data, advanced modeling, and real-time data. This book will teach you everything you need to know about, Geospatial Analysis from using a particular software package or API to using generic algorithms that can be applied. This book focuses on pure Python whenever possible to minimize compiling platform-dependent binaries, so that you don’t become bogged down in just getting ready to do analysis. This book will round out your technical library through handy recipes that will give you a good understanding of a field that supplements many a modern day human endeavors.
Table of Contents (17 chapters)
Learning Geospatial Analysis with Python Second Edition
About the Author
About the Reviewers

Common vector GIS concepts

This section will discuss the different types of GIS processes commonly used in geospatial analysis. This list is not exhaustive; however, it provides you with the essential operations that all other operations are based on. If you understand these operations, you can quickly understand much more complex processes as they are either derivatives or combinations of these processes.

Data structures

GIS vector data uses coordinates consisting of, at a minimum, an x horizontal value and a y vertical value to represent a location on the Earth. In many cases, a point may also contain a z value. Other ancillary values are possible including measurements or timestamps.

These coordinates are used to form points, lines, and polygons to model real-world objects. Points can be a geometric feature in and of themselves or they can connect line segments. Closed areas created by line segments are considered polygons. Polygons model objects such as buildings, terrain, or political boundaries.

A GIS feature can consist of a single point, line, or polygon or it can consist of more than one shape. For example, in a GIS polygon dataset containing world country boundaries, the Philippines, which is made up of 7,107 islands, would be represented as a single country made up of thousands of polygons.

Vector data typically represents topographic features better than raster data. Vector data has better accuracy potential and is more precise. However, to collect vector data on a large scale is also traditionally more costly than raster data.

Two other important terms related to vector data structures are bounding box and convex hull. The bounding box or minimum bounding box is the smallest possible square that contains all of the points in a dataset. The following image demonstrates a bounding box for a collection of points:

The convex hull of a dataset is similar to the bounding box, but instead of a square, it is the smallest possible polygon that can contain a dataset. The bounding box of a dataset always contains its convex hull. The following image shows the same point data as the previous example with the convex hull polygon shown in red:


A buffer operation can be applied to spatial objects including points, lines, or polygons. This operation creates a polygon around the object at a specified distance. Buffer operations are used for proximity analysis, for example, establishing a safety zone around a dangerous area. In the following image, the black shapes represent the original geometry while the red outlines represent the larger buffer polygons generated from the original shape:


A dissolve operation creates a single polygon out of adjacent polygons. A common use for a dissolve operation is to merge two adjacent properties in a tax database that have been purchased by a single owner. Dissolves are also used to simplify data extracted from remote sensing:


Objects that have more points than necessary for the geospatial model can be generalized to reduce the number of points used to represent the shape. This operation usually requires a few attempts to get the optimal number of points without compromising the overall shape. It is a data optimization technique to simplify data for the efficiency of computing or better visualization. This technique is useful in web-mapping applications. Computer screens have a resolution of 72 dots per inch (dpi). Highly-detailed point data, which would not be visible, can be reduced so that less bandwidth is used to send a visually equivalent map to the user:


An intersection operation is used to see if one part of a feature intersects with one or more features. This operation is for spatial queries in proximity analysis and is often a follow-on operation to a buffer analysis:


A merge operation combines two or more non-overlapping shapes in a single multishape object. Multishape objects mean that the shapes maintain separate geometries but are treated as a single feature with a single set of attributes by the GIS:

Point in polygon

A fundamental geospatial operation is checking to see whether a point is inside a polygon. This one operation is the atomic building block of many different types of spatial queries. If the point is on the boundary of the polygon, it is considered inside. Very few spatial queries exist that do not rely on this calculation in some way. However, it can be very slow on a large number of points.

The most common and efficient algorithm to detect if a point is inside a polygon is called the ray casting algorithm. First, a test is performed to see if the point is on the polygon boundary. Next, the algorithm draws a line from the point in question in a single direction. The program counts the number of times the line crosses the polygon boundary until it reaches the bounding box of the polygon. The bounding box is the smallest box that can be drawn around the entire polygon. If the number is odd, the point is inside. If the number of boundary intersections is even, the point is outside:


The union operation is less common but very useful to combine two or more overlapping polygons in a single shape. It is similar to dissolve, but in this case, the polygons are overlapping as opposed to being adjacent. Usually, this operation is used to clean up automatically-generated feature datasets from remote sensing operations:


A join or SQL join is a database operation used to combine two or more tables of information. Relational databases are designed to avoid storing redundant information for one-to-many relationships. For example, a U.S. state may contain many cities. Rather than creating a table for each state containing all of its cities, a table of states with numeric IDs is created, while a table for all the cities in every state is created with a state numeric ID. In a GIS, you can also have spatial joins that are part of the spatial extension software for a database. In spatial joins, combine the attributes to two features in the same way that you do in a SQL join, but the relation is based on the spatial proximity of the two features. To follow the previous cities example, we could add the county name that each city resides in using a spatial join. The cities layer could be loaded over a county polygon layer whose attributes contain the county name. The spatial join would determine which city is in which county and perform a SQL join to add the county name to each city's attribute row.

Geospatial rules about polygons

In geospatial analysis, there are several general rules of thumb regarding polygons that are different from mathematical descriptions of polygons:

  • Polygons must have at least four points—the first and last points must be the same

  • A polygon boundary should not overlap itself

  • Polygons in a layer shouldn't overlap

  • A polygon in a layer inside another polygon is considered as a hole in the underlying polygon

Different geospatial software packages and libraries handle exceptions to these rules differently and can lead to confusing errors or software behaviors. The safest route is to make sure that your polygons obey these rules. There is one more important piece of information about polygons. A polygon is by definition a closed shape, which means that the first and last vertices of the polygon are identical. Some geospatial software will throw an error if you haven't explicitly duplicated the first point as the last point in the polygon dataset. Other software will automatically close the polygon without complaining. The data format that you use to store your geospatial data may also dictate how polygons are defined. This issue is a gray area and so it didn't make the polygon rules, but knowing this quirk will come in handy someday when you run into an error that you can't explain easily.