
Python Geospatial Analysis Essentials

By: Erik Westra

Understanding geospatial data


Geospatial data is data that positions things on the Earth's surface. This is a deliberately vague definition that encompasses both the ideas of location and shape. For example, a database of car accidents may include the latitude and longitude coordinates identifying where each accident occurred, and a file of county outlines would include both the position and shape of each county. Similarly, a GPS recording of a journey would include the position of the traveler over time, tracing out the path they took.

It is important to realize that geospatial data includes more than just the geospatial information itself. For example, the following outlines are not particularly useful by themselves:

Once you add appropriate metadata, however, these outlines make a lot more sense:

Geospatial data, therefore, includes both spatial information (locations and shapes) and non-spatial information (metadata) about each item being described.
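As a minimal illustration, a single feature can be modelled in Python as a geometry paired with a set of attributes. The dictionary layout, field names, and values below are hypothetical and do not follow the format of any particular library; they simply show spatial and non-spatial information travelling together.

```python
# A hypothetical feature: spatial information (a geometry) plus
# non-spatial metadata (attributes). Field names and values are illustrative only.
feature = {
    "geometry": {
        "type": "Point",
        # (longitude, latitude) in decimal degrees; example values only
        "coordinates": (176.2523, -38.1367),
    },
    "attributes": {
        "name": "Example accident site",
        "vehicles_involved": 2,
    },
}


def describe(feature):
    """Return a one-line summary combining the geometry with its metadata."""
    lon, lat = feature["geometry"]["coordinates"]
    name = feature["attributes"]["name"]
    return f"{name} at lat={lat:.4f}, lon={lon:.4f}"


print(describe(feature))
```

Without the attributes, the point is just a pair of numbers; without the geometry, the attributes cannot be placed on a map.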

Spatial information is usually represented as a series of coordinates, for example:

location = (-38.136734, 176.252300)  # (latitude, longitude)
outline = ((-61.686,17.024),(-61.738,16.989),(-61.829,16.996) ...)  # (longitude, latitude) pairs

These numbers won't mean much to you directly, but once you plot these series of coordinates onto a map, the data suddenly becomes comprehensible:

There are two fundamental types of geospatial data:

  • Raster data: This is geospatial data that divides the world up into cells and associates values with each cell. This is very similar to the way that bitmapped images divide an image up into pixels and associate a color with each pixel; for example:

    The value of each cell might represent the color to use when drawing the raster data on a map—this is often done to provide a raster basemap on which other data is drawn—or it might represent other information such as elevation, moisture levels, or soil type.

  • Vector data: This is geospatial data that consists of a list of features. For example, a shapefile containing countries would have one feature for each country. For each feature, the geospatial dataset will have a geometry, which is the shape associated with that feature, and any number of attributes containing the metadata for that feature.

    A feature's geometry is just a geometric shape that is positioned on the surface of the earth. This geometric shape is made up of points, lines (sometimes referred to as LineStrings), and polygons, or some combination of these three fundamental types:
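The difference between the two models can be sketched in plain Python. The grid values, the origin and cell size used to position the raster, and the dictionary layout of the vector features are all illustrative assumptions; real raster formats such as GeoTIFF record this positioning information in their own ways.

```python
# Raster model: a grid of cells, each holding a value (here, hypothetical
# elevations in metres), positioned on the Earth by the longitude/latitude
# of the grid's top-left corner and the size of each cell in degrees.
raster = {
    "origin": (176.0, -38.0),  # (longitude, latitude) of the top-left corner
    "cell_size": 0.1,          # degrees per cell, in both directions
    "cells": [
        [310, 325, 340],
        [290, 305, 330],
        [280, 295, 315],
    ],
}


def raster_value(raster, lon, lat):
    """Return the value of the cell containing (lon, lat), or None if outside."""
    origin_lon, origin_lat = raster["origin"]
    size = raster["cell_size"]
    col = int((lon - origin_lon) // size)
    row = int((origin_lat - lat) // size)  # rows count downwards (southwards)
    cells = raster["cells"]
    if 0 <= row < len(cells) and 0 <= col < len(cells[0]):
        return cells[row][col]
    return None


# Vector model: a list of features, each with a geometry built from a
# fundamental type (Point, LineString, or Polygon) plus attributes.
features = [
    {"geometry": ("Point", (176.25, -38.14)),
     "attributes": {"name": "Site A"}},
    {"geometry": ("LineString", [(176.0, -38.0), (176.3, -38.2)]),
     "attributes": {"name": "Road"}},
    {"geometry": ("Polygon", [(176.0, -38.0), (176.3, -38.0),
                              (176.3, -38.3), (176.0, -38.0)]),
     "attributes": {"name": "Zone"}},
]

print(raster_value(raster, 176.25, -38.14))
for f in features:
    print(f["attributes"]["name"], f["geometry"][0])
```

Reading a raster value is a matter of arithmetic on the cell grid, whereas each vector feature carries its own explicit coordinates.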

The typical raster data formats you might encounter include:

  • GeoTIFF files, which are standard TIFF image files with embedded georeferencing information that positions the image accurately on the Earth's surface.

  • USGS .dem files, which hold a Digital Elevation Model (DEM) in a simple ASCII data format.

  • .png, .bmp, and .jpeg format image files, with associated georeferencing files to position the images on the surface of the earth.
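For the plain image formats, one widely used kind of georeferencing file is the world file: a small text file stored alongside the image (for example, a .pgw file next to a .png). The sketch below assumes the standard six-line world file layout and uses made-up values; it converts a pixel's column and row into map coordinates. The function names are hypothetical.

```python
def parse_world_file(text):
    """Parse the six numeric lines of a world file into its coefficients.

    Line order in the file: A (pixel width), D (row rotation),
    B (column rotation), E (pixel height, usually negative),
    C and F (map x/y of the centre of the upper-left pixel).
    """
    a, d, b, e, c, f = (float(token) for token in text.split())
    return a, b, c, d, e, f


def pixel_to_map(coeffs, col, row):
    """Convert a (column, row) pixel position into map (x, y) coordinates.

    Because C and F refer to the centre of the upper-left pixel,
    integer positions map to pixel centres.
    """
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file for an image in longitude/latitude degrees,
# with 0.01-degree pixels and no rotation.
example = """0.01
0.0
0.0
-0.01
176.005
-37.995
"""

coeffs = parse_world_file(example)
print(pixel_to_map(coeffs, 0, 0))
print(pixel_to_map(coeffs, 10, 20))
```

The negative pixel height reflects the fact that image rows run downwards while latitude increases northwards.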

For vector-format data, you may typically encounter the following formats:

  • Shapefile: This is an extremely common file format used to store and share geospatial data.

  • WKT (Well-Known Text): This is a text-based format often used to convert geometries from one library or data source to another. This is also the format commonly used when retrieving features from a database.

  • WKB (Well-Known Binary): This is the binary equivalent of the WKT format, storing geometries as raw binary data rather than text.

  • GML (Geography Markup Language): This is an industry-standard format based on XML, and is often used when communicating with web services.

  • KML (Keyhole Markup Language): This is another XML-based format popularized by Google.

  • GeoJSON: This is a JSON-based format designed to store and transmit geometries, along with the attributes of the features they belong to.
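To see how the same kind of geometry looks in some of these formats, the following sketch builds WKT and GeoJSON strings by hand for a made-up polygon, and encodes a single point as WKB using the standard struct module. It assumes a 2D, single-ring polygon and a little-endian WKB point; in practice you would normally let a geospatial library handle these conversions. The helper names are hypothetical.

```python
import json
import struct

# A hypothetical square polygon, as a closed ring of (longitude, latitude)
# pairs: the first and last coordinates are identical.
ring = [(-61.8, 17.0), (-61.7, 17.0), (-61.7, 17.1), (-61.8, 17.1), (-61.8, 17.0)]


def polygon_to_wkt(ring):
    """Format a single-ring polygon as Well-Known Text."""
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian Well-Known Binary (21 bytes).

    Layout: a byte-order flag (1 = little-endian), the geometry type as an
    unsigned 32-bit integer (1 = Point), then x and y as 64-bit doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def polygon_to_geojson(ring, properties):
    """Wrap the polygon in a GeoJSON Feature with the given properties."""
    return json.dumps({
        "type": "Feature",
        # GeoJSON polygons are a list of rings; positions are [longitude, latitude]
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
        "properties": properties,
    })


print(polygon_to_wkt(ring))
print(point_to_wkb(-61.8, 17.0).hex())
print(polygon_to_geojson(ring, {"name": "Example"}))
```

Note that WKT and GeoJSON are both human-readable text, while the WKB output is raw bytes (shown here as hexadecimal).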

Because your analysis can only be as good as the data you are analyzing, obtaining and using good-quality geospatial data is critical. Indeed, one of the big challenges in performing geospatial analysis is getting the right data for the job. Fortunately, several websites provide free, good-quality geospatial data. If you're looking for a more obscure dataset, however, you may have trouble finding it. You always have the option of creating your own data from scratch, though this is an extremely time-consuming process.

We will return to the topic of geospatial data in Chapter 2, Geospatial Data, where we will examine what makes good geospatial data and how to obtain it.