Book Image

Python Geospatial Analysis Essentials

By : Erik Westra
Book Image

Python Geospatial Analysis Essentials

By: Erik Westra

Overview of this book

Table of Contents (13 chapters)
Python Geospatial Analysis Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Unlocking the shapefile


At last, we are ready to start working with some geospatial data. Open up a command line or terminal window and cd into the TM_WORLD_BORDERS-0.3 directory you unzipped earlier. Then type python to fire up your Python interpreter.

We're going to start by loading the OGR library we installed earlier:

>>> import osgeo.ogr

We next want to open the shapefile using OGR:

>>> shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")

After executing this statement, the shapefile variable will hold an osgeo.ogr.Datasource object representing the geospatial data source we have opened. OGR data sources can support multiple layers of information, even though a shapefile has only a single layer. For this reason, we next need to extract the (one and only) layer from the shapefile:

>>>layer = shapefile.GetLayer(0)

Let's iterate through the various features within the shapefile, processing each feature in turn. We can do this using the following:

>>> for i in range(layer.GetFeatureCount()):
>>>     feature = layer.GetFeature(i)

The feature object, an instance of osgeo.ogr.Feature, allows us to access the geometry associated with the feature, along with the feature's attributes. According to the README.txt file, the country's name is stored in an attribute called NAME. Let's extract that name now:

>>>    feature_name = feature.GetField("NAME")

Note

Notice that the attribute is in uppercase. Shapefile attributes are case sensitive, so you have to use the exact capitalization to get the right attribute. Using feature.getField("name") would generate an error.

To get a reference to the feature's geometry object, we use the GetGeometryRef() method:

>>>     geometry = feature.GetGeometryRef()

We can do all sorts of things with geometries, but for now, let's just see what type of geometry we've got. We can do this using the GetGeometryName() method:

>>>>    geometry_type = geometry.GetGeometryName()

Finally, let's print out the information we have extracted for this feature:

>>>    print i, feature_name, geometry_type

Here is the complete mini-program we've written to unlock the contents of the shapefile:

import osgeo.ogr
shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")
layer = shapefile.GetLayer(0)
for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    feature_name = feature.GetField("NAME")
    geometry = feature.GetGeometryRef()
    geometry_type = geometry.GetGeometryName()
    print i, feature_name, geometry_type

If you press Return a second time to close off the for loop, your program will run, displaying useful information about each country extracted from the shapefile:

0 Antigua and Barbuda MULTIPOLYGON
1 Algeria POLYGON
2 Azerbaijan MULTIPOLYGON
3 Albania POLYGON
4 Armenia MULTIPOLYGON
5 Angola MULTIPOLYGON
6 American Samoa MULTIPOLYGON
7 Argentina MULTIPOLYGON
8 Australia MULTIPOLYGON
9 Bahrain MULTIPOLYGON
...

Notice that the geometry associated with some countries is a polygon, while for other countries the geometry is a multipolygon. As the name suggests, a multipolygon is simply a collection of polygons. Because the geometry represents the outline of each country, a polygon is used where the country's outline can be represented by a single shape, while a multipolygon is used when the outline has multiple parts. This most commonly happens when a country is made up of multiple islands. For example:

As you can see, Algeria is represented by a polygon, while Australia with its outlying islands would be a multipolygon.