PostGIS Cookbook - Second Edition

By: Pedro Wightman, Bborie Park, Stephen Vincent Mather, Thomas Kraft, Mayra Zurbarán

Overview of this book

PostGIS is a spatial database that integrates advanced storage and analysis of vector and raster data, and is remarkably flexible and powerful. PostGIS provides support for geographic objects in the PostgreSQL object-relational database and is currently the most popular open source spatial database. If you want to explore the complete range of PostGIS techniques and its related extensions, then this book is for you. This book is a comprehensive guide to the PostGIS tools and concepts required to manage, manipulate, and analyze spatial data in PostGIS. It covers key spatial data manipulation tasks, explaining not only how each task is performed, but also why. It provides practical guidance that allows you to safely take advantage of the advanced technology in PostGIS in order to simplify your spatial database administration tasks. Furthermore, you will learn to take advantage of basic and advanced vector, raster, and routing approaches, along with the concepts of data maintenance, optimization, and performance, and to integrate these into a large ecosystem of desktop and web tools. By the end, you will be armed with all the tools and instructions you need to both manage the spatial database system and make better decisions as your project's requirements evolve.
Table of Contents (18 chapters)

Handling batch importing and exporting of datasets


In many GIS workflows, there is a typical scenario where subsets of a PostGIS table must be deployed to external users in a filesystem format (most typically, shapefiles or a SpatiaLite database). Often, there is also the reverse process, where datasets received from different users have to be uploaded to the PostGIS database.

In this recipe, we will simulate both of these data flows. You will first create the data flow for processing the shapefiles out of PostGIS, and then the reverse data flow for uploading the shapefiles.

You will do this using the power of bash scripting and the ogr2ogr command.

Getting ready

If you didn't follow all the other recipes, be sure to import the hotspots (Global_24h.csv) and the countries (wborders.shp) datasets into PostGIS. The following is how to do it with ogr2ogr (you should import both datasets in their original SRID, 4326, to make spatial operations faster):

  1. Import the Global_24h.csv file into PostGIS, using the global_24h.vrt virtual file you created in a previous recipe:
      $ ogr2ogr -f PostgreSQL PG:"dbname='postgis_cookbook'
      user='me' password='mypassword'" -lco SCHEMA=chp01 global_24h.vrt \
      -lco OVERWRITE=YES -lco GEOMETRY_NAME=the_geom -nln hotspots
  2. Import the countries shapefile using ogr2ogr:
      $ ogr2ogr -f PostgreSQL -sql "SELECT ISO2, NAME AS country_name
      FROM wborders" -nlt MULTIPOLYGON PG:"dbname='postgis_cookbook'
      user='me' password='mypassword'" -nln countries \
      -lco SCHEMA=chp01 -lco OVERWRITE=YES \
      -lco GEOMETRY_NAME=the_geom wborders.shp

Note

If you already imported the hotspots dataset using the 3857 SRID, you can use the PostGIS 2.0 method that allows the user to modify the geometry type column of an existing spatial table. Thanks to the support of typmod on geometry objects, you can update the SRID definition for the hotspots table in this way:

postgis_cookbook=# ALTER TABLE chp01.hotspots
  ALTER COLUMN the_geom
  SET DATA TYPE geometry(Point, 4326)
  USING ST_Transform(the_geom, 4326);

How to do it...

The steps you need to follow to complete this recipe are as follows:

  1. Check how many hotspots there are for each distinct country by using the following query:
      postgis_cookbook=> SELECT c.country_name, MIN(c.iso2) 
      as iso2, count(*) as hs_count FROM chp01.hotspots as hs 
      JOIN chp01.countries as c ON ST_Contains(c.the_geom, hs.the_geom) 
      GROUP BY c.country_name ORDER BY c.country_name;

The output of the preceding command is as follows:

  2. Using the same query, generate a CSV file using the PostgreSQL COPY command or the ogr2ogr command (in the first case, make sure that the PostgreSQL service user has full write permission to the output directory). If you are following the COPY approach and using Windows, be sure to replace /tmp/hs_countries.csv with a different path:
      $ ogr2ogr -f CSV hs_countries.csv \
      PG:"dbname='postgis_cookbook' user='me' password='mypassword'" \
      -lco SCHEMA=chp01 -sql "SELECT c.country_name, MIN(c.iso2) as iso2,
      count(*) as hs_count FROM chp01.hotspots as hs
      JOIN chp01.countries as c ON ST_Contains(c.the_geom, hs.the_geom)
      GROUP BY c.country_name ORDER BY c.country_name"
      postgis_cookbook=> COPY (SELECT c.country_name, MIN(c.iso2) as iso2,
      count(*) as hs_count FROM chp01.hotspots as hs
      JOIN chp01.countries as c ON ST_Contains(c.the_geom, hs.the_geom)
      GROUP BY c.country_name ORDER BY c.country_name)
      TO '/tmp/hs_countries.csv' WITH CSV HEADER;
  3. If you are using Windows, go to step 5. With Linux, create a bash script named export_shapefiles.sh that iterates over each record (country) in the hs_countries.csv file and generates a shapefile with the corresponding hotspots exported from PostGIS for that country:
        #!/bin/bash
        while IFS="," read country iso2 hs_count
        do
          echo "Generating shapefile $iso2.shp for country $country ($iso2) containing $hs_count features."
          ogr2ogr out_shapefiles/$iso2.shp \
          PG:"dbname='postgis_cookbook' user='me' password='mypassword'" \
          -lco SCHEMA=chp01 -sql "SELECT ST_Transform(hs.the_geom, 4326),
          hs.acq_date, hs.acq_time, hs.bright_t31
          FROM chp01.hotspots as hs JOIN chp01.countries as c
          ON ST_Contains(c.the_geom, ST_Transform(hs.the_geom, 4326))
          WHERE c.iso2 = '$iso2'"
        done < hs_countries.csv
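Note that, as written, the while loop also reads the CSV header line (country_name,iso2,hs_count) as if it were a data row, producing one spurious ogr2ogr call. A minimal sketch of a common workaround, stripping the header with tail before the loop (the sample file and its values here are hypothetical, for illustration only):

```shell
# Build a tiny sample CSV with a header row (hypothetical values)
cat > /tmp/hs_sample.csv <<'EOF'
country_name,iso2,hs_count
Albania,AL,66
Algeria,DZ,361
EOF

# tail -n +2 emits the file starting at line 2, skipping the header,
# so the read loop only ever sees real country rows
tail -n +2 /tmp/hs_sample.csv | while IFS="," read country iso2 hs_count
do
  echo "$country -> $iso2 ($hs_count hotspots)"
done
```

In the recipe's script you would apply the same pattern by replacing `done < hs_countries.csv` with a `tail -n +2 hs_countries.csv | while ... done` pipeline.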
  4. Give execution permissions to the bash file, and then run it after creating an output directory (out_shapefiles) for the shapefiles that will be generated by the script. Then, go to step 7:
      $ chmod 775 export_shapefiles.sh
      $ mkdir out_shapefiles
      $ ./export_shapefiles.sh
      Generating shapefile AL.shp for country 
        Albania (AL) containing 66 features.
      Generating shapefile DZ.shp for country 
        Algeria (DZ) containing 361 features.
      ...
      Generating shapefile ZM.shp for country 
        Zambia (ZM) containing 1575 features.
      Generating shapefile ZW.shp for country 
        Zimbabwe (ZW) containing 179 features.

Note

If you get the output ERROR: function getsrid(geometry) does not exist LINE 1: SELECT getsrid("the_geom") FROM (SELECT,..., you will need to load legacy support in PostGIS, for example, on a Debian Linux box:

psql -d postgis_cookbook -f /usr/share/postgresql/9.1/contrib/postgis-2.1/legacy.sql

  5. If you are using Windows, create a batch file named export_shapefiles.bat that iterates over each record (country) in the hs_countries.csv file and generates a shapefile with the corresponding hotspots exported from PostGIS for that country:
        @echo off
        for /f "skip=1 tokens=1-3 delims=," %%a in (hs_countries.csv) do (
          echo "Generating shapefile %%b.shp for country %%a (%%b) containing %%c features"
          ogr2ogr .\out_shapefiles\%%b.shp ^
          PG:"dbname='postgis_cookbook' user='me' password='mypassword'" ^
          -lco SCHEMA=chp01 ^
          -sql "SELECT ST_Transform(hs.the_geom, 4326), hs.acq_date, hs.acq_time, hs.bright_t31 FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, ST_Transform(hs.the_geom, 4326)) WHERE c.iso2 = '%%b'"
        )
  6. Run the batch file after creating an output directory (out_shapefiles) for the shapefiles that will be generated by the script:
      >mkdir out_shapefiles
      >export_shapefiles.bat
      "Generating shapefile AL.shp for country 
       Albania (AL) containing 66 features"
      "Generating shapefile DZ.shp for country 
       Algeria (DZ) containing 361 features"
      ...
      "Generating shapefile ZW.shp for country 
       Zimbabwe (ZW) containing 179 features"
  7. Try to open a couple of these output shapefiles in your favorite desktop GIS. The following screenshot shows you how they look in QGIS:
  8. Now, you will make the return trip, uploading all of the generated shapefiles back to PostGIS. You will upload all of the features from each shapefile, adding the upload datetime and the original shapefile name. First, create the following PostgreSQL table, into which you will upload the shapefiles:
      postgis_cookbook=# CREATE TABLE chp01.hs_uploaded
      (
        ogc_fid serial NOT NULL,
        acq_date character varying(80),
        acq_time character varying(80),
        bright_t31 character varying(80),
        iso2 character varying,
        upload_datetime character varying,
        shapefile character varying,
        the_geom geometry(POINT, 4326),
        CONSTRAINT hs_uploaded_pk PRIMARY KEY (ogc_fid)
      );
  9. If you are using Windows, go to step 12. With OS X, you will need to install findutils with Homebrew and then run the Linux script:
      $ brew install findutils
  10. With Linux, create another bash script named import_shapefiles.sh:
        #!/bin/bash
        for f in `find out_shapefiles -name \*.shp -printf "%f\n"`
        do
          echo "Importing shapefile $f to chp01.hs_uploaded PostGIS table..."
          ogr2ogr -append -update -f PostgreSQL \
          PG:"dbname='postgis_cookbook' user='me' password='mypassword'" \
          out_shapefiles/$f \
          -nln chp01.hs_uploaded -sql "SELECT acq_date, acq_time,
          bright_t31, '${f%.*}' AS iso2, '`date`' AS upload_datetime,
          'out_shapefiles/$f' as shapefile FROM ${f%.*}"
        done
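The find invocation relies on GNU find's -printf "%f\n" action, which prints each match's bare filename with no directory prefix (this is why step 9 installs findutils on OS X, whose BSD find lacks -printf). A quick self-contained demonstration with throwaway files (the directory and filenames below are made up for illustration):

```shell
# Create a throwaway directory holding two dummy shapefile names
mkdir -p /tmp/demo_shp
touch /tmp/demo_shp/AL.shp /tmp/demo_shp/DZ.shp

# %f prints the basename only, so each loop iteration in the recipe's
# script receives e.g. "AL.shp" rather than "out_shapefiles/AL.shp"
find /tmp/demo_shp -name \*.shp -printf "%f\n" | sort
```

Because $f is a bare filename, the script can both build the output path (out_shapefiles/$f) and derive the layer name with ${f%.*}.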
  11. Assign the execution permission to the bash script and execute it:
      $ chmod 775 import_shapefiles.sh
      $ ./import_shapefiles.sh
      Importing shapefile DO.shp to chp01.hs_uploaded PostGIS table...
      ...
      Importing shapefile ID.shp to chp01.hs_uploaded PostGIS table...
      ...
      Importing shapefile AR.shp to chp01.hs_uploaded PostGIS table...
      ...

Now, go to step 14.

  12. If you are using Windows, create a batch script named import_shapefiles.bat:
        @echo off
        for %%I in (out_shapefiles\*.shp) do (
          echo Importing shapefile %%~nxI to chp01.hs_uploaded PostGIS table...
          ogr2ogr -append -update -f PostgreSQL ^
          PG:"dbname='postgis_cookbook' user='me' password='mypassword'" ^
          out_shapefiles\%%~nxI ^
          -nln chp01.hs_uploaded ^
          -sql "SELECT acq_date, acq_time, bright_t31, '%%~nI' AS iso2, '%date%' AS upload_datetime, 'out_shapefiles/%%~nxI' as shapefile FROM %%~nI"
        )
  13. Run the batch script:
      >import_shapefiles.bat
      Importing shapefile AL.shp to chp01.hs_uploaded PostGIS table...
      Importing shapefile AO.shp to chp01.hs_uploaded PostGIS table...
      Importing shapefile AR.shp to chp01.hs_uploaded PostGIS table...
      ...
  14. Check some of the records that have been uploaded to the PostGIS table by using SQL:
      postgis_cookbook=# SELECT upload_datetime,
      shapefile, ST_AsText(the_geom)
      FROM chp01.hs_uploaded WHERE iso2='AT';

The output of the preceding command is as follows:

  15. Check the same query with ogrinfo as well:
      $ ogrinfo PG:"dbname='postgis_cookbook' user='me' password='mypassword'" \
      chp01.hs_uploaded -where "iso2='AT'"

The output of the preceding command is as follows:

How it works...

You were able to implement both data flows (exporting shapefiles out of PostGIS, and then importing them back in) thanks to the power of GDAL's ogr2ogr command.

You have been using this command in different forms and with the most important input parameters in other recipes, so you should now have a good understanding of it.

Here, it is worth mentioning the way OGR lets you export the information related to the current datetime and the original shapefile name to the PostGIS table. Inside the import_shapefiles.sh (Linux, OS X) or the import_shapefiles.bat (Windows) scripts, the core is the line with the ogr2ogr command (here is the Linux version):

ogr2ogr -append -update  -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" out_shapefiles/$f -nln chp01.hs_uploaded -sql "SELECT acq_date, acq_time, bright_t31, '${f%.*}' AS iso2, '`date`' AS upload_datetime, 'out_shapefiles/$f' as shapefile FROM ${f%.*}"

Thanks to the -sql option, you can specify the two additional fields, deriving their values from the system date command and from the filename being iterated by the script.
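To see concretely what the shell substitutes into that -sql string before ogr2ogr ever runs, here is a small sketch of the two expansions at work (the filename AL.shp is a hypothetical example standing in for the loop variable):

```shell
# $f holds a bare filename, as produced by find -printf "%f\n"
f="AL.shp"

# ${f%.*} strips the shortest trailing ".<suffix>" match, leaving the
# layer name / ISO2 code that the SELECT ... FROM clause references
iso2="${f%.*}"
echo "$iso2"

# `date` (command substitution) runs once per loop iteration, so every
# row imported from one shapefile shares the same upload timestamp
echo "Uploaded at: $(date)"
```

Both values end up as string literals inside the SELECT list, which is how each row in chp01.hs_uploaded carries its iso2, upload_datetime, and shapefile provenance columns.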