Handling batch importing and exporting of datasets


A typical scenario in many GIS workflows is that subsets of a PostGIS table must be delivered to external users in a file-based format (most often shapefiles or a SpatiaLite database). Just as often, there is the reverse process, where datasets received from different users have to be uploaded to the PostGIS database.

In this recipe, you will simulate both of these data flows: first, the flow that exports shapefiles out of PostGIS, and then the reverse flow that uploads the shapefiles back.

You will do it using the power of bash scripting (or batch scripting on Windows) and the ogr2ogr command.

Getting ready

If you didn't follow the previous recipes, be sure to import the hotspots and countries datasets into PostGIS. The following is how to do it with ogr2ogr (you should import both datasets in their original SRID, 4326, to keep spatial operations fast):

  1. Import the Global_24h.csv file into PostGIS using the global_24h.vrt virtual format (VRT) file you created in a previous recipe:

    $ ogr2ogr -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 global_24h.vrt -lco OVERWRITE=YES -lco GEOMETRY_NAME=the_geom -nln hotspots
    
  2. Import the countries shapefile using ogr2ogr:

    $ ogr2ogr -f PostgreSQL -sql "SELECT ISO2, NAME AS country_name FROM 'TM_WORLD_BORDERS-0.3'" -nlt MULTIPOLYGON PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -nln countries -lco SCHEMA=chp01 -lco OVERWRITE=YES -lco GEOMETRY_NAME=the_geom TM_WORLD_BORDERS-0.3.shp
    

    Note

    If you already imported the hotspots dataset with SRID 3857, you can use the PostGIS 2.0 approach that lets you modify the geometry column of an existing spatial table. Thanks to typmod support on geometry objects, you can update the SRID definition for the hotspots table as follows:

    postgis_cookbook=# ALTER TABLE hotspots
    ALTER COLUMN the_geom
    SET DATA TYPE geometry(Point, 4326)
    USING ST_Transform(the_geom, 4326);
    
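    As a quick sanity check (an optional addition, not part of the original recipe), you can verify the geometry type and SRID now registered for the table by querying the geometry_columns view:

    postgis_cookbook=# SELECT f_geometry_column, type, srid
    FROM geometry_columns
    WHERE f_table_schema = 'chp01' AND f_table_name = 'hotspots';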

How to do it...

The steps you need to follow to complete this recipe are as follows:

  1. Check how many hotspots there are for each distinct country by using the following query:

    postgis_cookbook=> SELECT c.country_name, MIN(c.iso2) as iso2, count(*) as hs_count
    FROM chp01.hotspots as hs
    JOIN chp01.countries as c
    ON ST_Contains(c.the_geom, hs.the_geom)
    GROUP BY c.country_name
    ORDER BY c.country_name;
    

    The output of the preceding command is as follows:

    country_name | iso2 | hs_count
    --------------------------------------------
    Albania      | AL   | 66
    Algeria      | DZ   | 361
    ...
    Yemen        | YE   | 6
    Zambia       | ZM   | 1575
    Zimbabwe     | ZW   | 179
    (103 rows)
    
  2. Using the same query, generate a CSV file with either the ogr2ogr command or the PostgreSQL COPY command (in the latter case, make sure that the PostgreSQL service user has full write permission to the output directory). If you are following the COPY approach on Windows, be sure to replace /tmp/hs_countries.csv with a different path:

    $ ogr2ogr -f CSV hs_countries.csv PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 -sql "SELECT c.country_name, MIN(c.iso2) as iso2, count(*) as hs_count FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, hs.the_geom) GROUP BY c.country_name ORDER BY c.country_name"
    postgis_cookbook=> COPY (SELECT c.country_name, MIN(c.iso2) as iso2, count(*) as hs_count
      FROM chp01.hotspots as hs
      JOIN chp01.countries as c
      ON ST_Contains(c.the_geom, hs.the_geom)
      GROUP BY c.country_name
      ORDER BY c.country_name) TO '/tmp/hs_countries.csv' WITH CSV HEADER;
    
  3. If you are using Windows, go to step 5. On Linux (or OS X), create a bash script named export_shapefiles.sh that iterates over each record (country) in the hs_countries.csv file and generates a shapefile with the corresponding hotspots exported from PostGIS for that country:

    #!/bin/bash
    # Skip the CSV header row, then loop over the country records
    while IFS="," read country iso2 hs_count
    do
      echo "Generating shapefile $iso2.shp for country $country ($iso2) containing $hs_count features."
      ogr2ogr out_shapefiles/$iso2.shp \
        PG:"dbname='postgis_cookbook' user='me' password='mypassword'" \
        -lco SCHEMA=chp01 \
        -sql "SELECT ST_Transform(hs.the_geom, 4326), hs.acq_date, hs.acq_time, hs.bright_t31 FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, ST_Transform(hs.the_geom, 4326)) WHERE c.iso2 = '$iso2'"
    done < <(tail -n +2 hs_countries.csv)
  4. Give execution permissions to the bash file, and then run it after creating an output directory (out_shapefiles) for the shapefiles that will be generated by the script. Then, go to step 7:

    $ chmod 775 export_shapefiles.sh
    $ mkdir out_shapefiles
    $ ./export_shapefiles.sh
    Generating shapefile AL.shp for country Albania (AL) containing 66 features.
    Generating shapefile DZ.shp for country Algeria (DZ) containing 361 features.
    ...
    Generating shapefile ZM.shp for country Zambia (ZM) containing 1575 features.
    Generating shapefile ZW.shp for country Zimbabwe (ZW) containing 179 features.
    

    Tip

    If you get an error such as ERROR: function getsrid(geometry) does not exist LINE 1: SELECT getsrid("the_geom") FROM (SELECT,..., you will need to load the PostGIS legacy support, for example, on a Debian Linux box:

    psql -d postgis_cookbook -f /usr/share/postgresql/9.1/contrib/postgis-2.1/legacy.sql
    
  5. If you are using Windows, create a batch file named export_shapefiles.bat that iterates over each record (country) in the hs_countries.csv file and generates a shapefile with the corresponding hotspots exported from PostGIS for that country:

    @echo off
    for /f "skip=1 tokens=1-3 delims=," %%a in (hs_countries.csv) do (
        echo "Generating shapefile %%b.shp for country %%a (%%b) containing %%c features"
        ogr2ogr out_shapefiles/%%b.shp PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 -sql "SELECT ST_Transform(hs.the_geom, 4326), hs.acq_date, hs.acq_time, hs.bright_t31 FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, ST_Transform(hs.the_geom, 4326)) WHERE c.iso2 = '%%b'"
    )
  6. Run the batch file after creating an output directory (out_shapefiles) for the shapefiles that will be generated by the script:

    >mkdir out_shapefiles
    >export_shapefiles.bat
    "Generating shapefile AL.shp for country Albania (AL) containing 66 features"
    "Generating shapefile DZ.shp for country Algeria (DZ) containing 361 features"
    ...
    "Generating shapefile ZW.shp for country Zimbabwe (ZW) containing 179 features"
    
  7. Try to open a couple of these output shapefiles in your favorite desktop GIS, for example QGIS, and check that the exported hotspots look correct.

  8. Now, you will do the round trip, uploading all of the generated shapefiles to PostGIS. You will upload all of the features for each shapefile and include the upload datetime and the original shapefile name. First, create the following PostgreSQL table, where you will upload the shapefiles:

    postgis_cookbook=# CREATE TABLE chp01.hs_uploaded
    (
      ogc_fid serial NOT NULL,
      acq_date character varying(80),
      acq_time character varying(80),
      bright_t31 character varying(80),
      iso2 character varying,
      upload_datetime character varying,
      shapefile character varying,
      the_geom geometry(POINT, 4326),
      CONSTRAINT hs_uploaded_pk PRIMARY KEY (ogc_fid)
    );
    
  9. If you are using Windows, go to step 11. On Linux (or OS X), create another bash script named import_shapefiles.sh:

    #!/bin/bash
    for f in `find out_shapefiles -name \*.shp -printf "%f\n"`
    do
      echo "Importing shapefile $f to chp01.hs_uploaded PostGIS table..."
      ogr2ogr -append -update -f PostgreSQL \
        PG:"dbname='postgis_cookbook' user='me' password='mypassword'" \
        out_shapefiles/$f -nln chp01.hs_uploaded \
        -sql "SELECT acq_date, acq_time, bright_t31, '${f%.*}' AS iso2, '`date`' AS upload_datetime, 'out_shapefiles/$f' AS shapefile FROM ${f%.*}"
    done
  10. Assign the execution permission to the bash script and execute it:

    $ chmod 775 import_shapefiles.sh
    $ ./import_shapefiles.sh
    Importing shapefile DO.shp to chp01.hs_uploaded PostGIS table...
    Importing shapefile ID.shp to chp01.hs_uploaded PostGIS table...
    Importing shapefile AR.shp to chp01.hs_uploaded PostGIS table...
    ...
    
  11. If you are using Windows, create a batch script named import_shapefiles.bat:

    @echo off
    for %%I in (out_shapefiles\*.shp) do (
      echo Importing shapefile %%~nxI to chp01.hs_uploaded PostGIS table...
      ogr2ogr -append -update -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" out_shapefiles/%%~nxI -nln chp01.hs_uploaded -sql "SELECT acq_date, acq_time, bright_t31, '%%~nI' AS iso2, '%date%' AS upload_datetime, 'out_shapefiles/%%~nxI' AS shapefile FROM %%~nI"
    )
  12. Run the batch script:

    >import_shapefiles.bat
    Importing shapefile AL.shp to chp01.hs_uploaded PostGIS table...
    Importing shapefile AO.shp to chp01.hs_uploaded PostGIS table...
    Importing shapefile AR.shp to chp01.hs_uploaded PostGIS table...
    ...
    
  13. Check some of the records that have been uploaded to the PostGIS table by using SQL (a per-country count cross-check is also sketched right after these steps):

    postgis_cookbook=# SELECT upload_datetime, shapefile, ST_AsText(the_geom) FROM chp01.hs_uploaded WHERE ISO2='AT';
            upload_datetime        |       shapefile       |      st_astext
    -------------------------------+-----------------------+----------------------
     Sun Aug 26 01:58:44 CEST 2012 | out_shapefiles/AT.shp | POINT(14.333 48.279)
     Sun Aug 26 01:58:44 CEST 2012 | out_shapefiles/AT.shp | POINT(14.347 48.277)
     Sun Aug 26 01:58:44 CEST 2012 | out_shapefiles/AT.shp | POINT(14.327 48.277)
    ...
    (8 rows)
    
  14. Check the same query with ogrinfo as well:

    $ ogrinfo PG:"dbname='postgis_cookbook' user='me' password='mypassword'" chp01.hs_uploaded -where "iso2='AT'"
    INFO: Open of `PG:dbname='postgis_cookbook' user='me' password='mypassword''
    using driver `PostgreSQL' successful.
    Layer name: chp01.hs_uploaded
    Geometry: Point
    Feature Count: 8
    Extent: (-155.284000, -40.751000) - (177.457000, 70.404000)
    Layer SRS WKT:
    GEOGCS["WGS 84",
        ...
    FID Column = ogc_fid
    Geometry Column = the_geom
    acq_date: String (80.0)
    acq_time: String (80.0)
    bright_t31: String (80.0)
    iso2: String (0.0)
    upload_datetime: String (0.0)
    shapefile: String (0.0)
    OGRFeature(chp01.hs_uploaded):6413
      acq_date (String) = 2012-08-20
      acq_time (String) = 0110
      bright_t31 (String) = 292.7
      iso2 (String) = AT
      upload_datetime (String) = Sun Aug 26 01:58:44 CEST 2012
      shapefile (String) = out_shapefiles/AT.shp
      POINT (14.333 48.279)
    ...
    
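As a final cross-check (a small addition to the recipe, not one of its original steps), you can compare the per-country feature counts in chp01.hs_uploaded with the counts computed in step 1; if the round trip was lossless, the numbers should match:

    postgis_cookbook=# SELECT iso2, count(*) AS hs_count
    FROM chp01.hs_uploaded
    GROUP BY iso2
    ORDER BY iso2;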

How it works...

You implemented both data flows (exporting shapefiles out of PostGIS, and then importing them back in) thanks to the power of the ogr2ogr GDAL command.

You have been using this command in different forms and with the most important input parameters in other recipes, so you should now have a good understanding of it.

Here, it is worth noting how OGR lets you write the current datetime and the original shapefile name to the PostGIS table. Inside the import_shapefiles.sh (Linux, OS X) or import_shapefiles.bat (Windows) scripts, the core is the line with the ogr2ogr command (this is the Linux version):

 ogr2ogr -append -update  -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" out_shapefiles/$f -nln chp01.hs_uploaded -sql "SELECT acq_date, acq_time, bright_t31, '${f%.*}' AS iso2, '`date`' AS upload_datetime, 'out_shapefiles/$f' as shapefile FROM ${f%.*}"

Thanks to the -sql option, you can specify the two additional fields, taking their values from the system date command and from the filename being iterated by the script.
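If the shell substitutions in that command are unfamiliar, the following minimal sketch (an illustration only, not part of the recipe's scripts) shows what each one expands to for a file such as AT.shp:

 #!/bin/bash
 # Illustration of the substitutions used in import_shapefiles.sh
 f="AT.shp"
 echo "${f%.*}"              # AT  (filename without extension, used as iso2 and as the source layer name)
 echo "`date`"               # e.g. Sun Aug 26 01:58:44 CEST 2012 (used as upload_datetime)
 echo "out_shapefiles/$f"    # out_shapefiles/AT.shp (stored in the shapefile column)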