PostGIS Cookbook

Importing nonspatial tabular data (CSV) using GDAL


As an alternative to the previous recipe, you will import a CSV file into PostGIS using the ogr2ogr GDAL command and the GDAL OGR virtual format. The Geospatial Data Abstraction Library (GDAL) is a translator library for raster geospatial data formats; OGR is its companion library, providing similar capabilities for vector data formats.

This time, as an extra step, you will import only a part of the features in the file and you will reproject them to a different spatial reference system.

Getting ready

You will import the Global_24h.csv file to the PostGIS database from NASA's Earth Observing System Data and Information System (EOSDIS).

You can download the file from the EOSDIS website at http://firms.modaps.eosdis.nasa.gov/active_fire/text/Global_24h.csv, or copy it from the dataset directory of the book for this chapter.

This file records the active fire hotspots detected worldwide over the last 24 hours by the Moderate Resolution Imaging Spectroradiometer (MODIS) satellites. Each row contains the coordinates of the hotspot (latitude, longitude) in decimal degrees (in the WGS 84 spatial reference system, SRID = 4326), plus a series of useful fields such as the acquisition date, acquisition time, and satellite type, to name a few.

You will import only the active fire data scanned by the satellite type marked as "T" (Terra MODIS), and you will reproject it to the Spherical Mercator projection coordinate system (EPSG:3857, historically known by the unofficial code EPSG:900913, where the number 900913 spells Google in 1337 speak, as the projection was first widely used by Google Maps).
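The satellite = 'T' selection can be previewed outside the database. Here is a minimal Python sketch (not part of the recipe's toolchain) that applies the same filter to a few records copied from Global_24h.csv:

```python
import csv
import io

# A few records copied from Global_24h.csv (header plus four rows)
sample = """latitude,longitude,brightness,scan,track,acq_date,acq_time,satellite,confidence,version,bright_t31,frp
-23.386,-46.197,307.5,1.1,1,2012-08-20, 0140,T,54,5.0,285.7,16.5
-22.952,-47.574,330.1,1.2,1.1,2012-08-20, 0140,T,100,5.0,285.2,53.9
-23.726,-56.108,333.3,4.7,2,2012-08-20, 0140,T,100,5.0,283.5,404.1
-23.729,-56.155,311.8,4.7,2,2012-08-20, 0140,T,61,5.0,272,143.1"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

# Keep only the Terra MODIS records, exactly what the recipe's
# -where "satellite='T'" clause will do on the full file
terra = [r for r in rows if r["satellite"] == "T"]
for r in terra:
    print(r["longitude"], r["latitude"], r["frp"])
```

In the full file, rows from other satellites (for example "A" for Aqua MODIS) would be dropped by this filter.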

How to do it...

The steps you need to follow to complete this recipe are as follows:

  1. Analyze the structure of the Global_24h.csv CSV file (in Windows, open the CSV file with an editor such as Notepad).

    $ cd ~/postgis_cookbook/data/chp01/
    $ head -n 5 Global_24h.csv
    latitude,longitude,brightness,scan,track,acq_date,acq_time,satellite,confidence,version,bright_t31,frp
    -23.386,-46.197,307.5,1.1,1,2012-08-20, 0140,T,54,5.0       ,285.7,16.5
    -22.952,-47.574,330.1,1.2,1.1,2012-08-20, 0140,T,100,5.0       ,285.2,53.9
    -23.726,-56.108,333.3,4.7,2,2012-08-20, 0140,T,100,5.0       ,283.5,404.1
    -23.729,-56.155,311.8,4.7,2,2012-08-20, 0140,T,61,5.0       ,272,143.1
    
  2. Create a GDAL virtual data source composed of just one layer derived from the Global_24h.csv file. To do so, create a text file named global_24h.vrt in the same directory where the CSV file is and edit it as follows:

    <OGRVRTDataSource>
      <OGRVRTLayer name="Global_24h">
        <SrcDataSource>Global_24h.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>EPSG:4326</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="longitude" y="latitude"/>
      </OGRVRTLayer>
    </OGRVRTDataSource>
  3. With the ogrinfo command, check if the virtual layer is correctly recognized by GDAL. For example, analyze the schema of the layer and the first of its features (fid=1):

    $ ogrinfo global_24h.vrt Global_24h -fid 1
    INFO: Open of `global_24h.vrt' using driver `VRT' successful.
    Layer name: Global_24h
    Geometry: Point
    Feature Count: 30326
    Extent: (-155.284000, -40.751000) - (177.457000, 70.404000)
    Layer SRS WKT:
    GEOGCS["WGS 84",    DATUM["WGS_1984",    ...
    latitude: String (0.0)
    longitude: String (0.0)
    frp: String (0.0)
    OGRFeature(Global_24h):1
    latitude (String) = -23.386
    longitude (String) = -46.197
    frp (String) = 16.5
    POINT (-46.197 -23.386)
    
  4. You can also try to open the virtual layer with a desktop GIS that supports the GDAL/OGR virtual driver, such as Quantum GIS (QGIS). In the following screenshot, the Global_24h layer is displayed together with the shapefile of the countries that you can find in the dataset directory of the book:

  5. Now, export the virtual layer as a new table in PostGIS using the ogr2ogr GDAL/OGR command. You need to use the -f option to specify the output format, the -t_srs option to project the points to the EPSG:3857 spatial reference, the -where option to load only the records from the MODIS Terra satellite type, and the -lco layer creation options to set the schema where you want to store the table and the name of its geometry column:

    $ ogr2ogr -f PostgreSQL -t_srs EPSG:3857 PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 global_24h.vrt -where "satellite='T'" -lco GEOMETRY_NAME=the_geom
    
  6. Check how the ogr2ogr command created the table as shown in the following command:

    $ pg_dump -t chp01.global_24h --schema-only -U me postgis_cookbook
    CREATE TABLE global_24h (
      ogc_fid integer NOT NULL,
      the_geom public.geometry(Point,3857),
      latitude character varying,
      longitude character varying,
      brightness character varying,
      scan character varying,
      track character varying,
      acq_date character varying,
      acq_time character varying,
      satellite character varying,
      confidence character varying,
      version character varying,
      bright_t31 character varying,
      frp character varying
    );
    
  7. Now, check the record that should appear in the geometry_columns metadata view:

    postgis_cookbook=# SELECT f_geometry_column, coord_dimension, srid, type FROM geometry_columns WHERE f_table_name = 'global_24h';
     f_geometry_column | coord_dimension | srid | type  
    -------------------+-----------------+------+-------
     the_geom          |               2 | 3857 | POINT
    (1 row)
    
  8. Check how many records have been imported in the table:

    postgis_cookbook=# select count(*) from chp01.global_24h;
     count 
    -------
      9190
    (1 row)
    
  9. Note how the coordinates have been projected from EPSG:4326 to EPSG:3857:

    postgis_cookbook=# SELECT ST_AsEWKT(the_geom) FROM chp01.global_24h LIMIT 1;
    st_asewkt
    ------------------------------------------------------
    SRID=3857;POINT(-5142626.51617686 -2678766.03496892)
    (1 row)
    
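The projected values in step 9 can be cross-checked by hand. EPSG:3857 is a spherical Mercator projection built on the WGS 84 semi-major axis (R = 6378137 m); a short Python sketch of the forward formulas, applied to the first point of the file:

```python
import math

R = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius in EPSG:3857

def to_web_mercator(lon, lat):
    """Forward spherical Mercator: decimal degrees -> metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# First hotspot of Global_24h.csv: longitude -46.197, latitude -23.386
x, y = to_web_mercator(-46.197, -23.386)
print(x, y)  # close to the SRID=3857 point returned by ST_AsEWKT
```

The result should agree with what ogr2ogr (via the -t_srs option) stored in the table.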

How it works...

As mentioned in the GDAL documentation:

"OGR Virtual Format is a driver that transforms features read from other drivers based on criteria specified in an XML control file."

GDAL supports reading and writing nonspatial tabular data stored as a CSV file, but we need a virtual format to derive the geometry of the layer from attribute columns in the CSV file (the longitude and latitude coordinates of each point). For this purpose, you need to specify at least the path to the CSV file (the SrcDataSource element), the geometry type (the GeometryType element), the spatial reference definition for the layer (the LayerSRS element), and how the driver should derive the geometric information (the GeometryField element).
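The four elements just described can also be assembled programmatically, which helps when many CSV files each need a VRT. A minimal sketch with Python's standard library (the build_vrt helper is ours, not part of GDAL):

```python
import xml.etree.ElementTree as ET

def build_vrt(layer_name, csv_path, x_col, y_col, srs="EPSG:4326"):
    """Build an OGR VRT wrapping a CSV file, deriving point geometry
    from two attribute columns."""
    root = ET.Element("OGRVRTDataSource")
    layer = ET.SubElement(root, "OGRVRTLayer", name=layer_name)
    ET.SubElement(layer, "SrcDataSource").text = csv_path
    ET.SubElement(layer, "GeometryType").text = "wkbPoint"
    ET.SubElement(layer, "LayerSRS").text = srs
    ET.SubElement(layer, "GeometryField",
                  encoding="PointFromColumns", x=x_col, y=y_col)
    return ET.tostring(root, encoding="unicode")

vrt = build_vrt("Global_24h", "Global_24h.csv", "longitude", "latitude")
print(vrt)
```

Writing the returned string to global_24h.vrt reproduces the file created by hand in step 2.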

There are many other options and reasons for using OGR virtual formats; for a deeper understanding, refer to the GDAL documentation available at http://www.gdal.org/ogr/drv_vrt.html.

Once a virtual format is correctly created, the original flat, nonspatial dataset becomes spatially enabled for GDAL and for any software built on it. This is why we can manipulate such files with GDAL commands such as ogrinfo and ogr2ogr, and with desktop GIS software such as QGIS.

Once we have verified that GDAL can correctly read the features from the virtual driver, we can easily import them into PostGIS using the popular ogr2ogr command-line utility. The ogr2ogr command has a plethora of options; refer to its documentation at http://www.gdal.org/ogr2ogr.html for a more in-depth discussion.
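For scripting, the ogr2ogr invocation used in this recipe can be kept as an argument list and launched with subprocess; a sketch (the connection parameters are placeholders to substitute with your own):

```python
# Argument list mirroring the ogr2ogr call from step 5.
# The PG connection string values are placeholders.
pg_conn = "dbname='postgis_cookbook' user='me' password='mypassword'"
args = [
    "ogr2ogr",
    "-f", "PostgreSQL",              # output format
    "-t_srs", "EPSG:3857",           # reproject to Spherical Mercator
    "PG:" + pg_conn,                 # destination data source
    "-lco", "SCHEMA=chp01",          # layer creation option: target schema
    "global_24h.vrt",                # source: the OGR virtual format file
    "-where", "satellite='T'",       # keep only Terra MODIS records
    "-lco", "GEOMETRY_NAME=the_geom",
]
# To actually run it (requires GDAL installed):
# import subprocess; subprocess.run(args, check=True)
print(" ".join(args))
```

Keeping the options in a list avoids shell-quoting pitfalls around the -where clause and the connection string.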

In this recipe, you have just seen some of these options, such as:

  • -where option: Used to export just a selection of the original feature class

  • -t_srs option: Used to reproject the data to a different spatial reference system

  • -lco layer creation option: Used to provide the schema where we want to store the table (without it, the new spatial table would be created in the public schema) and the name of the geometry field in the output layer
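A final caveat: as the pg_dump output in step 6 shows, every attribute of the CSV file arrives in PostGIS as character varying, because no type information was declared in the VRT. Before running numeric queries you will want to cast. A small Python sketch of the conversions involved, using the first record of the file (the target types are our assumption, not something dictated by the data provider):

```python
from datetime import date, time

# First record of Global_24h.csv as imported: every value is a string
record = {
    "latitude": "-23.386", "longitude": "-46.197", "brightness": "307.5",
    "acq_date": "2012-08-20", "acq_time": " 0140",
    "satellite": "T", "confidence": "54", "frp": "16.5",
}

hhmm = record["acq_time"].strip()  # acq_time is zero-padded HHMM
typed = {
    "latitude": float(record["latitude"]),
    "longitude": float(record["longitude"]),
    "brightness": float(record["brightness"]),
    "acq_date": date.fromisoformat(record["acq_date"]),
    "acq_time": time(int(hhmm[:2]), int(hhmm[2:])),
    "satellite": record["satellite"],
    "confidence": int(record["confidence"]),
    "frp": float(record["frp"]),
}
print(typed["acq_date"], typed["acq_time"])
```

On the database side, the same result can be reached with ALTER TABLE ... ALTER COLUMN ... TYPE numeric USING frp::numeric, or avoided entirely by declaring Field elements with a type attribute in the VRT.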