Mastering PostGIS

By: Dominik Mikiewicz, Michal Mackiewicz, Tomasz Nycz

Overview of this book

PostGIS is an open source extension of the PostgreSQL object-relational database system that allows GIS objects to be stored and enables querying for information and location services. The aim of this book is to help you master the functionality offered by PostGIS: from data creation, analysis, and output, to ETL and live edits. The book begins with an overview of the key concepts related to spatial database systems and how they apply to a spatial RDBMS. You will learn to load different formats into your PostgreSQL instance, investigate the spatial nature of your raster data, and finally export it using built-in functionality or third-party tools for backup or presentation purposes. Throughout the book, you will be presented with many examples of how to interact with the database using JavaScript and Node.js. Sample web-based applications interacting with a PostGIS backend will also be presented, so you can get comfortable with the modern ways of consuming and modifying your spatial data.

Importing flat data


Loading flat data may seem a bit dull at first, but it is important to understand that many popular and interesting datasets often carry their spatial information in very different formats, such as:

  • Coordinates expressed in Lon/Lat or projected coordinates
  • Encoded geometry, for example WKT, TopoJSON, GeoJSON
  • Location in the form of an address
  • Location in non-Cartesian coordinates, for example a start point, angle, and direction

While the earlier examples indicate that such data would require further processing in order to extract the spatial content into a usable form, the ability to import flat datasets should clearly not be underestimated; the sample rows below illustrate what this kind of input can look like.
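
As a purely illustrative sketch (all column names and values here are made up), a flat CSV file might carry its spatial payload in any of these shapes:

    id,name,lon,lat
    1,Quake A,-122.42,37.77

    id,name,wkt
    2,Park B,"POLYGON((0 0,0 1,1 1,1 0,0 0))"

    id,name,address
    3,Office C,"221B Baker Street, London"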

Note

Flat data, in our scenario, is data with no explicitly expressed geometry: non-spatial, text-based file formats.

Importing data using psql

psql is PostgreSQL's command-line tool. While one can achieve quite a lot with GUI-based database management utilities, psql is very useful when one needs to handle database backups, management, and the like via scripting. When there is no GUI installed on the server, psql becomes pretty much the only option, so it is worth being familiar with it even if you're not a fan.

In order to import the data in psql we will use the \copy command. This requires us to define the data model for the incoming data first.

Defining the table data model from a text file may be prone to errors that will prevent the data from being imported. If for some reason you are not sure what data types are stored in particular columns of your source file, you can import all the data as text and then cast it as required at a later time.
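
As a minimal sketch of that fallback (the staging table and its columns are hypothetical), you would load everything as text and convert the columns in place afterwards:

-- stage the uncertain columns as text first
create table data_import.quakes_raw (mag text, depth text);
-- ...load the data with \copy, then cast the columns in place:
alter table data_import.quakes_raw
   alter column mag type numeric using mag::numeric,
   alter column depth type numeric using depth::numeric;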

Importing data interactively

In this example we will import the earthquakes data from USGS. So let's fire up psql and connect to the database server:

F:\mastering_postgis\chapter02>psql -h localhost -p 5434 -U postgres

You should see a similar output:

    psql (9.5.0)
    Type "help" for help.
    postgres=#

Then we need to connect to the mastering_postgis database:

    postgres=# \c mastering_postgis

The following output should be displayed:

    You are now connected to database "mastering_postgis" as user 
    "postgres".
    mastering_postgis=#

Note

In the psql examples I am using the postgres user. As you may guess, it is a superuser account. This is not something you would normally do, but it keeps the examples simple. In a production environment, using a database user with credentials allowing access only to specific resources is the sensible approach.
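
A hedged sketch of what such a restricted account might look like (the role name, password, and exact grants are assumptions):

create role etl_user login password 'change_me';
grant usage on schema data_import to etl_user;
grant select, insert on all tables in schema data_import to etl_user;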

If you have not had a chance to create our data_import schema, let's take care of it now by typing the following command:

    mastering_postgis=# create schema if not exists data_import;

You should see a similar output:

    NOTICE:  schema "data_import" already exists, skipping
    CREATE SCHEMA

Once the schema is there, we create the table that will store the data. In order to do so just type or paste the following into psql:

  create table data_import.earthquakes_csv ( 
     "time" timestamp with time zone, 
     latitude numeric, 
     longitude numeric, 
     depth numeric, 
     mag numeric, 
     magType varchar, 
     nst numeric, 
     gap numeric, 
     dmin numeric, 
     rms numeric, 
     net varchar, 
     id varchar, 
     updated timestamp with time zone, 
     place varchar, 
     type varchar, 
     horizontalError numeric, 
     depthError numeric, 
     magError numeric, 
     magNst numeric, 
     status varchar, 
     locationSource varchar, 
     magSource varchar 
  );

You should see the following output:

    mastering_postgis=# create table data_import.earthquakes_csv (
    mastering_postgis(# "time" timestamp with time zone,
    mastering_postgis(# latitude numeric,
    mastering_postgis(# longitude numeric,
    mastering_postgis(# depth numeric,
    mastering_postgis(# mag numeric,
    mastering_postgis(# magType varchar,
    mastering_postgis(# nst numeric,
    mastering_postgis(# gap numeric,
    mastering_postgis(# dmin numeric,
    mastering_postgis(# rms numeric,
    mastering_postgis(# net varchar,
    mastering_postgis(# id varchar,
    mastering_postgis(# updated timestamp with time zone,
    mastering_postgis(# place varchar,
    mastering_postgis(# type varchar,
    mastering_postgis(# horizontalError numeric,
    mastering_postgis(# depthError numeric,
    mastering_postgis(# magError numeric,
    mastering_postgis(# magNst numeric,
    mastering_postgis(# status varchar,
    mastering_postgis(# locationSource varchar,
    mastering_postgis(# magSource varchar
    mastering_postgis(# );
    CREATE TABLE

Now, as we have our data table ready, we can finally get to the import part. The following command should handle importing the data into our newly created table:

\copy data_import.earthquakes_csv from data\2.5_day.csv with DELIMITER ',' CSV HEADER

You should see a similar output:

mastering_postgis=# \copy data_import.earthquakes_csv from data\2.5_day.csv with DELIMITER ',' CSV HEADER
    COPY 25

Note

If you require a complete reference on the \copy command, simply type \h COPY in psql; the \copy meta-command accepts the same options as the server-side COPY.

While you can customize your data after importing, you may wish to import only a subset of columns in the first place. Unfortunately, the \copy command imports all the columns of the input file (although you may specify which table columns they go into), so the solution, sketched in code after this list, would be to:

  • Create a table that models the input CSV
  • Import all the data
  • Create a table with a subset of columns
  • Copy data over
  • Delete the input table
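
A minimal sketch of that workflow against a hypothetical, much smaller CSV (the table names, columns, and file path are assumptions):

-- 1. a staging table that models the input CSV
create table data_import.quakes_staging (
   id varchar,
   "time" timestamp with time zone,
   latitude numeric,
   longitude numeric,
   mag numeric
);
-- 2. import all the data
\copy data_import.quakes_staging from data/quakes.csv with DELIMITER ',' CSV HEADER
-- 3. + 4. create the subset table and copy the data over in one step
create table data_import.quakes_subset as
   select id, "time", mag from data_import.quakes_staging;
-- 5. delete the staging (input) table
drop table data_import.quakes_staging;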

Even though everything said earlier is possible in psql, it requires quite a lot of typing. Because of that, we will take care of this scenario in the next section, which demonstrates psql's non-interactive mode.

Importing data non-interactively

For the non-interactive psql data import example we'll do a bit more than in the interactive mode. We'll:

  • Import the full earthquakes dataset
  • Select a subset of the earthquakes data mentioned in the previous example and insert it into its own table
  • Import another dataset - in this case the Ordnance Survey's POIs

Basically, the non-interactive usage of psql means we simply provide it with SQL to execute. This way we can put together many statements without having to execute them one by one.

Once again we will need the data model prior to loading the data, and then a \copy command will be used.
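
The full script is in the book's code repository, but a sketch of what data_import_earthquakes.sql roughly contains (the exact columns selected into the subset table are an assumption) may help you follow the output below:

create schema if not exists data_import;

drop table if exists data_import.earthquakes_csv;
create table data_import.earthquakes_csv (
   -- same columns as in the interactive example
   "time" timestamp with time zone,
   latitude numeric,
   longitude numeric,
   depth numeric,
   mag numeric,
   magType varchar,
   nst numeric,
   gap numeric,
   dmin numeric,
   rms numeric,
   net varchar,
   id varchar,
   updated timestamp with time zone,
   place varchar,
   type varchar,
   horizontalError numeric,
   depthError numeric,
   magError numeric,
   magNst numeric,
   status varchar,
   locationSource varchar,
   magSource varchar
);

-- note: \copy must stay on a single line inside a script
\copy data_import.earthquakes_csv from data/2.5_day.csv with DELIMITER ',' CSV HEADER

drop table if exists data_import.earthquakes_csv_subset;
select id, "time", latitude, longitude, depth, mag, magtype, place
into data_import.earthquakes_csv_subset
from data_import.earthquakes_csv;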

If you're still in psql, you can execute a script by simply typing:

\i path\to\the\script.sql

For example:

\i F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql

You should see a similar output:

mastering_postgis-# \i F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql
    CREATE SCHEMA
    psql:F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql:5: NOTICE:  table "earthquakes_csv" does not exist, skipping
    DROP TABLE
    CREATE TABLE
    COPY 25
    psql:F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql:58: NOTICE:  table "earthquakes_csv_subset" does not exist, skipping
    DROP TABLE
    SELECT 25
    mastering_postgis-#

If you have already quit psql, type the following command into the cmd:

psql -h host -p port -U user -d database -f path\to\the\script.sql

For example:

psql -h localhost -p 5434 -U postgres -d mastering_postgis -f F:\mastering_postgis\chapter02\code\data_import_earthquakes.sql

You should see a similar output:

F:\mastering_postgis\chapter02>psql -h localhost -p 5434 -U postgres -d mastering_postgis -f F:\mastering_postgis\chapter02\code\data_import_earthquakes.sql
    psql:F:/mastering_postgis/chapter02/code/data_import_earthquakes.sql:2: NOTICE:  schema "data_import" already exists, skipping
    CREATE SCHEMA
    DROP TABLE
    CREATE TABLE
    COPY 25
    DROP TABLE
  SELECT 25

The script executed earlier is in the book's code repository under Chapter02/code/data_import_earthquakes.sql.

Loading OS POI data is now a piece of cake. This dataset is in a bit of a different format though, so it requires slight adjustments. You can review the code in Chapter02/code/data_import_gb_poi.sql.

Importing data using pgAdmin

In this section we'll import some new data we have not interacted with before - this time we'll have a look at the Ordnance Survey's address data we obtained in the CSV format.

Note

Depending on the pgAdmin version, the UI may differ a bit. The described functionality should always be present though. For the examples involving pgAdmin, screenshots were taken using pgAdmin III (1.22.2).

pgAdmin's import functionality is basically a wrapper around \copy, so it does require a data model in order to work. Because of that, let's quickly create a table that will be populated with the imported data. You can do it with the GUI by simply right-clicking the schema node you want to create the table in, choosing New Object | New Table, and then providing all the necessary model definitions in the displayed window:

You can also type some SQL, which in many cases is a bit quicker:

drop table if exists data_import.osgb_addresses; 
create table data_import.osgb_addresses( 
   uprn bigint, 
   os_address_toid varchar, 
   udprn integer, 
   organisation_name varchar, 
   department_name varchar, 
   po_box varchar, 
   sub_building_name varchar, 
   building_name varchar, 
   building_number varchar, 
   dependent_thoroughfare varchar, 
   thoroughfare varchar, 
   post_town varchar, 
   dbl_dependent_locality varchar, 
   dependent_locality varchar, 
   postcode varchar, 
   postcode_type varchar, 
   x numeric, 
   y numeric, 
   lat numeric, 
   lon numeric, 
   rpc numeric, 
   country varchar, 
   change_type varchar, 
   la_start_date date, 
   rm_start_date date, 
   last_update_date date, 
   class varchar 
); 

Once our table is ready, importing data is just a matter of right-clicking the table node in pgAdmin and choosing Import. An import wizard that assists with the import process will be displayed:

Note

All of the above could obviously be achieved with pure SQL, and in fact we have done this already in the previous section on importing data with psql in non-interactive mode. You can review the SQL code available in Chapter02/code for details.
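
For reference, the psql equivalent of the wizard boils down to a single \copy call (the file path here is hypothetical):

-- the file path below is an assumption for illustration
\copy data_import.osgb_addresses from data/osgb_addresses.csv with DELIMITER ',' CSV HEADER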

Extracting spatial information from flat data

As we already have some flat data in our database, it's time to extract the spatial information. So far all the datasets have carried their coordinates in plain numeric columns, so our job is really straightforward:

drop table if exists data_import.earthquakes_subset_with_geom; 
select  
   id, 
   "time", 
   depth, 
   mag, 
   magtype, 
   place, 
   ST_SetSRID(ST_Point(longitude, latitude), 4326) as geom 
into data_import.earthquakes_subset_with_geom 
from data_import.earthquakes_csv; 

This example extracts a subset of the data and puts it into a new table, with the coordinates expressed as a geometry type rather than as two numeric columns holding Lon and Lat.
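
To quickly sanity-check the new geometry column, you can inspect a few rows as WKT:

select id, ST_AsText(geom) as wkt
from data_import.earthquakes_subset_with_geom
limit 5;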

In order to quickly preview the data, we dump the table's content to KML using ogr2ogr (a little spoiler for the next chapter, which covers exporting data from PostGIS):

ogr2ogr -f "KML" earthquakes_from_postgis.kml PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" data_import.earthquakes_subset_with_geom -t_srs EPSG:4326

Such a KML file can be viewed, for example, in Google Earth (you can use the original KML downloaded from USGS as a cross-check for the output data):

Note

More examples of extracting the spatial data from different formats are addressed in the ETL chapter.