There are a couple of alternative approaches to importing a Comma Separated Values (CSV) file, which stores attributes and geometries, into PostGIS. In this recipe, we will use the approach of importing such a file using the PostgreSQL COPY command and a couple of PostGIS functions.
We will import the firenews.csv file, which stores a series of web news items collected from various RSS feeds related to forest fires in Europe in the context of the European Forest Fire Information System (EFFIS), available at http://effis.jrc.ec.europa.eu/.
For each news item, there are attributes such as the place name, the size of the fire in hectares, the URL, and so on. Most importantly, there are the x and y fields, which give the position of the geolocated news item in decimal degrees (in the WGS 84 spatial reference system, SRID = 4326).
For Windows machines, it is necessary to install OSGeo4W, a set of open source geographical libraries that will allow the manipulation of the datasets. The link is: https://trac.osgeo.org/osgeo4w/
In addition, include the OSGeo4W and PostgreSQL binary folders in the Path environment variable so that you can execute the commands from any location on your PC.
The steps you need to follow to complete this recipe are as follows:
- Inspect the structure of the CSV file, firenews.csv, which you can find within the book dataset (if you are on Windows, open the CSV file with an editor such as Notepad):
$ cd ~/postgis_cookbook/data/chp01/
$ head -n 5 firenews.csv
The output of the preceding command is as shown:
- Connect to PostgreSQL, create the chp01 schema, and create the following table:
$ psql -U me -d postgis_cookbook
postgis_cookbook=> CREATE EXTENSION postgis;
postgis_cookbook=> CREATE SCHEMA chp01;
postgis_cookbook=> CREATE TABLE chp01.firenews
(
x float8,
y float8,
place varchar(100),
size float8,
updated date,
startdate date,
enddate date,
title varchar(255),
url varchar(255),
the_geom geometry(POINT, 4326)
);
Note
The date column is named updated rather than update, as UPDATE is a reserved keyword in PostgreSQL and cannot be used as an unquoted identifier.
We are using the psql client to connect to PostgreSQL, but you can use your favorite one, for example, pgAdmin.
When using the psql client, we will not show the host and port options, as we assume that you are using a local PostgreSQL installation on the standard port. If that is not the case, please provide those options!
- Import the CSV file into the table using the COPY command:
postgis_cookbook=> COPY chp01.firenews (x, y, place, size, updated, startdate, enddate, title, url) FROM '/tmp/firenews.csv' WITH CSV HEADER;
Note
Make sure that the firenews.csv file is in a location accessible by the PostgreSQL process user. For example, on Linux, copy the file to the /tmp directory.
If you are on Windows, you will most likely need to set the client encoding to UTF-8 before copying: postgis_cookbook=# SET client_encoding TO 'UTF-8'; and remember to use the full path, 'c:\\tmp\\firenews.csv'.
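Before running COPY, it can be handy to sanity-check the file outside the database. The following Python sketch uses made-up sample rows (the real header names and values in firenews.csv may differ); it counts the data rows and verifies that x and y parse as valid longitude/latitude values in decimal degrees:

```python
import csv
import io

# Hypothetical sample mimicking the structure of firenews.csv;
# the actual file's header names and contents may differ.
sample = """x,y,place,size,updated,startdate,enddate,title,url
-5.98,37.39,Sevilla,40.0,2010-08-01,2010-07-30,2010-08-01,Fire near Sevilla,http://example.com/1
9.12,39.22,Cagliari,20.5,2010-08-02,2010-08-01,2010-08-02,Fire in Sardinia,http://example.com/2
"""

def check_rows(fileobj):
    """Count data rows and verify that x/y parse as floats within
    valid WGS 84 longitude/latitude ranges."""
    reader = csv.DictReader(fileobj)
    count = 0
    for row in reader:
        x, y = float(row["x"]), float(row["y"])
        assert -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0
        count += 1
    return count

print(check_rows(io.StringIO(sample)))  # 2
```

If a row fails the range check here, it would also produce a nonsensical point after the import, so catching it early saves a round trip through COPY.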
- Check that all of the records have been imported from the CSV file into the PostgreSQL table:
postgis_cookbook=> SELECT COUNT(*) FROM chp01.firenews;
The output of the preceding command is as follows:
- Check that a record related to this new table is in the geometry_columns PostGIS metadata view:
postgis_cookbook=# SELECT f_table_name, f_geometry_column, coord_dimension, srid, type FROM geometry_columns WHERE f_table_name = 'firenews';
The output of the preceding command is as follows:
Note
Before PostGIS 2.0, you had to create a table containing spatial data in two distinct steps; in fact, the geometry_columns view was a table that needed to be manually updated. For that purpose, you had to use the AddGeometryColumn function to create the geometry column. For example, for this recipe:
postgis_cookbook=> CREATE TABLE chp01.firenews(
x float8,
y float8,
place varchar(100),
size float8,
updated date,
startdate date,
enddate date,
title varchar(255),
url varchar(255))
WITHOUT OIDS;
postgis_cookbook=> SELECT AddGeometryColumn('chp01', 'firenews', 'the_geom', 4326, 'POINT', 2);
chp01.firenews.the_geom SRID:4326 TYPE:POINT DIMS:2
In PostGIS 2.0 and newer, you can still use the AddGeometryColumn function if you wish; however, you need to set its use_typmod parameter to false.
- Now, populate the geometric column using one of the following two equivalent statements (the first uses ST_MakePoint, the second ST_PointFromText):
postgis_cookbook=> UPDATE chp01.firenews SET the_geom = ST_SetSRID(ST_MakePoint(x, y), 4326);
postgis_cookbook=> UPDATE chp01.firenews SET the_geom = ST_PointFromText('POINT(' || x || ' ' || y || ')', 4326);
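To see what ST_PointFromText actually parses, here is a minimal Python sketch that builds the same Well-Known Text (WKT) literal the SQL concatenation produces (the coordinates are illustrative only):

```python
def point_wkt(x, y):
    # Build the WKT literal that the SQL expression
    # 'POINT(' || x || ' ' || y || ')' passes to ST_PointFromText.
    return "POINT({0} {1})".format(x, y)

# Illustrative (longitude, latitude) pair in decimal degrees;
# the SRID (4326) is supplied separately as a function argument.
print(point_wkt(-5.98, 37.39))  # POINT(-5.98 37.39)
```

Note that the WKT string itself carries no SRID; that is why both SQL variants pass 4326 explicitly, either via ST_SetSRID or as the second argument of ST_PointFromText.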
- Check how the geometry field has been updated in some records from the table:
postgis_cookbook=# SELECT place, ST_AsText(the_geom) AS wkt_geom FROM chp01.firenews ORDER BY place LIMIT 5;
The output of the preceding command is as follows:
- Finally, create a spatial index for the geometric column of the table:
postgis_cookbook=> CREATE INDEX idx_firenews_geom ON chp01.firenews USING GIST (the_geom);
This recipe showed you how to load nonspatial tabular data (in CSV format) into PostGIS using the PostgreSQL COPY command.
After creating the table and copying the CSV file rows into the PostgreSQL table, you updated the geometric column using one of the geometry constructor functions that PostGIS provides (ST_MakePoint and ST_PointFromText for two-dimensional points).
These geometry constructors (in this case, ST_MakePoint and ST_PointFromText) must always be given the spatial reference system identifier (SRID) together with the point coordinates to define the point geometry.
Each geometric field added to any table in the database is tracked with a record in the geometry_columns PostGIS metadata view. In previous PostGIS versions (< 2.0), the geometry_columns view was a table that needed to be manually updated, possibly with the convenient AddGeometryColumn function.
For the same reason, to keep the geometry_columns view up to date when dropping a geometry column or removing a spatial table, previous PostGIS versions provided the DropGeometryColumn and DropGeometryTable functions. With PostGIS 2.0 and newer, you don't need these functions any more; you can safely remove the column or the table with the standard ALTER TABLE DROP COLUMN and DROP TABLE SQL commands.
In the last step of the recipe, you created a spatial index on the table to improve performance. Please be aware that, as with alphanumeric database fields, indexes improve performance only when reading data using the SELECT command. If you are making a large number of changes to the table (INSERT, UPDATE, and DELETE), depending on the scenario, it could be less time consuming to drop the index and recreate it after the updates.