ogr2ogr is GDAL's vector transform utility. It is, not without reason, considered a Swiss Army knife for vector transformations. Despite its small size, ogr2ogr can handle a wide range of formats, which makes it a really worthy tool.
We'll use ogr2ogr to import a few data formats other than SHP, although ogr2ogr can obviously import SHP too. For this scenario, we'll use some data downloaded earlier:
- OS GB address base in GML format
- OS GB code point polygons in MapInfo MIF & TAB formats
- USGS earthquakes in KML format
Some of the most common ogr2ogr params are:
- -f: The format of the output (when importing to PostGIS, it will be PostgreSQL).
- -nln: Assigns a name to the layer. When importing data to PostGIS, this will be the table name.
- -select: Lets you specify a comma-separated list of columns to pick.
- -where: Lets you specify a SQL-like query to filter the data.
- -append: Appends data to the output dataset.
- -overwrite: Overwrites the output datasource; in the case of PostgreSQL, it will drop and re-create the table.
- -s_srs: Specifies (overrides) the input SRID.
- -t_srs: Transforms coordinates to the specified SRID.
- -a_srs: Assigns the output SRID without transforming the coordinates.
- -lco NAME=VALUE: Layer creation options; these are driver specific. For pgsql options, see http://www.gdal.org/drv_pg.html. The most commonly used layer creation options are:
  - LAUNDER: This defaults to YES. It is responsible for converting column names into pgsql-compatible ones (lower case, underscores).
  - PRECISION: This defaults to YES. It is responsible for using numeric and char types over float and varchar.
  - GEOMETRY_NAME: Defaults to wkb_geometry.
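To show how several of these parameters combine, here is a minimal sketch that imports a filtered, reprojected subset of the address GML used later in this section. The target table name osgb_address_base_subset and the filter value 'EXETER' are illustrative assumptions, not part of the original walkthrough; the run helper is also hypothetical and only prints the command unless RUN=1 is set, so nothing touches the database by accident.

```shell
# Hypothetical helper: print the command by default (without shell quoting);
# set RUN=1 in the environment to actually execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Connection string reused from this chapter's examples.
PG_DSN="PG:host=localhost port=5434 user=postgres dbname=mastering_postgis"

# Import three columns only, keep rows for one post town, reproject to WGS 84.
# The table name and the 'EXETER' value are assumptions made for illustration.
run ogr2ogr -f "PostgreSQL" "$PG_DSN" sx9090.gml \
  -nln data_import.osgb_address_base_subset \
  -select "uprn,postcode,postTown" \
  -where "postTown = 'EXETER'" \
  -t_srs EPSG:4326 \
  -lco GEOMETRY_NAME=geom
```

Here -t_srs is used rather than -a_srs because the coordinates should actually be transformed, not merely relabelled.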
Note
For a full list of ogr2ogr params, just type ogr2ogr.
ogr2ogr has an accompanying utility called ogrinfo. This tool lets you inspect the metadata of a dataset. Verifying the metadata of any dataset prior to working with it is considered good practice, and you should get into the habit of always doing so before importing or exporting data.
Let's start with importing the GML of the OS GB address base. First we'll see what data we're dealing with exactly:
ogrinfo sx9090.gml
The following should be the output:
Had to open data source read-only.
INFO: Open of `sx9090.gml' using driver `GML' successful.
1: Address (Point)
We can then review the layer information:
ogrinfo sx9090.gml Address -so
You should see a similar output:
Had to open data source read-only.
INFO: Open of `sx9090.gml' using driver `GML' successful.
Layer name: Address
Geometry: Point
Feature Count: 42861
Extent: (-3.560100, 50.699470) - (-3.488340, 50.744770)
Layer SRS WKT:
GEOGCS["ETRS89",
    DATUM["European_Terrestrial_Reference_System_1989",
        SPHEROID["GRS 1980",6378137,298.257222101,
            AUTHORITY["EPSG","7019"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6258"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4258"]]
gml_id: String (0.0)
uprn: Real (0.0)
osAddressTOID: String (20.0)
udprn: Integer (0.0)
subBuildingName: String (25.0)
buildingName: String (36.0)
thoroughfare: String (27.0)
postTown: String (6.0)
postcode: String (7.0)
postcodeType: String (1.0)
rpc: Integer (0.0)
country: String (1.0)
changeType: String (1.0)
laStartDate: String (10.0)
rmStartDate: String (10.0)
lastUpdateDate: String (10.0)
class: String (1.0)
buildingNumber: Integer (0.0)
dependentLocality: String (27.0)
organisationName: String (55.0)
dependentThoroughfare: String (27.0)
poBoxNumber: Integer (0.0)
doubleDependentLocality: String (21.0)
departmentName: String (37.0)
Note
The -so param makes ogrinfo display the data summary only; otherwise, info on a full dataset would be displayed.
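Since the same metadata check is repeated for every dataset in this section, it can be convenient to run it in one go. The loop below is a sketch that assumes all four source files sit in the current directory; it combines -so with -al (summarise all layers) and -ro (open read-only). The run helper is a hypothetical convenience that prints each command unless RUN=1 is set.

```shell
# Hypothetical helper: print the command by default (without shell quoting);
# set RUN=1 in the environment to actually execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Summarise every layer of each dataset used in this section.
# -ro opens the source read-only, -al covers all layers, -so prints summaries only.
for src in sx9090.gml EX_sample.mif EX_sample.tab 2.5_day_age.kml; do
  run ogrinfo -ro -al -so "$src"
done
```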
Once we're ready to import the data, let's execute the following command:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" sx9090.gml -nln data_import.osgb_address_base_gml -lco GEOMETRY_NAME=geom
At this stage, the address GML should be available in our database.
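A quick way to confirm the import is to compare the row count in the new table with the Feature Count that ogrinfo reported (42861). The following is a sketch that assumes psql is installed and can reach the same server with the connection details used above, and that the table did not exist before the import. The run helper is hypothetical and prints the command unless RUN=1 is set.

```shell
# Hypothetical helper: print the command by default (without shell quoting);
# set RUN=1 in the environment to actually execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Count the imported rows; the result should match ogrinfo's Feature Count.
run psql -h localhost -p 5434 -U postgres -d mastering_postgis \
  -c "SELECT count(*) FROM data_import.osgb_address_base_gml;"
```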
Both MIF and TAB are MapInfo formats. TAB is the default format that contains formatting, while MIF is the interchange format.
We'll start with reviewing metadata:
ogrinfo EX_sample.mif
And then:
ogrinfo EX_sample.mif EX_sample -so
Had to open data source read-only.
INFO: Open of `EX_sample.mif' using driver `MapInfo File' successful.
Layer name: EX_sample
Geometry: Unknown (any)
Feature Count: 4142
Extent: (281282.800000, 85614.570000) - (300012.000000, 100272.000000)
Layer SRS WKT:
PROJCS["unnamed",
    GEOGCS["unnamed",
        DATUM["OSGB_1936",
            SPHEROID["Airy 1930",6377563.396,299.3249646],
            TOWGS84[375,-111,431,-0,-0,-0,0]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",49],
    PARAMETER["central_meridian",-2],
    PARAMETER["scale_factor",0.9996012717],
    PARAMETER["false_easting",400000],
    PARAMETER["false_northing",-100000],
    UNIT["Meter",1]]
POSTCODE: String (8.0)
UPP: String (20.0)
PC_AREA: String (2.0)
Note
Please note that the ogrinfo projection metadata for our MIF file does not specify the EPSG code. This is fine, as the projection definition is present. However, it will result in ogr2ogr creating a new entry in the spatial_ref_sys table, which is not ideal, as we'll end up with the wrong coordinate system identifier: the SRID will be the next available one. This is because ogr2ogr expands the coordinate reference into a WKT string and then does a string comparison against the definitions in the spatial_ref_sys table; minor differences in formatting or precision will result in ogr2ogr failing to match the coordinate system. In such a scenario, a new entry will be created; for example, if you happen to use the EPSG:3857 coordinate system and the system's definition is slightly different and cannot be matched, the assigned SRID will not be 3857, but the next available ID. A solution to this is to specify the exact coordinate system of the output data via the -a_srs parameter.
Once ready, we can import the data:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" EX_sample.mif -nln data_import.osgb_code_point_polygons_mif -lco GEOMETRY_NAME=geom -a_srs EPSG:27700
If you followed the very same procedure for the TAB file and loaded the data, both datasets are now in their own tables in the data_import schema:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" EX_sample.tab -nln data_import.osgb_code_point_polygons_tab -lco GEOMETRY_NAME=geom -a_srs EPSG:27700
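To check that the MIF and TAB imports were registered with the intended SRID rather than a newly created one, PostGIS can be asked directly. This sketch assumes psql can reach the same server and that both tables were created with the geom column name used above; Find_SRID is a standard PostGIS function, and with -a_srs EPSG:27700 in place it should report 27700 for both tables. The run helper is hypothetical and prints the commands unless RUN=1 is set.

```shell
# Hypothetical helper: print the command by default (without shell quoting);
# set RUN=1 in the environment to actually execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Ask PostGIS which SRID each imported geometry column was registered with.
# Find_SRID takes the schema, the table, and the geometry column name.
for tbl in osgb_code_point_polygons_mif osgb_code_point_polygons_tab; do
  run psql -h localhost -p 5434 -U postgres -d mastering_postgis \
    -c "SELECT Find_SRID('data_import', '$tbl', 'geom');"
done
```

Any other value would suggest that the coordinate system was not matched and a new spatial_ref_sys entry was created instead.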
As usual, we'll start with the dataset's metadata checkup:
ogrinfo 2.5_day_age.kml
The output shows that there is more than one layer:
INFO: Open of `2.5_day_age.kml' using driver `LIBKML' successful.
1: Magnitude 5
2: Magnitude 4
3: Magnitude 3
4: Magnitude 2
Therefore, in order to review metadata for each layer at once, the following command should be used:
ogrinfo 2.5_day_age.kml -al -so
The output of the previous command is rather long, so we'll truncate it a bit and only show the info for the first layer:
INFO: Open of `2.5_day_age.kml' using driver `LIBKML' successful.
Layer name: Magnitude 5
Geometry: Unknown (any)
Feature Count: 2
Extent: (-101.000100, -36.056300) - (120.706400, 13.588200)
Layer SRS WKT:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9108"]],
    AUTHORITY["EPSG","4326"]]
Name: String (0.0)
description: String (0.0)
timestamp: DateTime (0.0)
begin: DateTime (0.0)
end: DateTime (0.0)
altitudeMode: String (0.0)
tessellate: Integer (0.0)
extrude: Integer (0.0)
visibility: Integer (0.0)
drawOrder: Integer (0.0)
icon: String (0.0)
snippet: String (0.0)
This time, EPSG information is available, so we do not have to worry about ogr2ogr creating an extra SRID definition in the database.
Once we've confirmed that this is the exact dataset we'd like to import, we can continue with the following command:
ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" 2.5_day_age.kml -nln data_import.usgs_earthquakes_kml -lco GEOMETRY_NAME=geom -append
Note
Note the -append param in the preceding command. This is required because our KML has more than one layer and ogr2ogr is importing them one by one. Without the -append param, only the first layer would be imported and then ogr2ogr would fail with output similar to this:
FAILED: Layer data_import.usgs_earthquakes_kml already exists, and -append not specified.
        Consider using -append, or -overwrite.
ERROR 1: Terminating translation prematurely after failed
translation of layer Magnitude 4 (use -skipfailures to skip errors)
The cmd output should be similar to:
WARNING: Layer creation options ignored since an existing layer is being appended to.
WARNING: Layer creation options ignored since an existing layer is being appended to.
WARNING: Layer creation options ignored since an existing layer is being appended to.
At this stage, the KML dataset should have made it to our PostGIS database.
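Because -append merges all four layers into a single table, and the field list shown earlier has no column of its own for the layer name, the magnitude grouping is not directly preserved. If that grouping matters, one option is to import a single layer into its own table by naming the layer after the source dataset. The sketch below does this for the first layer; the table name usgs_earthquakes_kml_m5 is a hypothetical choice, and the run helper is a hypothetical convenience that prints the command unless RUN=1 is set.

```shell
# Hypothetical helper: print the command by default (without shell quoting);
# set RUN=1 in the environment to actually execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Connection string reused from this chapter's examples.
PG_DSN="PG:host=localhost port=5434 user=postgres dbname=mastering_postgis"

# Layer names listed after the source dataset restrict the import to those layers.
# The target table name below is an assumption made for illustration.
run ogr2ogr -f "PostgreSQL" "$PG_DSN" 2.5_day_age.kml "Magnitude 5" \
  -nln data_import.usgs_earthquakes_kml_m5 \
  -lco GEOMETRY_NAME=geom
```

With only one layer selected, -append is not needed, so the layer creation options are honoured.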
For those preferring a GUI over the command line, there is an alternative to plain old ogr2ogr: ogr2gui, available from http://www.ogr2gui.ca/.
Simply download the required archive, extract it, and launch the appropriate .exe. After having played with ogr2ogr a bit already, the GUI should be rather self-explanatory.