Book Image

PostgreSQL Administration Cookbook, 9.5/9.6 Edition - Third Edition

Book Image

PostgreSQL Administration Cookbook, 9.5/9.6 Edition - Third Edition

Overview of this book

PostgreSQL is a powerful opensource database management system; now recognized as the expert's choice for a wide range of applications, it has an enviable reputation for performance and stability. PostgreSQL provides an integrated feature set comprising relational database features, object-relational, text search, Geographical Info Systems, analytical tools for big data and JSON/XML document management. Starting with short and simple recipes, you will soon dive into core features, such as configuration, server control, tables, and data. You will tackle a variety of problems a database administrator usually encounters, from creating tables to managing views, from improving performance to securing your database, and from using monitoring tools to using storage engines. Recipes based on important topics such as high availability, concurrency, replication, backup and recovery, as well as diagnostics and troubleshooting are also given special importance. By the end of this book, you will have all the knowledge you need to run, manage, and maintain PostgreSQL efficiently.
Table of Contents (13 chapters)

Identifying and removing duplicates

Relational databases work on the idea that items of data can be uniquely identified. However hard we try, there will always be bad data arriving from somewhere. This recipe shows you how to diagnose that and clean up the mess.

Getting ready

Let's start by looking at our example table, cust. It has a duplicate value in customerid:

postgres=# SELECT * FROM cust;
customerid | firstname | lastname | age
------------+-----------+----------+-----
1 | Philip | Marlowe | 38
2 | Richard | Hannay | 42
3 | Holly | Martins | 25
4 | Harry | Palmer | 36
4 | Mark | Hall | 47
(5 rows)

Before you delete duplicate data, remember...