Book Image

PostgreSQL 9 Administration Cookbook - Second Edition

Book Image

PostgreSQL 9 Administration Cookbook - Second Edition

Overview of this book

Table of Contents (19 chapters)
PostgreSQL 9 Administration Cookbook Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Identifying and removing duplicates


Relational databases work on the idea that items of data can be uniquely identified. However hard we try, there will always be bad data arriving from somewhere. This recipe shows you how to diagnose that and clean up the mess.

Getting ready

Let's start by looking at our example table, cust. It has a duplicate value in customerid:

postgres=# SELECT * FROM cust;
 customerid | firstname | lastname | age
------------+-----------+----------+-----
          1 | Philip    | Marlowe  |  38
          2 | Richard   | Hannay   |  42
          3 | Holly     | Martins  |  25
          4 | Harry     | Palmer   |  36
          4 | Mark      | Hall     |  47
(5 rows)

Before you delete duplicate data, remember that sometimes, it isn't the data that is wrong; it is your understanding of it. In those cases, it may be that you haven't properly normalized your database model, and that you need to include additional tables to account for the shape of the data. You might also...