Book Image

PostgreSQL 10 Administration Cookbook - Fourth Edition

Book Image

PostgreSQL 10 Administration Cookbook - Fourth Edition

Overview of this book

PostgreSQL is a powerful, open source database management system with an enviable reputation for high performance and stability. With many new features in its arsenal, PostgreSQL 10 allows users to scale up their PostgreSQL infrastructure. This book takes a step-by-step, recipe-based approach to effective PostgreSQL administration. Throughout this book, you will be introduced to these new features such as logical replication, native table partitioning, additional query parallelism, and much more. You will learn how to tackle a variety of problems that are basically the pain points for any database administrator - from creating tables to managing views, from improving performance to securing your database. More importantly, the book pays special attention to topics such as monitoring roles, backup, and recovery of your PostgreSQL 10 database, ensuring high availability, concurrency, and replication. By the end of this book, you will know everything you need to know to be the go-to PostgreSQL expert in your organization.
Table of Contents (14 chapters)

Identifying and removing duplicates

Relational databases work on the idea that items of data can be uniquely identified. However hard we try, there will always be bad data arriving from somewhere. This recipe shows you how to diagnose that and clean up the mess.

Getting ready

Let's start by looking at our example table, cust. It has a duplicate value in customerid:

postgres=# SELECT * FROM cust;
customerid | firstname | lastname | age
------------+-----------+----------+-----
1 | Philip | Marlowe | 38
2 | Richard | Hannay | 42
3 | Holly | Martins | 25
4 | Harry | Palmer | 36
4 | Mark | Hall | 47
(5 rows)

Before you delete duplicate data, remember...