
Having enough backups

Database backups are a crucial component of any architecture and should be considered a required part of the central design. The only real question in most cases is: how many backups? Every highly available cluster must account for the relevant backup copies, lest the cluster itself be lost.

In this recipe, we'll cover one simple set of rules to provide an answer.

Getting ready

This is very important, so write it down if necessary. Put it in company documentation if possible.

How to do it...

When considering how many backups to allocate, follow the 3-2-1 backup rule, which consists of these elements:

  1. Keep at least three copies of your data.
  2. Store two copies on different devices.
  3. Keep at least one copy offsite.

Take note that many things may qualify as a copy, including PostgreSQL replicas and the original data itself.
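
As a concrete illustration, the following sketch produces one such copy with a compressed base backup of the local instance; the backup directory and the postgres user are assumptions for this example, not a prescription.

    # Take a compressed base backup of the local instance, including the WAL
    # needed to make it consistent. Paths and the backup user are placeholders.
    pg_basebackup -U postgres -D /var/backups/pg/base_$(date +%Y%m%d) -Ft -z -X stream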

How it works...

Notice how we don't really recommend a specific backup method, how it should be stored, or how filesystem-based features might contribute. All of those things are implementation details and don't matter as much as the rules themselves.

Consider the first rule: keep at least three copies of your data. Since our PostgreSQL instance is the first copy, we need two more. What might these two be? Could we use one replica and a backup? Maybe we could use two replicas? Perhaps, but let's examine the remaining rules first.

Imagine we've produced a PostgreSQL backup and stored it on our primary server, on the same filesystem as the database instance. What happens if that storage device is damaged or destroyed? Now we've lost both the database and the backup.

That naturally leads to the second rule: store two copies on different devices. It's fine to retain a copy on the local PostgreSQL server, even on the same physical storage device, provided another copy of the backup lives on a device that won't be lost at the same time. That second copy can go on a separate set of storage drives, a SAN, a shared NFS filesystem, or anything else, so long as it's separate from the database itself.
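
As a hedged sketch of that approach, the commands below keep one copy of the backup locally and push a second copy to a separate backup host; the host name and paths are assumptions for illustration.

    # First copy: a local base backup on the database server itself.
    pg_basebackup -U postgres -D /var/backups/pg/base_today -Ft -z -X stream

    # Second copy: mirror the backup to a physically separate machine so the
    # loss of the primary's storage doesn't take both copies with it.
    rsync -a /var/backups/pg/base_today/ backup-host:/srv/pg_backups/base_today/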

Be wary of relying on shared infrastructure for following the second rule. If we have two separate LUNs from the same SAN mounted on our database server for PostgreSQL and the backup, this means nothing if the SAN itself is lost. If possible, try to ensure that the backup is actually on a physically distinct device.

So, to fulfill the second rule, we merely need to ensure that the second copy of our PostgreSQL instance is on another device. This is most easily done by creating a replica on another server, or a VM hosted on a different hypervisor. That's two copies of the data.
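
One common way to create that second copy is a streaming replica cloned with pg_basebackup. The sketch below assumes hypothetical host names, a role with replication privileges named replication, and a Debian-style data directory.

    # Run on the standby server. -R writes standby.signal and the
    # primary_conninfo setting so the clone starts as a streaming replica.
    pg_basebackup -h primary-db -U replication -D /var/lib/postgresql/12/main \
        -R -X stream -P

    # Start the standby; it will connect to primary-db and begin replaying WAL.
    pg_ctl -D /var/lib/postgresql/12/main -l /tmp/standby.log start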

Finally, there's rule three: keep at least one copy offsite. This is the third copy of our data, and it's best to place it somewhere immune to a catastrophic failure of the data center itself. In limited circumstances, it may be safe enough to place the backup on a server in another rack of the same data center, but why take the risk?

There are ample cloud providers, vault systems, and cheap storage services that can fill the role of hosting the third and final copy of our data. If we have our own second data center, that's an easy choice of venue. If not, it's important to select, allocate, and designate some tertiary location that won't be lost if the worst happens. This third data copy is an insurance policy, and it doesn't even have to be easily available. As long as we can obtain the backup upon request, that fits the minimum requirement.
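
As one hedged possibility among many, the following ships a compressed backup to an object storage bucket; the bucket name is a placeholder, and any offsite target (a second data center, a vault service, and so on) works equally well.

    # Bundle the backup directory and copy it offsite. The bucket is a
    # placeholder; the only requirement is that this copy survives the loss of
    # the primary data center.
    tar -C /var/backups/pg -czf base_today.tar.gz base_today
    aws s3 cp base_today.tar.gz s3://example-offsite-bucket/pg/base_today.tar.gz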

There's more...

There's one important corollary here. PostgreSQL replicas tend to immediately reflect every change made to the primary node. What happens if someone accidentally drops a table? In this case, the 3-2-1 rule by itself is not sufficient. Relying on replicas alone means we've permanently lost this data.

Thus, we strongly recommend the following additional rule: At least one copy must be a true backup.

Databases such as PostgreSQL are equipped with Point-In-Time Recovery (PITR), which allows the user to start from any past backup and apply changes until recovery is stopped at a specific point. This lets us recover a backup to the point just before a table was damaged or removed, capture the desired contents, and reintroduce them into the original database. This can only be done with a real binary-level backup, and there are multiple tools dedicated to performing this task. This book even contains a chapter dedicated to Backup Management.
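
To make that concrete, here is a hedged PostgreSQL 12 sketch. It assumes a base backup has already been restored into a scratch data directory and that WAL is archived under /srv/pg_wal_archive; both paths and the target time are assumptions for illustration.

    # Recover a restored base backup up to a point just before the table was
    # dropped. RESTORE_DIR must already contain the extracted base backup.
    RESTORE_DIR=/var/lib/postgresql/12/restore

    echo "restore_command = 'cp /srv/pg_wal_archive/%f \"%p\"'" >> "$RESTORE_DIR/postgresql.conf"
    echo "recovery_target_time = '2021-06-01 11:59:00'"         >> "$RESTORE_DIR/postgresql.conf"
    echo "recovery_target_action = 'pause'"                     >> "$RESTORE_DIR/postgresql.conf"

    # In PostgreSQL 12, the presence of recovery.signal puts the server into
    # targeted recovery mode when it starts.
    touch "$RESTORE_DIR/recovery.signal"
    pg_ctl -D "$RESTORE_DIR" -l /tmp/pitr_restore.log start

Once recovery pauses at the target, the affected table's contents can be dumped from the scratch instance (with pg_dump, for example) and reloaded into the original database.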

See also