PostgreSQL 12 High Availability Cookbook - Third Edition

By: Shaun Thomas

Overview of this book

Databases are nothing without the data they store. In the event of an outage or technical catastrophe, immediate recovery is essential. This updated edition ensures that you will learn the important concepts related to node architecture design, as well as techniques such as using repmgr for failover automation. From cluster layout and hardware selection to software stacks and horizontal scalability, this PostgreSQL cookbook will help you build a PostgreSQL cluster that will survive crashes, resist data corruption, and grow smoothly with customer demand. You'll start by understanding how to plan a PostgreSQL database architecture that is resistant to outages and scalable, as it is the scaffolding on which everything else rests. With the bedrock established, you'll cover the topics that PostgreSQL database administrators need to know to manage a highly available cluster. This includes configuration, troubleshooting, monitoring and alerting, backups through proxies, failover automation, and other considerations that are essential for a healthy PostgreSQL cluster. You'll then learn to use multi-master replication to maximize server availability, and later chapters will guide you through managing major version upgrades without downtime. By the end of this book, you'll have learned how to build an efficient and adaptive PostgreSQL 12 database cluster.

Considering quorum

Quorum can best be explained by imagining any voting system: it is the result of consensus among a majority of trusted participants, a concept with a long history of formal and quantitative study. The most common way to guarantee a quorum for a PostgreSQL cluster is by utilizing a witness node, which exists only to observe the state of the cluster and cast votes. This helps us reach maximum availability by guaranteeing there's always an active primary node.

In this recipe, we'll examine why it's important to apply the concept of quorum to our PostgreSQL cluster, and how we may do so.

Getting ready

The primary criterion for establishing a quorum is that we must be able to avoid tie votes, also known as establishing consensus. In practice, this means we must have an odd number of PostgreSQL nodes within our cluster such that there's always a majority. We should already have a preliminary node count from working through previous recipes in this chapter, in particular the Picking redundant copies recipe and the Selecting locations recipe.

That being said, the concept of quorum is only necessary in clusters that intend to provide automated failover capabilities. If this is not going to be a feature of the end architecture, this recipe may be skipped.

How to do it...

Once we have an initial node count, we should apply these guidelines to adjust the total count and node distribution:

  1. If the initial PostgreSQL node count is even, add one witness node.
  2. If the initial PostgreSQL node count is odd, convert one replica into a witness node.
  3. In the presence of two locations, the witness node should reside in the same data center as the primary node.
  4. If possible, allocate witness nodes in an independent tertiary location.
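As a rough sketch, the first two guidelines can be expressed as a small decision function. This is purely illustrative: the function name, arguments, and return format are assumptions for this example, not part of repmgr or any other failover tool.

```python
def plan_quorum(data_nodes: int) -> dict:
    """Return a voting layout with an odd total number of voters.

    Guideline 1: an even PostgreSQL node count gains a dedicated witness.
    Guideline 2: an odd node count converts one replica into a witness.
    """
    if data_nodes < 2:
        raise ValueError("a cluster needs at least a primary and a replica")
    if data_nodes % 2 == 0:
        # Even count: add one witness node so the total becomes odd.
        return {"data_nodes": data_nodes, "witnesses": 1,
                "total_voters": data_nodes + 1}
    # Odd count: repurpose one replica as a witness; the total stays odd.
    return {"data_nodes": data_nodes - 1, "witnesses": 1,
            "total_voters": data_nodes}

print(plan_quorum(2))  # {'data_nodes': 2, 'witnesses': 1, 'total_voters': 3}
print(plan_quorum(3))  # {'data_nodes': 2, 'witnesses': 1, 'total_voters': 3}
```

Either way, the resulting cluster always has an odd number of voters, so a strict majority exists for every election.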

How it works...

While deceptively simple, there's actually a lot of thought involved in correctly placing the odd node, and in why we use witness nodes rather than yet another PostgreSQL replica:

  1. Our first guideline is the most straightforward: it ensures there is an odd number of nodes in the cluster. Once we have that, any event in the cluster is submitted to the entire quorum for a decision, and only agreement guarantees subsequent action. Further, since the witness cannot vote for itself, only one eligible node will ever win the election. Consider this sample cluster diagram:

We have three nodes in this cluster and, in the event of a failure of the Primary node, the Witness must vote for the only remaining Replica. If the Witness had been a standard replica node, it could have voted for itself and potentially led to a tied vote. In an automated scenario, this would prevent the cluster from promoting a replacement Primary node.
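This voting behavior can be sketched in a few lines. The `elect` function and vote format below are illustrative assumptions, not the actual algorithm used by repmgr or any specific failover tool; the point is only that a witness which cannot vote for itself makes a strict majority inevitable.

```python
from collections import Counter

def elect(votes):
    """votes maps voter -> candidate; return a winner only on a strict majority."""
    tally = Counter(votes.values())
    winner, count = tally.most_common(1)[0]
    return winner if count > len(votes) / 2 else None

# Primary has failed. The surviving replica votes for itself, and the
# witness must vote for some node other than itself, so the replica wins.
print(elect({"replica1": "replica1", "witness": "replica1"}))  # replica1

# Had the witness been an ordinary replica, both nodes could vote for
# themselves, producing a tie and leaving the cluster without a Primary.
print(elect({"replica1": "replica1", "replica2": "replica2"}))  # None
```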

  2. The second guideline is a variant of this concept. If we already had an odd number of nodes, one of these should be a Witness rather than a standard replica. Consider this diagram:

We can see here that the third node is still a replica, but it also acts as a Witness. Essentially, we don't allow this node to vote for itself to become the new Primary. This kind of role works well for read-only replicas that exist only for application use and is a good way to reuse existing resources.

  3. The third guideline, placing the Witness in the same location as the Primary node, safeguards node visibility. More important than automation is safety. By placing the Witness in the same location as the Primary when there are only two data centers, we can ensure that a network partition (a situation where we lose network connectivity between the data centers) won't result in the alternate location incorrectly promoting one of its replicas. Consider this diagram:

If the connection between Chicago and Dallas is lost, Chicago still has the majority of voting nodes, and Dallas does not. As a result, the cluster will continue operating normally until the network is repaired, and we didn't experience an accidental activation of a node in Dallas.
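The majority check behind this outcome can be sketched as follows. The function and the node-to-location mapping are hypothetical, modeled on the Chicago/Dallas example above; real tools such as repmgr express this through their own configuration.

```python
def has_quorum(nodes, location):
    """nodes maps node name -> location; return True if that location
    holds a strict majority of voters after a network partition."""
    total = len(nodes)
    local = sum(1 for loc in nodes.values() if loc == location)
    return local > total / 2

# Witness co-located with the Primary, per the third guideline.
cluster = {"primary": "Chicago", "witness": "Chicago", "replica1": "Dallas"}

print(has_quorum(cluster, "Chicago"))  # True: 2 of 3 voters remain visible
print(has_quorum(cluster, "Dallas"))   # False: Dallas alone cannot promote
```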

Some failover automation systems also take physical location into account by verifying that all nodes in one location agree that all nodes in the other location are not responding. In these cases, the only time automation will not operate normally is when a network partition has occurred. This approach is only viable when more than one node exists in each location, which can be accomplished by allocating further replicas, or even additional witness nodes.

Unfortunately, our cluster is no longer symmetrical. If we activate the node in Dallas, there are no witnesses in that location, so we must eventually move the Primary back to Chicago. This means every failover will be followed by a manual switch to the other location, thus doubling our downtime.

The easiest way to permanently address these concerns is to add a third location and assign a node there. In most cases, this will be the Witness node itself. Consider this example:

In this case, we may desire that only Chicago or San Jose host the active PostgreSQL node. In the event of a failure of our Primary node, San Jose should take over instead. The Witness can see both data centers and decide voting based on this. Furthermore, it doesn't matter if the Primary is active in Chicago or San Jose, because the Witness is not tied directly to either location.

There's more...

What happens in the case of a tie? Even if the original cluster contained an odd number of nodes, when the Primary node goes offline, this is no longer true. In simple quorum systems, each node votes for itself. However, a Witness, by its definition, must vote for some other node. This means some replica in the cluster will have more than one vote, and thus win the election.

In case there are somehow multiple witnesses and votes are split anyway, PostgreSQL quorum systems usually account for the Log Sequence Number (LSN) received from the Primary node. Even if the difference is only a single transaction, one of the tied nodes will have replicated more data than the others, and this breaks the tie.
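The tie-break can be sketched as a simple ordering: highest vote count first, then highest LSN. This is an assumed, simplified model of the behavior described above; the function name and candidate tuple format are illustrative, and real systems compare LSNs through PostgreSQL's own functions.

```python
def pick_new_primary(candidates):
    """candidates: list of (name, votes, lsn) tuples.

    Promote the candidate with the most votes; among candidates tied on
    votes, prefer the one with the highest LSN, i.e. the node that has
    replicated the most data from the failed Primary.
    """
    return max(candidates, key=lambda c: (c[1], c[2]))[0]

# Both replicas received one vote, but replica2 replayed slightly more WAL.
tied = [("replica1", 1, 0x16B3748), ("replica2", 1, 0x16B3750)]
print(pick_new_primary(tied))  # replica2: same votes, higher LSN
```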