PostgreSQL High Availability Cookbook - Second Edition

By: Shaun Thomas

Overview of this book

Databases are nothing without the data they store. In the event of a failure - catastrophic or otherwise - immediate recovery is essential. By carefully combining multiple servers, it’s even possible to hide the fact that a failure occurred at all. From hardware selection to software stacks and horizontal scalability, this book will help you build a versatile PostgreSQL cluster that will survive crashes, resist data corruption, and grow smoothly with customer demand. It all begins with hardware selection for the skeleton of an efficient PostgreSQL database cluster. Then it’s on to preventing downtime as well as troubleshooting some real-life problems that administrators commonly face. Next, we add database monitoring to the stack, using collectd, Nagios, and Graphite. And no stack is complete without replication using multiple internal and external tools, including the newly released pglogical extension. Pacemaker or Raft consensus tools are the final piece, granting the cluster the ability to heal itself. We even round off by tackling the complex problem of data scalability. This book exploits many new features introduced in PostgreSQL 9.6 to make the database more efficient and adaptive, and most importantly, keep it running.

Saddling up to a SAN

SAN stands for Storage Area Network. Working in the industry, you may have encountered NAS (Network Attached Storage) as well. How exactly is that different, and how is it relevant to us?

It's subtle, but important. While both introduce networked storage, only a SAN grants direct block-level access, as if the allocation were raw, unformatted disk space. NAS systems operate one level higher, exposing an already formatted filesystem through a file-sharing protocol such as NFS or CIFS. This means our PostgreSQL database does not have direct control over the filesystem; locks, flushes, allocation, and read cache management are all controlled by a remote server.

When building a highly available server, raw I/O and synchronization messages are very important, and NFS is meant more for sharing storage than for extending the storage capabilities of a server. So what must we consider when deciding how best to utilize a SAN, and when should we choose one over a cheaper solution such as direct attached storage?

We won't be discussing how to evaluate a SAN, which vendors produce the best hardware, or even basic configuration strategies. There are entire books dedicated to SAN management and evaluation, and those topics are far beyond the scope of our overview. For building a highly available PostgreSQL architecture, all we need to consider is the when and why, not the how.

Getting ready

Because we're going to cover both SAN performance and storage allocation, we recommend referring to the "Having enough IOPS" and "Sizing storage" recipes. Just as with physical disks, we need to know how much space we need and roughly how fast it should be to fulfill our transaction and query requirements.

Do we need a SAN? We can ask ourselves a few questions:

Do our IOPS or storage requirements demand more than 20 hard drives?
Will the size of our database reach or exceed 3 TB within the next three years?
Would the risk to the company be too high if we ever ran out of space?
Is there already a SAN available for testing?

If we answer yes to any of these, a SAN might be in our best interests. In that case, we can determine if it would fulfill our needs.

How to do it...

Follow these steps if possible:

1. Request a LUN from the infrastructure department with the necessary IOPS and storage requirements.
2. If a SAN isn't available, many SAN vendors will provide testing equipment to encourage purchase. Try to obtain one of these.
3. Have the infrastructure department format the allocation and attach it to a testing server. Keep note of the path to the storage.
4. Create a basic PostgreSQL testing database with the following command-line operations as the postgres user:

        createdb pgbench
        pgbench -i -s 4000 pgbench

5. Drop the system caches as a user capable of performing root-level commands, as follows:

        echo 3 | sudo tee /proc/sys/vm/drop_caches

6. Test the storage read IOPS with one final command as the postgres user:

        pgbench -S -c 24 -T 600 -j 2 pgbench
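To compare runs, it can help to capture only the throughput figure from the benchmark output. The following is a minimal sketch, assuming pgbench prints its summary on lines beginning with "tps = "; the function name extract_tps is hypothetical and not part of PostgreSQL. Keep in mind that transactions per second from a read-only test is only a rough proxy for storage read IOPS.

```shell
#!/bin/sh
# extract_tps: read pgbench output on stdin and print the first
# transactions-per-second figure it reports.
# Assumption: the summary line looks like "tps = <number> (...)".
extract_tps() {
    awk '$1 == "tps" && $2 == "=" { print $3; exit }'
}
```

With that function defined, the last step could be run as pgbench -S -c 24 -T 600 -j 2 pgbench | extract_tps to print a single number per run.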

How it works...

The first part of our process is to decide whether or not we actually need a SAN at all. If the database will remain relatively small, capable of residing easily on local hard drives for several years, we don't need a SAN just yet.

While it might seem arbitrary, setting 3 TB as a cutoff for local storage comes with a few justifications. First, consider the local drives. Even if they were capable of saturating a 6 Gbps disk controller, 3 TB would require over an hour to transfer to another local storage device. Even if that weren't a bottleneck, there is still the network. With a 10 Gbps NIC and assuming no overhead, that's 40 minutes of transfer at full speed.
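The arithmetic behind those figures can be reproduced with a small helper. This is only a sketch under the same idealized assumptions as the text: decimal units (1 TB is 8,000 gigabits), a fully saturated link, and no protocol overhead. The function name transfer_minutes is hypothetical.

```shell
#!/bin/sh
# transfer_minutes SIZE_TB LINK_GBPS
# Print the whole minutes (rounded down) needed to move SIZE_TB terabytes
# over a link running at LINK_GBPS gigabits per second.
# Assumes decimal units and zero overhead, as in the text.
transfer_minutes() {
    size_tb=$1
    link_gbps=$2
    seconds=$(( size_tb * 8000 / link_gbps ))
    echo $(( seconds / 60 ))
}
```

Calling transfer_minutes 3 10 prints 40, matching the network figure above, while transfer_minutes 3 6 prints 66, which is the "over an hour" case for a 6 Gbps controller.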

That directly affects the speed of backups, synchronization, emergency data restores, and any number of other critical operations. Some RAID cards also require special configuration when handling over 4 TB of storage, and 3 TB is uncomfortably close to that limit if we ever need to extend the array. SAN devices, by contrast, can perform local storage snapshots for nearly instant data copies intended for other servers. If the other server also uses the same SAN, there's no transfer overhead.

And lastly, while RAID devices can be extended while online, there is a limit imposed by how many local disks are available to our server, either directly in the chassis or from direct attached storage extensions. If there's any risk that we could reach that maximum, a SAN is the safer choice, since SAN devices do not have these inherent limitations.

Even if a SAN is available for testing, we're still not done. Depending on the speed or configuration of the SAN or the storage allocation itself, performance may not be sufficient, so we should test the claims made by the SAN manufacturer before committing all of our storage to it.

A very easy way to do this is with a basic pgbench test. The pgbench command is provided by the PostgreSQL software, and it can test various aspects of a server. For our uses, we want to focus on the disk storage. We start by creating a new pgbench database with createdb, so the pgbench command has somewhere to store its test data. The -i option to pgbench tells it to initialize new test data, and the -s option describes the scale of test data we want.

A scale of 4000 creates a database roughly 60 GB in size. Feel free to adjust the scale so that the resulting database is larger than the amount of available RAM; this guarantees that the server cannot cache all of the test data and taint our performance results by inflating the numbers.
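One way to pick a scale is to work backward from the server's memory. The sketch below assumes roughly 15 MB of data per scale unit, which is simply the text's figure of 60 GB divided by 4000, and targets a database about twice the size of RAM. Both the doubling margin and the function name scale_for_ram_gb are assumptions made for illustration.

```shell
#!/bin/sh
# scale_for_ram_gb RAM_GB
# Suggest a pgbench scale whose data set is about twice RAM_GB.
# Assumption: about 15 MB per scale unit (60 GB / 4000, per the text).
# The result is rounded up to the next whole scale unit.
scale_for_ram_gb() {
    ram_gb=$1
    target_mb=$(( ram_gb * 2 * 1000 ))
    echo $(( (target_mb + 14) / 15 ))
}
```

For a server with 30 GB of RAM, scale_for_ram_gb 30 prints 4000, the same scale used in this recipe.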

After initializing the new test database, we use a Linux command that instructs the kernel to drop all of the cached data it can release. This means none of our test data is in the operating system cache before we start the benchmark. Again, we don't want to inflate our results; otherwise, the SAN looks more capable than it really is.

The test itself comes from pgbench again, which is instructed to only read the test data with the -S option. Furthermore, we tell the benchmark to launch 24 clients with the -c parameter, and to run the test for ten full minutes with the -T option. While we used 24 clients here, consider any number up to three times the count of available processor cores. The final -j flag merely launches two concurrent benchmark threads, which helps keep pgbench itself from becoming a CPU bottleneck and reducing overall performance.
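The client count can be derived from the core count instead of being hard-coded. The following sketch builds the read-only command from a given number of cores, using the three-clients-per-core upper bound mentioned above and rounding down to an even number so that the two threads share the clients equally. The function name pgbench_read_cmd is hypothetical.

```shell
#!/bin/sh
# pgbench_read_cmd CORES
# Print a read-only pgbench command line for a machine with CORES cores.
# Uses up to three clients per core (the upper bound from the text),
# rounded down to a multiple of the two benchmark threads.
pgbench_read_cmd() {
    cores=$1
    clients=$(( cores * 3 / 2 * 2 ))
    echo "pgbench -S -c $clients -T 600 -j 2 pgbench"
}
```

On Linux, pgbench_read_cmd "$(nproc)" prints the command for the current machine; with 8 cores it produces the exact command used in this recipe.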

This process should reveal how capable the SAN is, and whether our production database will be safe and perform well while relying on remote storage.

There's more...

Notice how we never ask for a specific number of disks when requesting a SAN allocation. Modern SAN equipment operates on an implied service level agreement based on installed components. In effect, if we need 6,000 IOPS and 10 TB of space, the SAN will combine disks, cache, and even SSDs if necessary, to match those numbers as closely as possible.

This not only reduces the amount of risky micromanagement we perform as DBAs, but also acts as an abstraction layer between storage and server. In this case, storage can be modified in any number of ways (enhanced, adjusted, or copied) without affecting the database installation itself.

The main problem we encounter when using a SAN instead of several servers configured with local storage is that the SAN becomes a single point of failure. This is something to keep in mind as our journey to high availability progresses.
