PostgreSQL High Availability Cookbook

PostgreSQL High Availability Cookbook - Second Edition

By : Shaun Thomas

Buy this Book

PostgreSQL High Availability Cookbook - Second Edition

By: Shaun Thomas

Buy this Book

Overview of this book

Databases are nothing without the data they store. In the event of a failure - catastrophic or otherwise - immediate recovery is essential. By carefully combining multiple servers, it’s even possible to hide the fact a failure occurred at all. From hardware selection to software stacks and horizontal scalability, this book will help you build a versatile PostgreSQL cluster that will survive crashes, resist data corruption, and grow smoothly with customer demand. It all begins with hardware selection for the skeleton of an efficient PostgreSQL database cluster. Then it’s on to preventing downtime as well as troubleshooting some real life problems that administrators commonly face. Next, we add database monitoring to the stack, using collectd, Nagios, and Graphite. And no stack is complete without replication using multiple internal and external tools, including the newly released pglogical extension. Pacemaker or Raft consensus tools are the final piece to grant the cluster the ability to heal itself. We even round off by tackling the complex problem of data scalability. This book exploits many new features introduced in PostgreSQL 9.6 to make the database more efficient and adaptive, and most importantly, keep it running.

Title Page

Credits

About the Author

About the Reviewer

www.Packtpub.com

Customer Feedback

Preface

Free Chapter

Hardware Planning

Introduction

Planning for redundancy

Making the most of memory

Exploring nimble networking

Managing motherboards

Handling and Avoiding Downtime

Introduction

Determining acceptable losses

Configuration - getting it right the first time

Configuration - managing scary settings

Identifying important tables

Defusing cache poisoning

Exploring the magic of virtual IPs

Terminating rogue connections

Reducing contention with concurrent indexes

Managing system migrations

Managing software upgrades

Mitigating the impact of hardware failure

Applying bonus kernel tweaks

Pooling Resources

Introduction

Determining connection costs and limits

Installing PgBouncer

Configuring PgBouncer safely

Connecting to PgBouncer

Listing PgBouncer server connections

Listing PgBouncer client connections

Evaluating PgBouncer pool health

Installing pgpool

Configuring pgpool for master/slave mode

Testing a write query on pgpool

Swapping active nodes with pgpool

Combining the power of PgBouncer and pgpool

Troubleshooting

Introduction

Performing triage

Installing common statistics packages

Evaluating the current disk performance with iostat

Tracking I/O-heavy processes with iotop

Viewing past performance with sar

Correlating performance with dstat

Interpreting /proc/meminfo

Examining /proc/net/bonding/bond0

Checking the pg_stat_activity view

Checking the pg_stat_statements view

Deciphering database locks

Debugging with strace

Logging checkpoints properly

Monitoring

Introduction

Figuring out what to monitor

Installing and configuring Nagios

Configuring Nagios to monitor a database host

Enhancing Nagios with check_mk

Getting to know check_postgres

Installing and configuring collectd

Adding a custom PostgreSQL monitor to collectd

Installing and configuring Graphite

Adding collectd data to Graphite

Building a graph in Graphite

Customizing a Graphite graph

Creating a Graphite dashboard

Replication

Introduction

Deciding what to copy

Securing the WAL stream

Setting up a hot standby

Upgrading to asynchronous replication

Bulletproofing with synchronous replication

Faking replication with pg_receivexlog

Setting up Slony

Copying a few tables with Slony

Setting up Bucardo

Copying a few tables with Bucardo

Setting up Londiste

Copying a few tables with Londiste

Setting up pglogical

Copying a few tables with pglogical

Replication Management Tools

Introduction

Deciding when to use third-party tools

Installing and configuring Barman

Backing up a database with Barman

Restoring a database with Barman

Installing and configuring OmniPITR

Managing WAL files with OmniPITR

Installing and configuring repmgr

Cloning a database with repmgr

Swapping active nodes with repmgr

Installing and configuring walctl

Cloning a database with walctl

Managing WAL files with walctl

Installing and configuring WAL-E

Managing WAL files with WAL-E

Simple Stack

Introduction

Preparing systems for the stack

Installing and configuring etcd

Installing and configuring Patroni

Installing and configuring HAProxy

Performing a managed failover

Using an outage to test availability

Adding a node back into the cluster

Adding additional nodes to the mix

Replacing etcd with ZooKeeper

Replacing etcd with Consul

Upgrading while staying online

Advanced Stack

Introduction

Preparing systems for the stack

Getting started with the Linux Volume Manager

Adding block-level replication

Incorporating the second LVM layer

Verifying a DRBD filesystem

Correcting a DRBD split brain

Formatting an XFS filesystem

Tweaking XFS performance

Maintaining an XFS filesystem

Using LVM snapshots

Switching live stack systems

Detaching a problematic node

Cluster Control

Introduction

Installing the necessary components

Configuring Corosync

Preparing startup services

Starting with base options

Adding DRBD to cluster management

Adding LVM to cluster management

Adding XFS to cluster management

Adding PostgreSQL to cluster management

Adding a virtual IP to hide the cluster

Adding an e-mail alert

Grouping associated resources

Combining and ordering related actions

Performing a managed resource migration

Using an outage to test migration

Data Distribution

Introduction

Identifying horizontal candidates

Setting up a foreign PostgreSQL server

Mapping a remote user

Creating a foreign table

Using a foreign table in a query

Optimizing foreign table access

Transforming foreign tables into local tables

Creating a scalable nextval replacement

Building a sharding API

Talking to the right shard

Moving a shard to another server

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Protecting your eggs

Did we suggest that having several servers was serious? We lied. The place where our servers live, the data center, also has several redundancies in place. Extra network lines, separate power sources, multiple generators, air conditioning and ventilation, everything a server can require.

Yet, some have joked that a common backhoe is the natural enemy of the Internet. There is more truth to that statement than its apparent lack of gravitas might suggest. Data centers are geographically insecure. Inclement weather, natural disasters, disrupted backbones, power outages, and of course, accidentally damaged trunk lines (from an errant backhoe?), and simple human error can all remove a data center from the grid. When a data center vanishes from the Internet, our servers become collateral damage.

However, we've done everything right! We have duplicates of everything, multiple parts, cables, even whole servers. What can we possibly do about the data center?

Well, it's complicated...

Getting ready

For this section, we will need a list of every database server in our proposed architecture, and the desired role for each.

How to do it...

This won't be a very long list. In any case, follow these steps:

For every critical OLTP operating pair, allocate at least one standby.
For every two online standby replicas, consider at least one standby.
For every other database instance, allocate one standby.

How it works...

This type of scenario is known as Disaster Recovery. In order to truly diffuse a data center outage, we need backups of every major database server, and even minor servers. The reasoning is simple: we don't know how long we have to operate at reduced capacity. At that point, even non-critical reporting services still need analogs, otherwise business decisions that depend on activity analysis may not be possible.

We only really need half the amount of database servers, as most disaster recovery scenarios are severe enough for raised alertness, reduced refresh times, manually extended queue timeouts, and more. Not only is this less expensive than having a copy of every server as the primary data center, but it also encourages closer monitoring until it can be restored. Larger companies can opt for complete parity between data centers, but this is not a requirement.

As DBAs, our scenario often resembles this:

Notice that we didn't make any reservations for QA or development database servers. In the case of a disaster, the primary concern is ensuring the continued availability of the application platform. Further development or testing is likely on hold for the duration of the outage in any case.

There's more...

We cannot stress the importance of this section strongly enough. Some may consider an entire extra data center as optional due to the cost. It is not. Others may think a total of three servers for every primary system is too much maintenance overhead. Again, it is not. The price of a few servers must be weighed against the future of the company itself; it is the cost of admission into the world of high availability.

By the time we begin utilizing failover nodes, or any replicas in a separate data center, the damage has already been done. In the absence of these resources, a database crash can result in hours or even days of unavailability depending on the size of our database, exponentially compounding the effects of the original problem.

With this in mind, all critical production systems the author designs always have a minimum of four nodes: two mirrored production systems, and two mirrored disaster recovery analogs. This ensures even the disaster recovery system is online with one node while the other node is experiencing maintenance. Outages are unexpected, and we must always be prepared for them.

PostgreSQL High Availability Cookbook - Second Edition

By : Shaun Thomas

PostgreSQL High Availability Cookbook - Second Edition

By: Shaun Thomas

Overview of this book

Related Content you might be interested in

Current Title:

PostgreSQL High Availability Cookbook - Second Edition

PostgreSQL 11 Administration Cookbook

PostgreSQL 13 Cookbook

PostgreSQL 14 Administration Cookbook

Protecting your eggs

Getting ready

How to do it...

How it works...

There's more...