Book Image

PostgreSQL High Availability Cookbook - Second Edition

By : Shaun Thomas
Book Image

PostgreSQL High Availability Cookbook - Second Edition

By: Shaun Thomas

Overview of this book

Databases are nothing without the data they store. In the event of a failure - catastrophic or otherwise - immediate recovery is essential. By carefully combining multiple servers, it’s even possible to hide the fact a failure occurred at all. From hardware selection to software stacks and horizontal scalability, this book will help you build a versatile PostgreSQL cluster that will survive crashes, resist data corruption, and grow smoothly with customer demand. It all begins with hardware selection for the skeleton of an efficient PostgreSQL database cluster. Then it’s on to preventing downtime as well as troubleshooting some real life problems that administrators commonly face. Next, we add database monitoring to the stack, using collectd, Nagios, and Graphite. And no stack is complete without replication using multiple internal and external tools, including the newly released pglogical extension. Pacemaker or Raft consensus tools are the final piece to grant the cluster the ability to heal itself. We even round off by tackling the complex problem of data scalability. This book exploits many new features introduced in PostgreSQL 9.6 to make the database more efficient and adaptive, and most importantly, keep it running.
Table of Contents (18 chapters)
Title Page
About the Author
About the Reviewer
Customer Feedback

Determining acceptable losses

We know that the PostgreSQL database will be offline at some point in the future. Maybe we need an upgrade to remove a critical security vulnerability or address a potential data corruption issue. Perhaps a RAM module is producing errors and needs immediate replacement. Maybe the primary data center was struck by lightning.

No matter the reason, we need to make decisions quickly. A helpful way is to ensure that the decision-making process is basing the answers on what the user expects for various levels of liability and on the context of the user. The QA department will not require the same response level as 10,000 shoppers who can't make a holiday purchase during a critical sale.

System outage and response escalation expectations are generally codified in a Service Level Agreement (SLA). How long should the maintenance last? How often should planned outages occur? When should users be informed and to what extent? Who is included in the set of potential database...