Book Image

Instant PostgreSQL Backup and Restore How-to

Book Image

Instant PostgreSQL Backup and Restore How-to

Overview of this book

Backing up a database is something that all system admins must eventually face. Larger warehouses and critical enterprise data requires advanced techniques and tools to handle this complex data. PostgreSQL provides you with the basic tools for accomplishing this complex task. "Instant PostgreSQL Backup and Restore How-to" is a practical series of proven recipes showing you how to preserve critical business data, and also teach you some advanced methods for restoring this data. It is a perfect manual for managing your critical PostgreSQL data. Instant PostgreSQL Backup and Restore How-to is your practical guide to the unique features that PostgreSQL offers to create quick backups and efficient restores. Starting by backing up a simplistic database and learning how-to restore it to working order, we then move on to more complex backup techniques including creating and restoring binary backups. Covering advanced topics such as Warm and Hot standby restore and Streaming replication, Instant PostgreSQL Backup and Restore How-to gives you the power to make complete backups to guarantee you will always be able to restore your database to full working order!
Table of Contents (7 chapters)

Obtaining a binary backup (Simple)


Another backup method available to PostgreSQL is a base backup, which consists of the actual data files themselves. These kinds of backups do not need to be restored, only uncompressed or copied. Using them can be more complicated, but they can be ready much faster depending on the database size. The developers have kindly provided pg_basebackup as a simple starting point.

Getting ready

Please refer to the Getting a basic export (Simple) recipe on preparing a sample database.

Next we need to modify the postgresql.conf file for our database to run in the proper mode for this type of backup. Change the following configuration variables:

wal_level = archive
max_wal_senders = 5

Then we must allow a super user to connect to the replication database, which is used by pg_basebackup. We do that by adding the following line to pg_hba.conf:

local replication postgres peer

Finally, restart the database instance to commit the changes.

How to do it...

Though it is only one command, pg_basebackup requires at least one switch to obtain a binary backup, as shown in the following step:

  1. Execute the following command to create the backup in a new directory named db_backup:

    $> pg_basebackup -D db_backup -x
    

How it works...

For PostgreSQL, WAL stands for Write Ahead Log. By changing wal_level to archive, those logs are written in a format compatible with pg_basebackup and other replication-based tools.

By increasing max_wal_senders from the default of zero, the database will allow tools to connect and request data files. In this case, up to five streams can request data files simultaneously. This maximum should be sufficient for all but the most advanced systems.

The pg_hba.conf file is essentially a connection access control list (ACL). Since pg_basebackup uses the replication protocol to obtain data files, we need to allow local connections to request replication.

Next, we send the backup itself to a directory (-D) named db_backup. This directory will effectively contain a complete copy of the binary files that make up the database.

Finally, we added the -x flag to include transaction logs (xlogs), which the database will require to start, if we want to use this backup. When we get into more complex scenarios, we will exclude this option, but for now, it greatly simplifies the process.

There's more...

The pg_basebackup tool is actually fairly complicated. There is a lot more involved under the hood.

Viewing backup progress

For manually invoked backups, we may want to know how long the process might take, and its current status. Luckily, pg_basebackup has a progress indicator, which does that by using the following command:

$> pg_basebackup -P -D db_backup

Like many of the other switches, -P can be combined with tape archive format, standalone backups, database clones, and so on. This is clearly not necessary for automated backup routines, but could be useful for one-off backups monitored by an administrator.

Compressed tape archive backups

Many binary backup files come in the TAR (Tape Archive) format, which we can activate using the -f flag and setting it to t for TAR. Several Unix backup tools can directly process this type of backup, and most administrators are familiar with it.

If we want a compressed output, we can set the -z flag, especially in the case of large databases. For our sample database, we should see almost a 20x compression ratio. Try the following command:

$> pg_basebackup -Ft -z -D db_backup

The backup file itself will be named base.tar.gz within the db_backup directory, reflecting its status as a compressed tape archive. In case the database contains extra tablespaces, each becomes a separate compressed archive. Each file can be extracted to a separate location, such as a different set of disks, for very complicated database instances.

For the sake of this example, we ignored the possible presence of extra tablespaces than the pg_default default included in every installation. User-created tablespaces will greatly complicate your backup process.

Making the backup standalone

By specifying -x, we tell the database that we want a "complete" backup. This means we could extract or copy the backup anywhere and start it as a fully qualified database. As we mentioned before, the flag means that you want to include transaction logs, which is how the database recovers from crashes, checks integrity, and performs other important tasks. The following is the command again, for reference:

$> pg_basebackup -x -D db_backup

When combined with the TAR output format and compression, standalone binary backups are perfect for archiving to tape for later retrieval, as each backup is compressed and self-contained. By default, pg_basebackup does not include transaction logs, because many (possibly most) administrators back these up separately. These files have multiple uses, and putting them in the basic backup would duplicate efforts and make backups larger than necessary.

We include them at this point because it is still too early for such complicated scenarios. We will get there eventually, of course.

Database clones

Because pg_basebackup operates through PostgreSQL's replication protocol, it can execute remotely. For instance, if the database was on a server named Production, and we wanted a copy on a server named Recovery, we could execute the following command from Recovery:

$> pg_basebackup -h Production -x -D /full/db/path

For this to work, we would also need this line in pg_hba.conf for Recovery:

host replication postgres Recovery trust

Though we set the authentication method to trust, this is not recommended for a production server installation. However, it is sufficient to allow Recovery to copy all data from Production. With the -x flag, it also means that the database can be started and kept online in case of emergency. It is a backup and a running server.

Parallel compression

Compression is very CPU intensive, but there are some utilities capable of threading the process. Tools such as pbzip2 or pigz can do the compression instead. Unfortunately, this only works in the case of a single tablespace (the default one; if you create more, this will not work). The following is the command for compression using pigz:

$> pg_basebackup -Ft -D - | pigz -j 4 > db_backup.tar.gz

It uses four threads of compression, and sets the backup directory to standard output (-) so that pigz can process the output itself.