Instant PostgreSQL Backup and Restore How-to

Overview of this book

Backing up a database is something that all system administrators must eventually face. Larger warehouses and critical enterprise data require advanced techniques and tools, and PostgreSQL provides the basic tools for accomplishing this complex task. "Instant PostgreSQL Backup and Restore How-to" is a practical series of proven recipes showing you how to preserve critical business data and teaching you some advanced methods for restoring it. It is a perfect manual for managing your critical PostgreSQL data and a practical guide to the unique features PostgreSQL offers for creating quick backups and efficient restores. Starting by backing up a simple database and learning how to restore it to working order, we then move on to more complex backup techniques, including creating and restoring binary backups. Covering advanced topics such as warm and hot standby restore and streaming replication, Instant PostgreSQL Backup and Restore How-to gives you the power to make complete backups and the confidence that you will always be able to restore your database to full working order!

Synchronizing backup servers (Intermediate)


Binary backups made with tar create an intermediate file for archival purposes. It can instead be better to back up directly to another server, producing a complete, ready-to-run copy. To do this, most administrators rely on rsync, which remotely synchronizes data between two servers. For low-volume servers, this can also be much faster than tar.

Getting ready

Please refer to the Getting a basic export (Simple) recipe on preparing a sample database.

Our server will be running in the /db directory on both systems. Make sure the path of the directory exists and is owned by the postgres user. On the remote server that will be receiving the backup, create a file named /etc/rsyncd.conf with the following content:

[db_sync]
    path = /db
    comment = DB Backup Server
    uid = postgres
    gid = postgres
    read only = false
    use chroot = true

There are other options available for securing the remote server, but we will ignore them here for the purposes of demonstration.
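As a rough illustration of what such hardening might look like (the 192.168.1.10 address and the backup_user name are placeholders and are not used elsewhere in this recipe), the module could be restricted to a single client and a password-protected rsync user:

[db_sync]
    path = /db
    comment = DB Backup Server
    uid = postgres
    gid = postgres
    read only = false
    use chroot = true
    # Only allow the database server's address to connect (placeholder).
    hosts allow = 192.168.1.10
    # Require a named rsync user with a password from the secrets file.
    auth users = backup_user
    secrets file = /etc/rsyncd.secrets

The /etc/rsyncd.secrets file would then contain a backup_user:password line and must not be world readable, or the rsync daemon will refuse to use it.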

How to do it...

Creating an rsync backup comes in three basic steps:

  1. Begin by putting the database in backup mode:

    $> psql -c "SELECT pg_start_backup('label');" postgres
    
  2. Next, use the rsync command to synchronize the contents of the two servers:

    $> rsync -a -v -z /db/ postgres@remote_server::db_sync
    
  3. Finally, end backup mode:

    $> psql -c "SELECT pg_stop_backup();" postgres
    

How it works...

In our initial steps, we configure the rsync daemon with a module that accepts file transfers. It is set to run as the postgres user, since that is the usual owner of the PostgreSQL data files. Symbolic links must also be preserved, because PostgreSQL uses them extensively for tablespaces; since we do not want the backup to diverge from the copied server, all paths should remain the same on both systems if possible so those links stay valid.

As with a tar backup, we tell PostgreSQL that we're starting a backup. This is not strictly necessary since we're making an exact copy of the database as it runs, but is a good practice.

Next, we use rsync to physically copy the data to the remote server. The -a (archive) flag preserves data attributes such as file ownership, permissions, and symbolic links, while -z compresses the data in transit. The -v flag simply increases verbosity so we can watch the copy progress; it is not necessary, but is useful for illustrative purposes.

Tip

To shorten the duration of the backup and reduce the number of transaction log files it generates, you can execute the same rsync command before starting backup mode. The rsync run inside the backup window will then only copy files that have changed since the first synchronization.

Finally, we stop the backup. If we never entered PostgreSQL's backup mode, we don't need to end it either. In reality, the entire backup can be done with the rsync command alone.

Tip

The rsync command does not have to follow a push model. Instead, the rsyncd.conf file could reside on the database server itself, and backup servers could pull files from the master copy. This may even be the preferred method if several backup servers need to be updated asynchronously.
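As a sketch of that pull arrangement, with the same rsyncd.conf module defined on the database server and master_server as a placeholder host name, a backup server would run:

$> rsync -a -v -z postgres@master_server::db_sync /db/

The arguments are simply reversed: the remote module is now the source and the local /db directory is the destination.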

There's more...

If we check the manual page for rsync, we can see several other highly useful switches that modify or control the data stream. A few of them are worth discussing here.

Speeding up rsync

The rsync command operates in real time, but before it starts, it makes a list of all files that need to be copied. Since the database is running while we are copying its contents, new files may appear during the backup that will not get copied, and old files may vanish. The best way to use rsync is to actually run it multiple times with an extra parameter as shown in the following commands:

$> rsync -a -v -z /db/ postgres@remote_server::db_sync
$> rsync -a -v -z --delete-after /db/ postgres@remote_server::db_sync

The first command may take several hours to complete, depending on the database size; the second should then be executed once or twice to drastically reduce the number of changed files left to copy. The --delete-after flag also removes any files from the backup that have since been deleted on the source.

Afterwards, it's good practice to shut down the database, and do one final sync. This guarantees that no files have changed and the backup is fully valid. Because we ran earlier synchronizations, downtime should be very short.
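A sketch of that final pass, assuming pg_ctl is available to the postgres user and /db is the data directory as before:

$> pg_ctl -D /db stop -m fast
$> rsync -a -v -z --delete-after /db/ postgres@remote_server::db_sync
$> pg_ctl -D /db start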

Whole file sync

PostgreSQL keeps most table data in 1 GB chunks. To avoid recopying data, rsync normally computes the differences between files on each server, which can take a very long time for files this large. If enough bandwidth is available, it may actually be faster to transfer each whole file rather than compute and send only the differing parts, using the following command:

$> rsync -W -a -v -z /db/ postgres@remote_server::db_sync

The -W flag does exactly this.

Exclude transaction logs

If we are setting up a backup copy for replication, we do not need the pg_xlog directory, so we can exclude it as well:

$> rsync -a -v -z --exclude=pg_xlog /db/ \
  postgres@remote_server::db_sync
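
If the copy is destined to be started as a standby, it may also be worth excluding the master's postmaster.pid lock file; this variant is only a sketch and not part of the recipe:

$> rsync -a -v -z --exclude=pg_xlog --exclude=postmaster.pid /db/ \
  postgres@remote_server::db_sync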