Instant PostgreSQL Backup and Restore How-to
Overview of this book

Backing up a database is something that every system administrator must eventually face, and larger warehouses and critical enterprise data require advanced techniques and tools. PostgreSQL provides you with the basic tools for accomplishing this complex task. "Instant PostgreSQL Backup and Restore How-to" is a practical series of proven recipes that shows you how to preserve critical business data and also teaches some advanced methods for restoring it. It is a perfect manual for managing your critical PostgreSQL data, and your practical guide to the unique features PostgreSQL offers for creating quick backups and efficient restores. Starting by backing up a simple database and learning how to restore it to working order, we then move on to more complex backup techniques, including creating and restoring binary backups. Covering advanced topics such as warm and hot standby restore and streaming replication, this book gives you the power to make complete backups so that you will always be able to restore your database to full working order.
Table of Contents (7 chapters)

Warm and hot standby restore (Intermediate)


Another option for restoring a PostgreSQL backup is to restore continuously, with no defined end point. A backup restored this way is always a working copy of the database, available with a single activation command (warm standby) or online for read-only queries (hot standby).

Getting ready

Please refer to the Getting a basic export (Simple) recipe to bootstrap our database. Before we start the server to create the sample database, we need to change a few settings in postgresql.conf as shown in the following code:

wal_level = hot_standby
archive_mode = on
archive_command = 'rsync -a %p postgres@backup::archive/%f'
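After restarting the primary server, a quick sanity check confirms that the settings took effect. These commands assume a local psql connection as the postgres user:

    $> psql -c "SHOW wal_level;" postgres
    $> psql -c "SHOW archive_mode;" postgres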

Next, we will want to set up the backup server with an rsyncd.conf file:

[db_sync]
    path = /db/pgdata
    comment = DB Backup Server
    uid = postgres
    gid = postgres
    read only = false
    use chroot = true

[archive]
    path = /db/archive
    comment = Archived Transaction Logs
    uid = postgres
    gid = postgres
    read only = false
    use chroot = true

Make sure /db/pgdata and /db/archive exist on both servers and the rsync daemon has been restarted. Then start the main database and use pgbench to initialize our sample data.
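The pgbench initialization might look like the following; the scale factor of 10 and the use of the postgres database are only examples:

    $> pgbench -i -s 10 postgres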

How to do it...

The rsync backup step that we will perform also acts as a restore. The data on the backup server will be almost ready to run. Execute the following commands from the main database server:

  1. Start by entering backup mode:

    $> psql -c "SELECT pg_start_backup('standby');" postgres
    
  2. Next, send the data files to the backup server:

    $> rsync -a --exclude=pg_xlog /db/pgdata/ \
     postgres@backup::db_sync
    
  3. Then close backup mode:

    $> psql -c "SELECT pg_stop_backup();" postgres
    

Once the data is copied, go to the backup server and finish setting up the database. Remember, this is on the backup server, not the primary server. The following are the steps to finish the setup:

  1. Create a pg_xlog directory so the database will start:

    $> mkdir /db/pgdata/pg_xlog
    
  2. Remove the backup label, since it is not necessary for recovery:

    $> rm /db/pgdata/backup_label
    
  3. The postmaster files are created during operation for bookkeeping, and can cause problems when restarting if certain flags are omitted. Remove them:

    $> rm /db/pgdata/postmaster.*
    
  4. Create a recovery.conf file on the backup server with the following contents:

    standby_mode = on
    trigger_file = '/tmp/promote_db'
    restore_command = 'cp -f /db/archive/%f "%p"'

  5. Then start the backup copy of the database:

    $> pg_ctl -D /db/pgdata start
    

How it works...

We are starting to combine many of the techniques we have learned so far. Primary among those is the ability to copy data from one server to another without the intermediate step of first transforming it to a single file.

We start the process by setting up the main server to produce transaction logs in hot_standby mode. Doing so is necessary if we want to utilize the database in read-only mode on the backup server. We also modify the archive command to transfer archived transaction logs directly to the backup server. This is just one method of doing so, though a fairly convenient one.

Having both of the rsync target paths set up in advance is what makes these transfers possible. The data synchronization target lets us start a backup from scratch at any time, and the archive log target lets the backup stay in recovery mode.

The backup and restore process itself is something we covered before. Here, we simply copy all of the data from the main database except for the transaction logs. Those will be copied by the database using the recovery.conf file that controls the restore process. That recovery should begin as soon as the backup copy of the database is started.

With this setup, if we try to connect to the backup database, it will respond with:

psql: FATAL:  the database system is starting up

That's okay for now, because we know the database is properly in recovery mode and keeping itself as a fresh backup of our main copy. This status is what's known as warm standby, and for a very long time, was the only way PostgreSQL could do standby operations.
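To confirm the standby really is replaying transaction logs, we can look for the recovery (startup) process, or watch the server log. The log location below assumes the logging collector writes to the default pg_log directory, and is only an example:

    $> ps -ef | grep 'startup process'
    $> tail -f /db/pgdata/pg_log/postgresql-*.log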

There's more...

Simply having a standby database is a very powerful backup technique. But how do we actually utilize the backup copy?

Trigger a backup online

The easiest method for activating the backup database, so that it acts as the main database would, is to use a trigger file. Any user who can access the backup server can perform this activation. In our recovery.conf, we defined a trigger file:

trigger_file = '/tmp/promote_db'

If this trigger file exists, PostgreSQL will delete it, stop processing transaction logs, and restart the database as a normal operating server. On secure, dedicated backup servers, this is a valid and easy way to activate a backup server and make it a normal online database.
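For example, promoting the standby from a shell on the backup server takes a single command. This sketch only creates the file, and assumes /tmp/promote_db matches the trigger_file setting shown above:

```shell
# Create the trigger file named in recovery.conf.
# PostgreSQL's startup process polls for this file; once it sees it,
# it deletes the file, finishes recovery, and begins accepting writes.
touch /tmp/promote_db
```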

Backup database promotion

The second method utilizes another built-in capability of the venerable pg_ctl tool. To restart without a trigger file, we do something called a database promotion using the following command:

$> pg_ctl -D /db/pgdata promote

This has the same effect as the trigger file, but can only be used by the user who started or owns the PostgreSQL service. On less secure servers, it might be a good idea to use the previous command instead of the trigger_file line in the recovery.conf file.

Hot standby

Believe it or not, a standby server does not simply have to process transaction logs and wait to be promoted or triggered. In fact, it is now very common to run one or more backup servers for various purposes related to load balancing.

To make a backup database readable, simply modify its postgresql.conf file and enable hot_standby:

hot_standby = on

So long as the main database also has the wal_level set to hot_standby, and had that setting before the backup was taken, backup databases can also service read-only queries.
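Once hot_standby is enabled and the standby restarted, read-only queries succeed while writes are rejected. The table name below assumes the pgbench sample data from earlier:

    $> psql -c "SELECT count(*) FROM pgbench_accounts;" postgres
    $> psql -c "DELETE FROM pgbench_accounts;" postgres
    ERROR:  cannot execute DELETE in a read-only transaction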

Asynchronous archival

In some ways, using rsync as the archive_command line is a bad practice. If there are multiple backup or standby servers, where should the transaction logs be sent? There's also a small risk that the target becomes unresponsive and files build up waiting for archival until they consume all available drive space.

Another way to ensure that the archive logs make it to the backup server is to utilize the Unix cron system. Let's make a cron entry on the main database server for the postgres user by calling crontab -e:

* * * * *  rsync -a /db/archive/ postgres@backup::archive

This will check every minute, and ensure that the /db/archive directories are the same on both servers. Of course, for this to work, we should revert to the old archive_command in postgresql.conf:

archive_command = 'test -f /db/archive/%f || cp %p /db/archive/%f'

With this in place, PostgreSQL will archive old transaction log files locally, and an asynchronous background process will ensure that these files reach the backup server eventually.

Tip

You can set up the synchronization job on the backup server instead, and reverse the direction of the rsync, so the backup server pulls the files from the primary database. By doing this, you can set up multiple backup servers for different purposes.
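In the pulled variant, the cron entry lives on each backup server instead; the host name primary is an assumption standing in for your main database server:

    * * * * *  rsync -a postgres@primary::archive /db/archive/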

It is also a good idea to set up a cleanup process to prune very old transaction log files. An asynchronous transmission method is safer, but a little more difficult to maintain. We can add another cron entry, similar to the rsync job, to handle this:

0 * * * * find /db/archive -type f -mtime +7 -delete

This would delete any transaction log files older than one week, every hour.
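The effect of the -mtime +7 test is easy to demonstrate against a scratch directory; the /tmp/archive_demo path and file names here are purely illustrative:

```shell
# Simulate an archive directory with one fresh and one stale file.
mkdir -p /tmp/archive_demo
touch /tmp/archive_demo/recent.wal
touch -d "8 days ago" /tmp/archive_demo/stale.wal  # GNU touch syntax

# The same expression used in the cron entry:
find /tmp/archive_demo -type f -mtime +7 -delete

ls /tmp/archive_demo  # only recent.wal remains
```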