Instant PostgreSQL Backup and Restore How-to

Overview of this book

Backing up a database is something every system administrator must eventually face, and larger warehouses and critical enterprise data require advanced techniques and tools to handle it. PostgreSQL provides you with the basic tools for accomplishing this complex task. "Instant PostgreSQL Backup and Restore How-to" is a practical series of proven recipes that show you how to preserve critical business data and teach you some advanced methods for restoring it. It is a perfect manual for managing your critical PostgreSQL data.

Instant PostgreSQL Backup and Restore How-to is your practical guide to the unique features PostgreSQL offers for creating quick backups and efficient restores. Starting by backing up a simple database and learning how to restore it to working order, we then move on to more complex backup techniques, including creating and restoring binary backups. Covering advanced topics such as warm and hot standby restores and streaming replication, Instant PostgreSQL Backup and Restore How-to gives you the power to make complete backups and to guarantee you will always be able to restore your database to full working order!

Point in time recovery (Intermediate)


PostgreSQL also has the ability to restore the database to any point in time after a backup was taken. This is called point in time recovery (PITR). It works by keeping files called transaction logs. So long as we store these transaction logs, we can use them to restore to any moment they cover.

Getting ready

Please refer to the Getting a basic export (Simple) recipe to bootstrap our database. Before we start the database and create the sample data, we need to change a couple of settings in postgresql.conf. We will also need a directory at /archive to store transaction logs, which is used by the following configuration settings:

wal_level = archive
archive_mode = on
archive_command = 'test -f /archive/%f || cp %p /archive/%f'
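The `test -f … || cp …` idiom makes archiving idempotent: if a segment has already been copied, the command succeeds without overwriting the existing copy. A minimal sketch of that behavior using ordinary files (the paths below are throwaway temp directories, not a real cluster):

```shell
# Simulate archive_command with scratch directories. In the real setting,
# %p expands to the full path of the WAL segment and %f to its file name.
pgdata=$(mktemp -d); archive=$(mktemp -d)
echo "segment-one" > "$pgdata/000000010000000000000001"

archive_one() {  # $1 = %p (full source path), $2 = %f (file name)
    test -f "$archive/$2" || cp "$1" "$archive/$2"
}

# First call copies the segment into the archive.
archive_one "$pgdata/000000010000000000000001" 000000010000000000000001
# A second call is a no-op: the existing archived copy is never clobbered.
echo "changed" > "$pgdata/000000010000000000000001"
archive_one "$pgdata/000000010000000000000001" 000000010000000000000001
cat "$archive/000000010000000000000001"   # still "segment-one"
```

This is exactly why a stray file left in /archive can silently block archiving: the `test -f` guard refuses to replace it, so the command still exits successfully.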

Then start the database and use pgbench to initialize our sample data. Afterwards, we need a very simple backup to restore. Leave the database running for the following steps:

$> psql -c "SELECT pg_start_backup('label');" postgres
$> tar -C /db/pgdata --exclude=pg_xlog -cz \
           -f /db/backup/backup.tar.gz .
$> psql -c "SELECT pg_stop_backup();" postgres

After the backup, we want to generate a bit more activity that we can later discard. Make a note of the time, then run the following commands to drop the pgbench_tellers table so that we can restore to a point before that happened:

$> psql -c "DROP TABLE pgbench_tellers;" sample
$> psql -c "SELECT pg_switch_xlog();" sample

Once the transaction logs are safely archived, stop and erase our sample database:

$> pg_ctl -D /db/pgdata stop -m fast
$> rm -Rf /db/pgdata

How to do it...

At first, our restore works as before. Assume that the time of our backup was 5:15:26 P.M. on February 15, 2013, and we know the table was dropped at 5:30:00 P.M. Remember to use the actual backup and drop times if following along. The following are the steps:

  1. Create the proper database directories, and extract the backup we took earlier using the following commands:

    $> mkdir -m 700 /db/pgdata
    $> mkdir -m 700 /db/pgdata/pg_xlog
    $> tar -C /db/pgdata -xzf /db/backup/backup.tar.gz
    
  2. Then create a file named recovery.conf in our /db/pgdata directory to contain the following:

    restore_command = 'cp /archive/%f "%p"'
    recovery_target_time = '2013-02-15 17:29:00'
  3. Then start the database as usual. It will recover for a few minutes until it reaches the indicated time, and then become available.

    $> pg_ctl -D /db/pgdata start
    

How it works...

Most of the changes necessary for point in time recovery to work are handled before we even start the database. In our preparation, we set the database wal_level to archive. This forces PostgreSQL to write extra information to transaction logs so this type of recovery works.

We also didn't stop the database during the backup process. This means we must wrap the file copy in the pg_start_backup and pg_stop_backup commands to ensure all of the necessary transaction logs are written to the /archive directory.

With a backup in hand, we then connect to the database and drop the teller table to simulate an accident. By calling pg_switch_xlog, this information is also recorded in the archives. This way, we can prove that the selected time was honored by our recovery.

Recovery itself is tied to recovery.conf, which tells the database to recover from our /archive directory until the indicated time. Note how this is probably the most straightforward part of the operation. Knowing where to find the archived transaction log files and putting the database in the proper write mode are the real keys to PITR.

In the example recovery_target_time, we restore to one minute before the table was dropped. This way, we can save the contents of the table and reimport them into our main database.
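One way to carry out that salvage, sketched here as a command sequence: dump only the recovered table and replay the dump against the main database. The port numbers and the assumption that the recovered copy runs alongside the production cluster are illustrative, not prescribed by the recipe.

```shell
# Hypothetical setup: recovered cluster on port 5433, main cluster on 5432.
pg_dump -p 5433 -t pgbench_tellers sample > /tmp/tellers.sql
psql -p 5432 -f /tmp/tellers.sql sample
```

Because pg_dump emits both the CREATE TABLE and its rows, this works even though the table no longer exists in the main database.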

There's more...

Choosing the time when recovery stops is not the only option; there are other methods of stopping recovery as well.

Named restore points

If we are doing work we know may be dangerous, it is best to do it within a transaction. But large sweeping changes may contain a lot of DDL or precede a long series of database loading jobs. At times like these, it can be beneficial to name a point just before those actions so we can recover specifically to the moment before they executed. To do this, we first name the point with a built-in function.

$> psql -c "SELECT pg_create_restore_point('dangerous');" postgres

Then after the commands are executed and we decide to restore to our named label, our recovery.conf changes slightly as shown in the following settings:

restore_command = 'cp /archive/%f "%p"'
recovery_target_name = 'dangerous'

With no other changes in the process, recovery will stop when it reaches the named point we created.

Timelines

At the end of the recovery, the database will write a line like the following to the log file:

LOG:  selected new timeline ID: 2

What is a timeline? By restoring a database, we are effectively traveling back in time to the period when the backup was taken. This is especially true if we restore to a specific point in time or a named restore point. Each such recovery starts a new timeline, so if this is done several times, we may want to recover to a timeline somewhere in the middle of those recovery attempts. We can do that by selecting a timeline in the recovery.conf file.

restore_command = 'cp /archive/%f "%p"'
recovery_target_timeline = 4

This type of recovery will restore to the end of the selected timeline unless a target time is also specified. In the previous case, we ran many verification queries, concluded that this was the correct timeline, and now want it to recover fully before using it as our main database.
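The timeline ID is visible directly in the archive: each 24-character WAL segment name begins with the timeline in hexadecimal, and each timeline switch also archives a small .history file. A sketch of how those names are composed (pure string formatting, no cluster required; the timeline, log, and segment values are arbitrary examples):

```shell
# A WAL segment name is 24 hex digits: timeline ID, log file number, and
# segment number, each zero-padded to 8 characters.
timeline=4; log=0; seg=10
printf '%08X%08X%08X\n' "$timeline" "$log" "$seg"   # 00000004000000000000000A
# Each timeline switch also archives a history file named after the new ID:
printf '%08X.history\n' "$timeline"                 # 00000004.history
```

Listing /archive for *.history files is therefore a quick way to see which timeline IDs are available as recovery_target_timeline values.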

It is important to remember that no recovery can reach a time before the backup was taken. In addition, to restore fully, we need all of the transaction logs written after the backup; at the very least, we need the logs leading up to our selected time, timeline, or recovery point. That is why we keep them on a remote server or on tape for as long as possible. Enterprise systems sometimes keep both backups and transaction log files around for several years. Either way, it is good practice to keep both binary backup types on hand for emergency use for several days.
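When the archive must live off the database host, the same archive_command hook can ship each segment to a remote server directly instead of copying it locally. For example (the hostname and remote path are placeholders, and this assumes rsync is installed and can log in non-interactively):

```
archive_command = 'rsync -a %p backup-host:/archive/%f'
```

If the transfer fails, the command exits nonzero and PostgreSQL retries the segment later, so a brief network outage does not lose archives, though it does let WAL accumulate locally in the meantime.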