MySQL Admin Cookbook LITE: Replication and Indexing

Setting up slaves via network streaming

If you need to reset one or more slaves regularly, say every morning before business hours begin, importing a SQL dump might take too much time, especially if the slaves are relatively low-end machines without a sophisticated I/O subsystem.

In this recipe, we will present a way to set up a MySQL slave with minimal I/O load on the hard drive and the network adapter of the slave. The example assumes a Linux-based slave machine; however, you should be able to apply this to Windows as well, although you will need to download some free tools that most Linux distributions ship with out of the box.

The general idea is to have a more powerful machine, which can be the master if resources allow, to prepare a complete set of data files for the slaves and later stream them directly to the slave's disk from a web server.

Getting ready

To try this out, you will need a master server with at least one slave. Additionally, a machine with a web server installed is required. Depending on your setup, the master server might be suitable for this task. In the example that follows, we will assume that the master server has a web server running.

How to do it...

  1. Set up a fresh temporary MySQL daemon with a configuration similar to the slaves.
  2. Dump the data from the master with the --master-data option and feed it into the temporary server.
  3. Shut down the temporary server and compress its data.
  4. Transfer the archive to the slaves and unpack.
  5. Adapt the slaves' config files.
  6. Run the slaves and let them connect and catch up with the master.

How it works...

This recipe is based on the fact that you can quite easily copy MySQL's data files (including InnoDB table space files) from one machine to another, as long as you copy them all. So, we first create a ready-to-go set of slave data files on a relatively powerful machine and then transfer them to multiple slaves with weaker hardware. Those files will usually be bigger than the plain SQL dump file typically used for slave setups, but no parsing or processing is required on the target system, which makes the whole procedure mostly network and linear disk I/O bound.

The idea behind this concept is to relieve the individual slaves from importing SQL files themselves. As their hardware is rather slow and MySQL only supports single-threaded slave SQL execution, this can be very time-consuming. Instead, we temporarily use the master's better resources as a single power-slave and let it handle the import. We then provide any number of identical slaves with its data files, which reduces the work on each slave to simply unpacking some files.

While this does not really save anything in terms of bytes that need to be written to each slave's disk, the access pattern is much more sensible. The following comparison shows the disk transfers for a regular SQL import from local disk and for the proposed alternative, based on a 60 MB gzipped SQL file that results in approximately 2 GB of InnoDB table space files:

Regular SQL import:
  - Linear write of 60 MB: download of the dump to local disk
  - Linear write of 2 GB: initial creation of the InnoDB data files
  - Linear read of the 60 MB SQL.gz file, interleaved with random writes of 2 GB to the data files
  - In total, roughly 4 GB read and written, much of it randomly

Prepared data file deployment:
  - Linear write of 2 GB: the 60 MB download is streamed directly into the 2 GB of data files
  - In total, 2 GB written linearly

Importing a SQL file from the local hard disk means there are continual seeks between the current position in the SQL text file and the server's data files. Moreover, as the database schema may define lots of indexes, there is even more random disk write activity when executing simple INSERT statements.

In contrast, unpacking ready-made InnoDB table spaces (or MyISAM table files, for that matter) is basically just linear writing.

Temporary daemon

The SQL dump needs to be imported at least once. So, we set up a temporary MySQL daemon with a stripped-down configuration that is close to the actual slaves', meaning all the parameters that affect the storage files must match the slaves in order to create compatible data files.

Every time you want to prepare such a new slave installation image, the temporary daemon should be started with an empty data directory. While not strictly necessary, we prefer to delete the table space and transaction log files every time because it allows for better compression rates later.

The data files should be created close to the size that will be needed, maybe a little larger to prevent the need for them to grow. Nevertheless, specify the last data file as auto-extending; otherwise, importing the SQL data may fill the table space prematurely, which is difficult to handle, especially in an automated process.

Also, you should allow InnoDB to add larger chunks to the last data file if needed (default: 8 MB). Extending the files is associated with some overhead, but using bigger chunks reduces the impact on the I/O subsystem. You should be fine with 50 MB or 100 MB. The bigger this is, the less often InnoDB will have to extend the file. See the manual section on InnoDB configuration for more info.
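The exact values depend on your environment, but the storage-related part of the temporary daemon's configuration might look roughly like this sketch; paths and sizes are examples only and must be adjusted to match your slaves:

# Storage-related settings of the temporary daemon; everything that affects
# the on-disk file format has to match the slaves' configuration.
[mysqld]
datadir = /tmp/slave_image_datadir
# pre-size the table space close to the expected 2 GB, keep the last file auto-extending
innodb_data_file_path = ibdata1:1900M;ibdata2:10M:autoextend
# extend the last file in 100 MB chunks instead of the default 8 MB
innodb_autoextend_increment = 100
innodb_log_file_size = 256M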

Dumping master data

Once you have the temporary daemon running, use the mysqldump tool with the --master-data and --single-transaction options to create a dump of the database(s) you need to replicate. In order to save time and disk space, you may find it useful to pipe the output directly through the mysql command-line client and feed it into the target temporary server.
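Assuming the temporary daemon listens on a separate port (3307 here, purely as an example) and runs without a root password, such a pipeline might look like the following; host name, credentials, and schema name are placeholders:

mysqldump -h master.example.com -u dump_user -p --master-data --single-transaction --databases shopdb | mysql -h 127.0.0.1 -P 3307 -u root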

Shutting down and compressing

You can now shut down the temporary server and compress its data directory. Depending on how you want to configure permissions, you may include or exclude the mysql schema. We usually set up the temporary server with as few privileges as possible and do not ship the mysql schema along.

For compression, you should not use the ZIP format: it stores the catalog of included files at its very end, so piping it through a decompression program on the fly will not work. Instead, we use a gzipped tarball, which allows us to pipe the downloaded data stream through gunzip before directing it to disk.
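With the temporary daemon stopped, creating the archive directly in the web server's document root could look like this; the paths are examples, and the --exclude option only applies if you decided not to ship the mysql schema:

tar -C /tmp/slave_image_datadir --exclude='mysql' -czf /var/www/html/mysql_data.tgz .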

Transferring to the slave and uncompressing

On the slave we suggest curl as a download tool. It is important that the tool you choose be able to output the downloaded file directly to standard out. With curl that is quite simple—it is its default behavior. It also handles files larger than 2 GB, which some versions of wget have problems with. The command line should look similar to this:

curl http://the.server/mysql_data.tgz | tar -C /the/target/datadir -xzf -

curl will download the file and pipe it to tar to decompress into the target data directory.

Note

Do not miss the final - at the end of the command!

You will find that on a local area network, downloading and unpacking will be considerably faster than having MySQL first create the empty data files and then import the SQL, for the reasons stated above.

Adjusting slave configuration

When the data files have reached their destination on the slave, you may need to adjust the slave settings. This mainly depends on whether you copied fixed-size data files (in which case you can prepare the config file in advance) or used the autoextend option on the last table space file. In the latter case, you could write a little script that takes a template my.cnf file with your basic settings and fills in some placeholders for the data file-related settings via sed. One of those is the size of the last InnoDB data file from the archive: it becomes a fixed-size file on the slave, and a new auto-extending file is then added at the first slave start.
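A minimal sketch of such a script is shown below; it assumes the hypothetical file names used earlier and a template containing an @LASTSIZE@ placeholder in the innodb_data_file_path line:

# determine the size (in MB) of the last data file that came out of the archive;
# it is listed with this fixed size in the slave's configuration
LAST_SIZE_MB=$(( $(stat -c%s /the/target/datadir/ibdata2) / 1024 / 1024 ))

# the template contains a line such as:
# innodb_data_file_path = ibdata1:1900M;ibdata2:@LASTSIZE@;ibdata3:10M:autoextend
sed "s/@LASTSIZE@/${LAST_SIZE_MB}M/" /etc/mysql/my.cnf.template > /etc/mysql/my.cnf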

Connecting to the master

One last thing that needs to be done is to read the master's current binlog file name and position from the master.info file. This is required because, once the slave server has been started, you will need to provide correct credentials for the replication user and explicitly tell the slave which master host to connect to. Unfortunately, when a CHANGE MASTER TO command that includes a master host name is issued on the slave, all information about the previous master binlog and the corresponding offset is discarded (see the MySQL online manual, section 12.6.2.1 CHANGE MASTER TO Syntax at http://dev.mysql.com/doc/refman/5.1/en/change-master-to.html).

Therefore, you will need to tell the slave again where to begin replication.

One possible solution is to read the contents of the master.info file that was brought along with the data files into a bash script array and inject the values into the statement:

# read master.info into a bash array (in MySQL 5.1 the file starts with the
# number of lines, followed by the master binlog file name and the read position)
arr=( $(cat master.info) )
mysql -e "CHANGE MASTER TO master_host='the.master.server', master_user='replication_user', master_password='the_password',
master_log_file='${arr[1]}', master_log_pos=${arr[2]}"

The format of the master.info file is described in the MySQL manual.

Starting the slave

As soon as you issue a START SLAVE statement, the slave will connect to the master and begin to catch up with whatever has happened since the time when the dump was taken.
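For example, on the slave (client credentials omitted for brevity):

mysql -e "START SLAVE"
mysql -e "SHOW SLAVE STATUS\G"

The Seconds_Behind_Master column in the SHOW SLAVE STATUS output tells you how far the slave still lags behind the master.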