Replication is an interesting feature of MySQL that can be used for a variety of purposes. It can help to balance server load across multiple machines, ease backups, provide a workaround for the lack of fulltext search capabilities in InnoDB, and much more.

The basic idea behind replication is to reflect the contents of one database server (this can include all databases, only some of them, or even just a few tables) to more than one instance. Usually, those instances will be running on separate machines, even though this is not technically necessary.

Traditionally, MySQL replication is based on the surprisingly simple idea of repeating the execution of all statements issued that can modify data—not SELECT—against a single master machine on other machines as well. Provided all secondary slave machines had identical data contents when the replication process began, they should automatically remain in sync. This is called Statement Based Replication (SBR).

With MySQL 5.1, Row Based Replication (RBR) was added as an alternative method for replication, targeting some of the deficiencies SBR brings with it. While at first glance it may seem superior (and more reliable), it is not a silver bullet—the pain points of RBR are simply different from those of SBR.

Even though there are certain use cases for RBR, all recipes in this chapter will be using Statement Based Replication.

While MySQL makes replication generally easy to use, it is still important to understand what happens internally to be able to know the limitations and consequences of the actions and decisions you will have to make. We assume you already have a basic understanding of replication in general, but we will still go into a few important details.

Statement Based Replication

SBR is based on a simple but effective principle: if two or more machines have the same set of data to begin with, they will remain identical if all of them execute the exact same SQL statements in the same order.

Executing all statements manually on multiple machines would be extremely tedious and impractical. SBR automates this process. In simple terms, it takes care of sending all the SQL statements that change data on one server (the master) to any number of additional instances (the slaves) over the network.

The slaves receiving this stream of modification statements execute them automatically, thereby effectively reproducing the changes the master machine made to its data originally. That way they will keep their local data files in sync with the master's.

One thing worth noting here is that the network connection between the master and its slave(s) need not be permanent. In case the link between a slave and its master fails, the slave will remember up to which point it had read the data last time and will continue from there once the network becomes available again.

In order to minimize the dependency on the network link, the slaves will retrieve the binary logs (binlogs) from the master as quickly as they can, storing them on their local disk in files called relay logs. This way, the connection, which might be some sort of dial-up link, can be terminated much sooner while executing the statements from the local relay-log asynchronously. The relay log is just a copy of the master's binlog.

The following image shows the overall architecture:

Filtering

In the preceding image, you can see that each slave may have its individual configuration on whether it executes all the statements coming in from the master, or just a selection of those. This can be helpful when you have some slaves dedicated to special tasks, where they might not need all the information from the master.

All of the binary logs have to be sent to each slave, even though it might then decide to throw away most of them. Depending on the size of the binlogs, the number of slaves and the bandwidth of the connections in between, this can be a heavy burden on the network, especially if you are replicating via wide area networks.

Even though the general idea of transferring SQL statements over the wire is rather simple, there are lots of things that can go wrong, especially because MySQL offers some configuration options that are quite counter-intuitive and can lead to hard-to-find problems.

For us, this has become a best practice:

Note

"Only use qualified statements and replicate-*-table configuration options for intuitively predictable replication!"

What this means is that the only filtering rules that produce intuitive results are those based on the replicate-do-table and replicate-ignore-table configuration options. This includes those variants with wildcards, but specifically excludes the all-database options like replicate-do-db and replicate-ignore-db. These directives are applied on the slave side on all incoming relay logs.

The master-side binlog-do-* and binlog-ignore-* configuration directives influence which statements are sent to the binlog and which are not. We strongly recommend against using them, because apart from hard-to-predict results they will make the binlogs undesirable for server backup and restore. They are often of limited use anyway as they do not allow individual configurations per slave but apply to all of them.

For these reasons you will not find any use of these options in this book.

MySQL Admin Cookbook LITE: Replication and Indexing

MySQL Admin Cookbook LITE: Replication and Indexing

Overview of this book

Related Content you might be interested in

Current Title:

MySQL Admin Cookbook LITE: Replication and Indexing

Introduction

Statement Based Replication

Filtering

Note