Book Image

MySQL Admin Cookbook LITE: Replication and Indexing

Book Image

MySQL Admin Cookbook LITE: Replication and Indexing

Overview of this book

Table of Contents (3 chapters)

Estimating network and slave I/O load

Especially when using replication over a wide-area network connection with limited bandwidth, it is interesting to be able to predict the amount of data that needs to be transported between master and slaves.

While MySQL does not use the most efficient strategy to deliver data, it is at least relatively easy to calculate the requirements in advance.

This is less of a step-by-step recipe than an annotated walkthrough of the basic formula that can be used to estimate the traffic you will have to be prepared for.

Getting ready

In order to follow along, you must have some key data points available because otherwise there is not much to calculate. You will need:

  • The number of slaves (to be) connected to the master.
  • An idea about the average amount of binlogs written when using the master under regular load. Knowing about peak times can be interesting as well.
  • The bandwidth of the connection between master and slaves. This includes the speed of the network interfaces on the master and, in general, the whole route between them (possibly including Internet connections).

We assume that there are no other network-intensive applications running on the master or slaves, so that practically all the speed your network card can provide is usable for MySQL.

In this example, we will keep matters simple, assuming the following:

Data point

Value

Master's Connectivity

Gigabit LAN interface (approx. 100MB/s)

Slaves' Connectivity

2MBit/s DSL line, of which 1MBit/s can be assumed available for MySQL. 1MBit/s comes down to approximately 100kb/s.

Average amount of binlogs created on master

175MB per hour, approx. 50kb/s.

Number of Slaves

5

Speed of the slowest link in the connection between Master and Slaves.

Master is connected to the Internet via a 10MBit/s connection, approx. 1MB/s.

How to do it...

  1. Check the master's network speed: Multiply the number of slaves with the average amount of binlogs: 5x175MB/hour = 875MB/hour or about 250kb/second. The gigabit connection can handle this easily.
  2. Check individual slaves' network speed: The 1MBit portion of the DSL line is sufficient for an average amount of data of 50kb/second. Often binlog production is not linear over time there might be peaks, but there is still a reserve.
  3. Check if the slowest part of the route between master and slaves can handle the load: 250kb/second should be no problem for the 10MBit/second Internet connection.
  4. Disk I/O load on each slave, caused by the replication, is the amount of relay logs being written. This is equivalent to the amount of binlogs the master produces. Provided the slave's I/O is not already saturated by other things, an additional 175MB per hour should not pose a problem either.

How it works...

Basically, replication simply needs sufficient resources to copy the master's binlogs to the slaves. This really all there is to it. Depending on the network route between them this can be easily done (say most LANs), or can be tricky (as in cases with slow Internet connections).

In this example, we see there should be no problem on any part of the system, as there is still room for a higher load on each resource. The most limiting factor in this scenario seems to be the master's outgoing Internet connection. If you add more slaves to the scenario, each new one will add another 50KB per second of outgoing bandwidth. Assuming replication can use the full 1MB/s outgoing speed, which is not very realistic, that part of the route could theoretically service 20 slaves at most. In reality, it will be more like 10 to 15.

There's more...

There are two more general considerations you might want to think about when planning a replication setup.

Handling intermittent connectivity between master and slave

If the connection between master and slaves is only available for a limited period of time, the slaves will start to lag behind while disconnected. The slaves will download new data as quickly as possible when the connection is resumed and store it locally in the relay logs, asynchronously executing the statements. So expect higher network load during these times.

You will also want to take that into account when there are multiple slaves trying to catch up at the same time. Under such circumstances, the route between master and slaves might become saturated more quickly.

Enabling compression with the slave_compressed_protocol option

Particularly useful for low bandwidth connections between master and slaves is the compression feature for replication traffic. Provided it is switched on on both master and slave machines, it can significantly reduce the amount of data that is actually transferred over the network at the cost of increased CPU loads. The master will then send out the binlog information in a compressed format.

In a simple comparison, measuring the network traffic while creating and loading the sakila sample database, 3180kb of binlogs were created. However, with the compressed protocol switched on, only about 700KB of data per slave were sent over the network.

To enable compression, add the following line to the [mysqld] section in the configuration files on both master and slave:

slave_compressed_protocol=1

Then restart the servers to enable the feature. Verify whether it was switched on successfully by issuing a SHOW VARIABLES LIKE 'slave_compressed%'; command on both master and slaves.

You can achieve a similar effect with SSH compression. As we generally do not recommend replicating over the Internet without encryption, that option might even be more appealing in such scenarios as it does not require any configuration changes to MySQL.

Note

Naturally, the level of compression heavily depends on the data you are handling. If you store JPEG images in BLOB fields, for example, those cannot be compressed much more than they already are!