As MySQL cannot detect if two servers are in sync (that is they contain the same records and tables), one would often like to verify that master and slave are still working on identical data sets to be sure no corruption has occurred yet.
For this purpose, the excellent Maatkit suite of programs (see http://www.maatkit.org) contains a handy tool called mk-table-checksum.
It automatically calculates checksums of tables on one or more servers, which can then be compared. Should the checksums differ, then the table in question is not identical on the machines involved in the check.
The servers involved need not necessarily be a replication master and slaves, but can be any set of servers you wish to compare. mk-table-checksum
has an additional alternative means of checking the special case in replication environments to see if a master and its slaves are in sync. See the There's more... section at the end of this recipe for more details on this feature.
Maatkit is written in Perl. While on most Unix-like systems this scripting language is already installed by default or can be easily downloaded and set up, Windows users will not be so lucky in general. If you are stuck with Windows, you might want to take a look at ActivePerl, a mature Perl implementation for Windows.
Moreover, you are definitely going to need the Maatkit mk-table-checksum
tool. You can get it directly from http://www.maatkit.org. Also, download the mk-checksum-filter
companion tool and put it in the same directory as mk-table-checksum
.
In this example, we will compare two MySQL servers that differ in the sakila
database's country
table located on machines called serverA and serverB.
You will need to have user accounts for both machines that have permission to connect and execute statements remotely.
The command lines in this recipe might change with newer versions of Maatkit, as it is under active development. Double-check with the online manual that the instructions printed here are still current before trying them out.
- On a command shell prompt, enter the following line, assuming
mk-table-checksum
is in the current directory and executable:$ ./mk-table-checksum h=serverA,u=userA,p=passwordA h=serverB,u=userB,p=passwordB | ./mk-checksum-filter
- Check the output of this command (formatted and abbreviated for printing):
Database |
Table |
Chunk |
Host |
Engine |
Count |
Checksum |
---|---|---|---|---|---|---|
sakila |
country |
0 |
serverA |
InnoDB |
NULL |
2771817858 |
sakila |
country |
0 |
serverB |
InnoDB |
NULL |
3823671353 |
Notice the last column: The checksums do not match—the tables are not identical.
mk-table-checksum
connects to all servers listed on the command line and calculates checksums for all tables. Identical table contents result in identical checksums. So if the checksums from two servers do not match for any given table, there must be a difference in their contents. The mk-checksum-filter
tool removes all lines from the output that do not indicate a checksum mismatch.
Note
It is important to know that the checksums are different if you employ different versions of MySQL across servers. In this case, a different checksum might just be the result of the different versions!
mk-table-checksum
offers several algorithms for checksumming, each with different speeds and different levels of resilience against certain kinds of data differences that might cancel each other out, leading to identical checksums, but for different data. The Maatkit online manual contains detailed and current information on this topic.
Due to the asynchronous nature of MySQL replication, executing the checksum statements remotely from a single machine may not yield reliable results. This is because the master database might already contain modifications that have not been executed by each slave yet.
To compensate, mk-table-checksum
offers a special mode to check slaves and masters. Instead of executing the calculations remotely, the statements to do so are written to the master's binlog and then sent off to the slaves via the regular replication mechanism. This ensures that each slave will calculate the checksum at the correct time with respect to the transaction order. The results are then stored in a table on the slave that can be retrieved with a second command remotely later on. To use this feature, you need a user with sufficient privileges to create a table for this purpose on the slaves.
For more details, see the --replicate
and --create-replicate-table
options in the Maatkit online manual.