Delayed Replication in MySQL 5.6 Development Release

The new Development Release for MySQL 5.6 contains a great feature that our users have been asking for for a while (work log 344 first raised in 2010!) – delayed replication.

Stop mistake being propagated

The concept (and as you’ll see the execution) is extremely simple. If a user makes a mistake on the master – such as dropping some critical tables – then we want to give them the opportunity to recover the situation by using the data held on one of the slaves. The problem is that the slave is busily trying to keep up with the master and in all likelihood will have dropped these tables before the user has time to pull the plug on the replication stream. What this feature does is give the DBA the option to introduce a configurable delay into the replication process so that they have time to cut it off before the mistake is propagated.

This blog explains how this works, how to set that up and then how to bring the slave up to date (to the point in time just before the mistake was made on the master).

MySQL Replication Implementation

To understand how this is implemented, it helps to have a little bit of background on how MySQL replication is implemented. When a change is made on the master, it is applied to the master’s local disk copy and then written to the binary log. The change is then asynchronously (but normally immediately) copied from the master’s binary log to the relay log on the slave; from there an SQL thread on the slave will read the change from the relay log and apply it to the slave’s copy of the data.

This feature works by allowing the user to configure a delay between when the change is applied on the master and when that change is taken from the relay log and applied to the slave. Note that if the master fails during this delay period then the change is not lost as it is has already been safely recorded in the slave’s relay log.

Immediate & Delayed Replication

As the delay is implemented on the slave, you are free to use ‘real-time’ replication to one slave (to allow the fastest possible failover if the master fails) and delayed replication to a second slave to guard against user error. This is the setup that this post steps through.

For simplicity, all three MySQL Servers will be run on a single host but each uses a different port number as shown in the diagram. “slave” will apply changes as quickly as it can while “slave2” will introduce a delay when applying changes from its relay log.

Setting up the first slave is very standard:

master> CREATE USER repl_user@localhost;
master> GRANT REPLICATION SLAVE ON *.* TO 
  repl_user@localhost IDENTIFIED BY 'pw';
slave> CHANGE MASTER TO
 -> MASTER_HOST = 'localhost',
 -> MASTER_PORT = 3306,
 -> MASTER_USER = 'repl_user',
 -> MASTER_PASSWORD = 'pw';
slave> start slave;
When setting up the delayed slave, one extra parameter is included in the CHANGE MASTER command:
slave2> CHANGE MASTER TO
 -> MASTER_HOST = 'localhost',
 -> MASTER_PORT = 3306,
 -> MASTER_USER = 'repl_user',
 -> MASTER_PASSWORD = 'pw',
 -> MASTER_DELAY = 20;
slave2> START SLAVE;
The MASTER_DELAY parameter indicates the delay in seconds (here I’ve used 20 seconds – in a production environment you’d probably want to give your self much longer.
Prior to this, a simple table had already been created:
master> CREATE DATABASE clusterdb;USE clusterdb;
master> CREATE TABLE towns (Town VARCHAR(20));
Next we check that changes are immediately applied on slave while delayed on slave2:
master> REPLACE INTO towns VALUES ("Maidenhead"),("Bray");
slave> SELECT * FROM towns;
 +------------+
 | Town       |
 +------------+
 | Maidenhead |
 | Bray       |
 +------------+
slave2> SELECT * FROM towns;
 Empty set (0.00 sec)

slave2> SELECT * FROM towns;
 +------------+
 | Town       |
 +------------+
 | Maidenhead |
 | Bray       |
 +------------+
The next step is to simulate a user error; I add some more data to the table and then drop the table. Following on from that I stop the replication on slave2 before the mistake is applied:
master> REPLACE INTO towns VALUES ("Cookham"),("Marlow");
master> DROP TABLE towns;
slave> SELECT * FROM tables;
     ERROR 1146 (42S02): Table 'clusterdb.tables' doesn't exist
slave2> STOP SLAVE;
slave2> SELECT * FROM towns;
 +------------+
 | Town       |
 +------------+
 | Maidenhead |
 | Bray       |
 +------------+

Bring the Slave up to the Desired Point in Time

This is a good start, while slave has dropped the table, it still exists on slave2. Unfortunately, slave2 is missing the additions to the table that were made just before the mistake was made. The next step is to bring slave 2 almost up to date – stopping just before the table was dropped. To do this we need to find the position within the master’s binary log just before the table was dropped – this can be done using the SHOW BINLOG EVENTS command on the master. Once we have that position (file-name + position) we can tell slave 2 to catch up just to that point using START SLAVE UNTIL . Once that has been done, I check that the extra 2 inserts have been applied to slave2:

master> SHOW BINLOG EVENTSG
....
*************************** 10. row ***************************
 Log_name: ws2-bin.000001
 Pos: 842
 Event_type: Query
 Server_id: 1
 End_log_pos: 957
 Info: use `clusterdb`; REPLACE INTO towns VALUES ("Cookham"),("Marlow")
 *************************** 11. row ***************************
 Log_name: ws2-bin.000001
 Pos: 957
 Event_type: Xid
 Server_id: 1
 End_log_pos: 984
 Info: COMMIT /* xid=32 */
 *************************** 12. row ***************************
 Log_name: ws2-bin.000001
 Pos: 984
 Event_type: Query
 Server_id: 1
 End_log_pos: 1096
 Info: use `clusterdb`; DROP TABLE `towns` /* generated by server */
slave2> START SLAVE UNTIL
 -> MASTER_LOG_FILE='ws2-bin.000001',
 -> MASTER_LOG_POS=984;
slave2> SELECT * FROM towns;
 +------------+
 | Town       |
 +------------+
 | Maidenhead |
 | Bray       |
 | Cookham    |
 | Marlow     |
 +------------+

Success! Now slave2 contains exactly the data we need. After this it’s up to you what to do next; typically this could involve promoting slave2 to be the new master.

If you want to try this out for yourselves then you can download the MySQL 5.6 Milestone Development Release from dev.mysql.com (select the Development Maintenance Release sub-tab to get MySQL 5.6).





4 comments

  1. This is great. It is currently available for all versions through Maatkit, but that is not as simple to use and inherently less reliable.

  2. Ricardo Vilhena says:

    Hello Andrew.

    First of all I apologise for making questions in you won blog. But I’m really stuck here with a problem which I think you can help me!

    I’ve installed mysql cluster and it’s all running smoothly (1 manage server and 2 data nodes).

    I’m trying to intall mysql-php in the data nodes so that I can user phpmyadmin but when I run yum install php-mysql.x86_64 I got the conflict error: MySQL-Cluster-gpl-server conflicts with mysql.

    Are the not compatible? Because I reallu need the two of them.

    Thanks.

    Regards,
    Vilhena

  3. Sanjeev says:

    Hi

    Right now i am using 5.5 version of MySQL. Currently this MASTER_DELAY option is available with 5.6 …! .

    So is there any possibility to make use of MASTER_DELAY command in version 5.5 ???

    Please suggest me…

    • andrew says:

      In general, we don’t retrofit new features to GA releases (users don’t want to have to retest their applications when there are new maintenance releases) and there are no plans to back-port delayed replication,

      Regards, Andrew.

Leave a Reply to Baron Schwartz Cancel reply