
MySQL Cluster 7.4 is GA!


The MySQL team at Oracle is excited to announce the General Availability of MySQL Cluster 7.4 – it is now ready for production workloads.

MySQL Cluster 7.4.4 can be downloaded from mysql.com and the release notes viewed in the MySQL docs.

Figure 1 provides a summary of the enhancements delivered in this release:

  • Performance
    • 200M NoSQL Reads/Sec
    • 2.5M SQL Ops/Sec
    • 50% Faster Reads
    • 40% Faster Mixed
  • Active-Active
    • Active-Active Geographic Redundancy
    • Conflict Detection/Resolution
  • Management
    • 5X Faster Maintenance Ops
    • Detailed Reporting

Figure 1: MySQL Cluster 7.4 content summary

The sections that follow delve into some more details.

Performance


Figure 2: Read/Writes 40% Faster than MySQL Cluster 7.3


Figure 3: Reads 50% Faster than MySQL Cluster 7.3


As a scaled-out, in-memory, real-time database, MySQL Cluster has always delivered great performance, and we continue to make it faster with each release. In particular, we want to keep pace with the trend towards more and more cores rather than faster ones. 7.4 continues along the path of better exploiting multiple cores – as can be seen from these benchmark results:

  • Figure 3 shows that for reads, MySQL Cluster 7.4 is 50% faster than MySQL Cluster 7.3
  • Figure 2 shows that for mixed read/write traffic, MySQL Cluster 7.4 is 40% faster than MySQL Cluster 7.3

Just make sure that you’re using the multi-threaded data node (ndbmtd rather than ndbd) and have configured how many threads it should use.
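For example, a minimal config.ini sketch (the thread count here is illustrative – size it to your hardware, or use the finer-grained ThreadConfig parameter instead):

[ndbd default]
NoOfReplicas = 2
# allow each ndbmtd data node to use up to 8 threads
MaxNoOfExecutionThreads = 8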

The tests were performed using a 48 core/96 thread machine (also demonstrating how well MySQL Cluster can now scale up with large numbers of cores).

So, the sysbench benchmark results show how much faster MySQL Cluster 7.4 is compared to the previous 2 releases (a nice apples-to-apples test but on a very small cluster without any scale-out) but we now have some even more interesting benchmark results showing just how well MySQL Cluster can scale out to deliver incredible results…

200 Million NoSQL QPS!


Figure 4: 200 Million NoSQL Queries Per Second

Using the flexAsynch benchmark tool and a pool of 32 data nodes (each data node running on a dedicated 56 thread Intel E5-2697 v3 (Haswell) machine), the cluster was able to process 200,000,000 reads per second. These results are shown in Figure 4, where the number of data nodes is scaled along the x-axis – as you’ll observe, performance scales virtually linearly with the number of data nodes. Note that MySQL Cluster supports up to 48 data nodes and so there’s still plenty of scope to scale out even further.

2.5 Million SQL Operations per Second


Figure 5: 2.5 Million SQL operations per second

Using a smaller cluster (just 16 data nodes running on the same 56 thread machines), the DBT2 benchmark has been used to assess how well SQL performance scales as more data nodes are added. SQL performance was measured at 2,500,000 SQL read operations per second. This equates to around 5 Million Transactions Per Minute or 2.2 Million NewOrder TPM.

As can be seen in Figure 5, the scaling of SQL reads is again almost linear.

Active-Active (Multi-Master) Replication

MySQL Cluster allows bi-directional replication between two (or more) clusters. Replication within each cluster is synchronous but between clusters it is asynchronous which means the following scenario is possible:

Conflict with asynchronous replication:

+---------+---------------+---------+
| Site A  | Replication   | Site B  |
+---------+---------------+---------+
| x == 10 |               | x == 10 |
| x = 11  |               | x = 20  |
|         | --- x=11 ---> | x == 11 |
| x == 20 | <--- x=20 --- |         |
+---------+---------------+---------+

In this example a value (a column of one row in a table) is set to 11 on site A and the change is queued for replication to site B. In the meantime, an application sets the same value to 20 on site B and that change is queued for replication to site A. Once both sites have received and applied the replicated change from the other cluster, site A contains the value 20 while site B contains 11 – in other words, the databases are now inconsistent.

How MySQL Cluster implements eventual consistency

There are two phases to establishing consistency between both clusters after an inconsistency has been introduced:

  1. Detect that a conflict has happened
  2. Resolve the inconsistency

The following animation illustrates how MySQL Cluster 7.2 detects that an inconsistency has been introduced by the asynchronous, active-active replication:

Figure 6: Detecting conflicts

While we typically consider the 2 clusters in an active-active replication configuration to be peers, in this case we designate one to be the primary and the other the secondary. Reads and writes can still be sent to either cluster but it is the responsibility of the primary to identify that a conflict has arisen and then remove the inconsistency.

A logical clock is used to identify (in relative terms) when a change is made on the primary – for those who know something of the MySQL Cluster internals, we use the index of the Global Checkpoint that the update is contained in. For all tables that have this feature turned on, an extra, hidden column is automatically added on the primary – this represents the value of the logical clock when the change was made.

Once the change has been applied on the primary, there is a “window of conflict” for the affected row(s), during which, if a different change is made to the same row(s) on the secondary, there will be an inconsistency. Once the slave on the secondary has applied the change from the primary, it will send a replication event back to the slave on the primary, containing the primary’s clock value associated with the changes that have just been applied on the secondary. (Remember that the clock is actually the Global Checkpoint Index and so this feature is sometimes referred to as Reflected GCI.) Once the slave on the primary has received this event, it knows that all changes tagged with a clock value no later than the reflected GCI are now safe – the window of conflict has closed.

If an application modifies this same row on the secondary before the replication event from the primary has been applied, then the secondary will send an associated replication event to the slave on the primary before it reflects the new GCI. The slave on the primary will process this replication event and compare the clock value recorded with the affected rows against the latest reflected GCI; as the clock value for the conflicting row is higher, the primary recognises that a conflict has occurred and launches the algorithm to resolve the inconsistency.

Figure 7: Options for MySQL Cluster replication conflict detection/resolution

After a conflict has been detected, you have the option of having the database simply report the conflict to the application or have it roll back just the conflicting row or the entire transaction and all subsequent transactions that were dependent on it.

So – what’s new in 7.4?

  • Selective tracking of transaction reads for improved transaction dependency tracking
  • Improved NDB exceptions table format: non-PK columns, operation type, transaction id, before and after values
  • Improved manageability: online conflict role change, improved visibility into system state, operations, and current and historic consistency

Configuring Active-Active Replication with Transactional Conflict Resolution

MySQL Replication Configuration

Figure 8: MySQL Replication Configuration

This section assumes that replication has already been set up between two clusters as shown in Figure 8. For more details on how to set up that configuration, refer to the blog post Enhanced conflict resolution with MySQL Cluster active-active replication.

To keep things simple, just two hosts are used: 192.168.56.101 contains all of the nodes for the primary cluster and 192.168.56.102 contains all of the nodes for the secondary. A single MySQL Server in each cluster acts as both the master and the slave.
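For orientation, the bi-directional channel between the two SQL nodes is created with statements along these lines (a sketch only – the replication account, log file name and position are illustrative; the referenced post walks through the real procedure):

PRIMARY> CHANGE MASTER TO MASTER_HOST='192.168.56.102', MASTER_USER='repl',
    ->     MASTER_PASSWORD='repl_pwd', MASTER_LOG_FILE='mysql-bin.000001',
    ->     MASTER_LOG_POS=4;
PRIMARY> START SLAVE;

SECONDARY> CHANGE MASTER TO MASTER_HOST='192.168.56.101', MASTER_USER='repl',
    ->     MASTER_PASSWORD='repl_pwd', MASTER_LOG_FILE='mysql-bin.000001',
    ->     MASTER_LOG_POS=4;
SECONDARY> START SLAVE;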

The first step is to identify the tables that need conflict detection enabled. Each of those tables then needs an entry in the mysql.ndb_replication table, where it is tagged as using the new NDB$EPOCH2_TRANS() function – you could also choose to use NDB$EPOCH(), in which case only the changes to conflicting rows are rolled back rather than the full transactions. A few things to note:

  • This must be done before creating the application tables themselves
  • Should only be done on the primary
  • By default the table doesn’t exist and so the very first step is to create it
PRIMARY> CREATE TABLE mysql.ndb_replication (
->     db VARBINARY(63),
->     table_name VARBINARY(63),
->     server_id INT UNSIGNED,
->     binlog_type INT UNSIGNED,
->     conflict_fn VARBINARY(128),
->     PRIMARY KEY USING HASH (db, table_name, server_id)
-> )   ENGINE=NDB
-> PARTITION BY KEY(db,table_name);

PRIMARY> REPLACE INTO mysql.ndb_replication VALUES ('clusterdb', 'simple1', 0, 0, 'NDB$EPOCH2_TRANS()');
PRIMARY> REPLACE INTO mysql.ndb_replication VALUES ('clusterdb', 'simple2', 0, 0, 'NDB$EPOCH2_TRANS()');
PRIMARY> REPLACE INTO mysql.ndb_replication VALUES ('clusterdb', 'simple3', 0, 0, 'NDB$EPOCH2_TRANS()');

SECONDARY> SELECT * FROM mysql.ndb_replication;
+-----------+------------+-----------+-------------+--------------------+
| db        | table_name | server_id | binlog_type | conflict_fn        |
+-----------+------------+-----------+-------------+--------------------+
| clusterdb | simple2    |         0 |           0 | NDB$EPOCH2_TRANS() |
| clusterdb | simple1    |         0 |           0 | NDB$EPOCH2_TRANS() |
| clusterdb | simple3    |         0 |           0 | NDB$EPOCH2_TRANS() |
+-----------+------------+-----------+-------------+--------------------+

Refer to the MySQL Cluster documentation for details on the contents of the mysql.ndb_replication table.
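As a quick orientation for the values used above (worth double-checking against the manual for your version):

-- db, table_name  : the application table that the rule applies to
-- server_id = 0   : the rule applies to all MySQL Servers in the cluster
-- binlog_type = 0 : use the server's default binary logging format
-- conflict_fn     : the conflict detection/resolution function to apply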

The next step is to define which MySQL Cluster instance is to act as the PRIMARY and which as the SECONDARY (note that the slave SQL thread must be halted to make this change):

PRIMARY> STOP SLAVE SQL_THREAD;
SECONDARY> STOP SLAVE SQL_THREAD;
PRIMARY> SET GLOBAL ndb_slave_conflict_role='PRIMARY';
SECONDARY> SET GLOBAL ndb_slave_conflict_role='SECONDARY';
SECONDARY> START SLAVE SQL_THREAD;
PRIMARY> START SLAVE SQL_THREAD;

For each of these tables you should also create an exceptions table, which will record any conflicts that resulted in changes being rolled back; the format of these tables is rigidly defined, so take care to copy the types exactly. Again, this only needs to be done on the primary:

PRIMARY> CREATE DATABASE clusterdb; USE clusterdb;

PRIMARY> CREATE TABLE simple1$EX (
NDB$server_id INT UNSIGNED,
NDB$master_server_id INT UNSIGNED,
NDB$master_epoch BIGINT UNSIGNED,
NDB$count INT UNSIGNED,
NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 'REFRESH_ROW', 'READ_ROW') NOT NULL,
NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,
NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,
id INT NOT NULL,
value$OLD INT,
value$NEW INT,
PRIMARY KEY(NDB$server_id, NDB$master_server_id, NDB$master_epoch, NDB$count)) ENGINE=NDB;

PRIMARY> CREATE TABLE simple2$EX (
NDB$server_id INT UNSIGNED,
NDB$master_server_id INT UNSIGNED,
NDB$master_epoch BIGINT UNSIGNED,
NDB$count INT UNSIGNED,
NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 'REFRESH_ROW', 'READ_ROW') NOT NULL,
NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,
NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,
id INT NOT NULL,
value$OLD INT,
value$NEW INT,
PRIMARY KEY(NDB$server_id, NDB$master_server_id, NDB$master_epoch, NDB$count)) ENGINE=NDB;

PRIMARY> CREATE TABLE simple3$EX (
NDB$server_id INT UNSIGNED,
NDB$master_server_id INT UNSIGNED,
NDB$master_epoch BIGINT UNSIGNED,
NDB$count INT UNSIGNED,
NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 'REFRESH_ROW', 'READ_ROW') NOT NULL,
NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,
NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,
id INT NOT NULL,
value$OLD INT,
value$NEW INT,
PRIMARY KEY(NDB$server_id, NDB$master_server_id, NDB$master_epoch, NDB$count)) ENGINE=NDB;

Refer to the MySQL Cluster documentation for details on the format of the exception tables.

Finally, the application tables themselves can be created (this only needs doing on the primary as they’ll be replicated to the secondary):

PRIMARY> CREATE TABLE simple1 (id INT NOT NULL PRIMARY KEY, value INT) ENGINE=ndb;
PRIMARY> CREATE TABLE simple2 (id INT NOT NULL PRIMARY KEY, value INT) ENGINE=ndb;
PRIMARY> CREATE TABLE simple3 (id INT NOT NULL PRIMARY KEY, value INT) ENGINE=ndb;

Everything is now set up and the new configuration can be tested to ensure that conflicts are detected and the correct updates are rolled back.

Testing Active-Active Replication with Transactional Conflict Resolution

The first step is to add some data to our new tables (note that at this point replication is running, so the rows only need to be inserted on the primary) and confirm that everything is replicated to the secondary:

PRIMARY> REPLACE INTO simple1 VALUES (1,10);
PRIMARY> REPLACE INTO simple2 VALUES (1,10);
PRIMARY> REPLACE INTO simple3 VALUES (1,10);

SECONDARY> SELECT * FROM simple1;
+----+-------+
| id | value |
+----+-------+
|  1 |    10 |
+----+-------+

SECONDARY> SELECT * FROM simple2;
+----+-------+
| id | value |
+----+-------+
|  1 |    10 |
+----+-------+

SECONDARY> SELECT * FROM simple3;
+----+-------+
| id | value |
+----+-------+
|  1 |    10 |
+----+-------+

The key behaviour to verify is that the NDB$EPOCH2_TRANS() function rolls back any transactions on the secondary that involve a conflict (as well as subsequent, dependent transactions that modify the same rows). To intentionally trigger this, the simplest approach is to stop the slave IO thread on the secondary cluster in order to widen the window of conflict (which is otherwise very short). Once the slave IO thread has been stopped, a change is made to table simple1 on the primary, and then the secondary makes a (conflicting) change to the same row as well as a change to table simple2 in the same transaction. A second transaction on the secondary changes a row in simple3 – as it doesn’t touch any rows that have been involved in a conflict, that change should stand.

SECONDARY> STOP SLAVE IO_THREAD;

PRIMARY> UPDATE simple1 SET value=13 WHERE id=1;

SECONDARY> BEGIN; # conflicting transaction
SECONDARY> UPDATE simple1 SET value=20 WHERE id=1;
SECONDARY> UPDATE simple2 SET value=20 WHERE id=1;
SECONDARY> COMMIT;
SECONDARY> UPDATE simple3 SET value=20 WHERE id=1; # non conflicting

SECONDARY> SELECT * FROM simple1; SELECT * FROM simple2; SELECT * FROM simple3;
+----+-------+
| id | value |
+----+-------+
|  1 |    20 |
+----+-------+

+----+-------+
| id | value |
+----+-------+
|  1 |    20 |
+----+-------+

+----+-------+
| id | value |
+----+-------+
|  1 |    20 |
+----+-------+

If you now check the exception tables then you can see that the primary has received the changes from the secondary and because the first transaction updated the same row in simple1 during its window of conflict it has recorded that the change needs to be rolled back – this will happen as soon as the replication thread is restarted on the secondary:

PRIMARY> SELECT * FROM simple1$EX\G
*************************** 1. row ***************************
NDB$server_id: 9
NDB$master_server_id: 20
NDB$master_epoch: 31344671326216
NDB$count: 3
NDB$OP_TYPE: UPDATE_ROW
NDB$CFT_CAUSE: TRANS_IN_CONFLICT
NDB$ORIG_TRANSID: 193282978304
id: 1
value$OLD: 10
value$NEW: 20
1 row in set (0.00 sec)

PRIMARY> SELECT * FROM simple2$EX\G
*************************** 1. row ***************************
NDB$server_id: 9
NDB$master_server_id: 20
NDB$master_epoch: 31344671326216
NDB$count: 1
NDB$OP_TYPE: UPDATE_ROW
NDB$CFT_CAUSE: TRANS_IN_CONFLICT
NDB$ORIG_TRANSID: 193282978304
id: 1
value$OLD: 10
value$NEW: 20

PRIMARY> SELECT * FROM simple3$EX\G
Empty set (0.01 sec)

SECONDARY> START SLAVE IO_THREAD;

SECONDARY> SELECT * FROM simple1; SELECT * FROM simple2; SELECT * FROM simple3;
+----+-------+
| id | value |
+----+-------+
|  1 |    13 |
+----+-------+

+----+-------+
| id | value |
+----+-------+
|  1 |    10 |
+----+-------+

+----+-------+
| id | value |
+----+-------+
|  1 |    20 |
+----+-------+

These are the results we expect – simple1 has the value set by the primary, with the subsequent change on the secondary rolled back; simple2 was not updated by the primary but the change on the secondary was rolled back as it was made in the same transaction as the conflicting update to simple1. The change on the secondary to simple3 has survived as it was made outside of any conflicting transaction and did not depend on any conflicting changes. Finally, confirm that the data is identical on the primary:

PRIMARY> SELECT * FROM simple1; SELECT * FROM simple2; SELECT * FROM simple3;
+----+-------+
| id | value |
+----+-------+
|  1 |    13 |
+----+-------+

+----+-------+
| id | value |
+----+-------+
|  1 |    10 |
+----+-------+

+----+-------+
| id | value |
+----+-------+
|  1 |    20 |
+----+-------+

Statistics are provided on the primary recording that 1 conflict has been detected, affecting 1 transaction and resulting in 2 row changes being rolled back:

PRIMARY> SHOW STATUS LIKE 'ndb_conflict%';
+------------------------------------------+----------------+
| Variable_name                            | Value          |
+------------------------------------------+----------------+
| Ndb_conflict_fn_max                      | 0              |
| Ndb_conflict_fn_old                      | 0              |
| Ndb_conflict_fn_max_del_win              | 0              |
| Ndb_conflict_fn_epoch                    | 0              |
| Ndb_conflict_fn_epoch_trans              | 0              |
| Ndb_conflict_fn_epoch2                   | 0              |
| Ndb_conflict_fn_epoch2_trans             | 1              |
| Ndb_conflict_trans_row_conflict_count    | 1              |
| Ndb_conflict_trans_row_reject_count      | 2              |
| Ndb_conflict_trans_reject_count          | 1              |
| Ndb_conflict_trans_detect_iter_count     | 1              |
| Ndb_conflict_trans_conflict_commit_count | 1              |
| Ndb_conflict_epoch_delete_delete_count   | 0              |
| Ndb_conflict_reflected_op_prepare_count  | 0              |
| Ndb_conflict_reflected_op_discard_count  | 0              |
| Ndb_conflict_refresh_op_count            | 0              |
| Ndb_conflict_last_conflict_epoch         | 37391985278995 |
| Ndb_conflict_last_stable_epoch           | 37417755082760 |
+------------------------------------------+----------------+

SECONDARY> SHOW STATUS LIKE 'ndb_conflict%';
+------------------------------------------+----------------+
| Variable_name                            | Value          |
+------------------------------------------+----------------+
| Ndb_conflict_fn_max                      | 0              |
| Ndb_conflict_fn_old                      | 0              |
| Ndb_conflict_fn_max_del_win              | 0              |
| Ndb_conflict_fn_epoch                    | 0              |
| Ndb_conflict_fn_epoch_trans              | 0              |
| Ndb_conflict_fn_epoch2                   | 0              |
| Ndb_conflict_fn_epoch2_trans             | 0              |
| Ndb_conflict_trans_row_conflict_count    | 0              |
| Ndb_conflict_trans_row_reject_count      | 0              |
| Ndb_conflict_trans_reject_count          | 0              |
| Ndb_conflict_trans_detect_iter_count     | 0              |
| Ndb_conflict_trans_conflict_commit_count | 0              |
| Ndb_conflict_epoch_delete_delete_count   | 0              |
| Ndb_conflict_reflected_op_prepare_count  | 1              |
| Ndb_conflict_reflected_op_discard_count  | 1              |
| Ndb_conflict_refresh_op_count            | 2              |
| Ndb_conflict_last_conflict_epoch         | 32280974196749 |
| Ndb_conflict_last_stable_epoch           | 18897856102416 |
+------------------------------------------+----------------+

These status variables are described in the MySQL Cluster documentation.

Faster Restarts

You can restart MySQL Cluster processes (nodes) without losing database service (for example if adding extra memory to a server) and so on the face of it, the speed of the restarts isn’t that important. Having said that, while the node is restarting you’ve lost some of your high-availability which for super-critical applications can make you nervous. Additionally, faster restarts mean that you can complete maintenance activities faster – for example, a software upgrade requires a rolling restart of all of the nodes – if you have 48 data nodes then you want each of the data nodes to restart as quickly as possible.

MySQL Cluster 7.4 includes a number of optimisations to the restart code, so if you’re already using MySQL Cluster it might be interesting to see how much faster restarts are for your application.
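If you want to time the improvement for yourself, a single data node restart can be driven from the management client – a quick sketch (node id 3 is illustrative; check your ids first):

$ ndb_mgm -e "SHOW"          # list the nodes and their ids
$ ndb_mgm -e "3 RESTART"     # restart data node 3; the cluster keeps serving
$ ndb_mgm -e "ALL STATUS"    # repeat until node 3 reports 'started' again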

Enhanced Reporting

MySQL Cluster presents a lot of monitoring information through the ndbinfo database and in 7.4 we’ve added some extra information on how memory is used for individual tables and how operations are distributed.

Extra Memory Reporting

MySQL Cluster allocates all of the required memory when a data node starts, so any information on memory usage from the operating system is of limited use and provides no clues as to how memory is used within the data nodes – for example, which tables are using the most memory. Also, as this is a distributed database, it is helpful to understand whether a particular table is using a similar amount of memory in each data node (if not, it could be that a better partitioning/sharding key should be used). Finally, when rows are deleted from a table, the memory for those rows typically remains allocated against that table, so it is helpful to understand how many of these ‘empty slots’ are available for use by new rows in that table. MySQL Cluster 7.4 introduces a new table – ndbinfo.memory_per_fragment – that provides exactly this information.

For example, to see how much memory is being used by each data node for a particular table…

mysql> CREATE DATABASE clusterdb; USE clusterdb;
mysql> CREATE TABLE simples (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=NDB;
mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1280 |         40 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1280 |         40 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4256 |        133 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4256 |        133 |
+------+------+-------------+------------+------------+

When you delete rows from a MySQL Cluster table, the memory is not actually freed up, so if you check the existing ndbinfo.memoryusage table you won’t see a change. This memory will be reused when you add new rows to that same table. In MySQL Cluster 7.4, it’s possible to see how much memory is in that state for a table…

mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1280 |         40 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1280 |         40 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4256 |        133 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4256 |        133 |
+------+------+-------------+------------+------------+
mysql> DELETE FROM clusterdb.simples LIMIT 1;
mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1312 |         41 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1312 |         41 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4288 |        134 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4288 |        134 |
+------+------+-------------+------------+------------+

As a final example, we can check whether a table is being evenly sharded across the data nodes (in this case a really bad sharding key was chosen)…

mysql> DROP TABLE IF EXISTS simples;  -- remove the earlier version of the table
mysql> CREATE TABLE simples (id INT NOT NULL AUTO_INCREMENT,
        species VARCHAR(20) DEFAULT 'Human',
        PRIMARY KEY(id, species)) ENGINE=NDB PARTITION BY KEY(species);

-- add some data (every row uses the same species value)

mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |           0 |          0 |          0 |
|    1 |    2 |      196608 |      11732 |        419 |
|    2 |    0 |           0 |          0 |          0 |
|    2 |    2 |      196608 |      11732 |        419 |
|    3 |    1 |           0 |          0 |          0 |
|    3 |    3 |           0 |          0 |          0 |
|    4 |    1 |           0 |          0 |          0 |
|    4 |    3 |           0 |          0 |          0 |
+------+------+-------------+------------+------------+
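Summing the per-fragment figures by node makes skew like this obvious at a glance – a simple aggregation sketch over the same table:

mysql> SELECT node_id AS node, SUM(fixed_elem_alloc_bytes) AS total_alloc_bytes
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%'
        GROUP BY node_id;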

Extra Operations Reporting

To ensure that resources are being used effectively, it is very helpful to understand how each table is being accessed (how frequently, and for what types of operations). To support this, the ndbinfo.operations_per_fragment table is provided. For example, the data in this table would let you identify that a large number of full table scans are being performed on a particular table.
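One way to surface those tables (a sketch – the aliases and LIMIT are arbitrary):

mysql> SELECT fq_name AS tbl, SUM(tot_frag_scans) AS scans
        FROM ndbinfo.operations_per_fragment
        GROUP BY fq_name
        ORDER BY scans DESC
        LIMIT 5;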

It is also important to identify any hotspots, where a disproportionate share of the queries for a table hit a particular fragment/data node. Again, ndbinfo.operations_per_fragment provides this information.

As an example of how to use some of the data from this table, a simple table is created and populated, and then ndbinfo.operations_per_fragment is used to monitor how many primary key reads and table scans are performed:

mysql> CREATE TABLE simples (id INT AUTO_INCREMENT PRIMARY KEY, time TIMESTAMP) ENGINE=NDB;

mysql> SELECT fq_name AS 'Table', node_id AS 'Data Node', tot_key_reads AS 'Reads', tot_frag_scans AS 'Scans' FROM ndbinfo.operations_per_fragment WHERE fq_name LIKE '%simples';
+-----------------------+-----------+-------+-------+
| Table                 | Data Node | Reads | Scans |
+-----------------------+-----------+-------+-------+
| clusterdb/def/simples |         3 |     0 |     1 |
| clusterdb/def/simples |         3 |     0 |     0 |
| clusterdb/def/simples |         4 |     0 |     0 |
| clusterdb/def/simples |         4 |     0 |     1 |
+-----------------------+-----------+-------+-------+

mysql> REPLACE INTO simples VALUES ();  # Repeated several times
mysql> SELECT * FROM simples;
+----+---------------------+
| id | time                |
+----+---------------------+
|  7 | 2015-01-22 15:12:42 |
…
|  8 | 2015-01-22 15:12:58 |
+----+---------------------+
23 rows in set (0.00 sec)

mysql> SELECT fq_name AS 'Table', node_id AS 'Data Node', tot_key_reads AS 'Reads', tot_frag_scans AS 'Scans' FROM ndbinfo.operations_per_fragment WHERE fq_name LIKE '%simples';
+-----------------------+-----------+-------+-------+
| Table                 | Data Node | Reads | Scans |
+-----------------------+-----------+-------+-------+
| clusterdb/def/simples |         3 |     0 |     2 |
| clusterdb/def/simples |         3 |     0 |     0 |
| clusterdb/def/simples |         4 |     0 |     0 |
| clusterdb/def/simples |         4 |     0 |     2 |
+-----------------------+-----------+-------+-------+

mysql> SELECT * FROM simples WHERE id=11;
+----+---------------------+
| id | time                |
+----+---------------------+
| 11 | 2015-01-22 15:12:59 |
+----+---------------------+

mysql> SELECT fq_name AS 'Table', node_id AS 'Data Node', tot_key_reads AS 'Reads', tot_frag_scans AS 'Scans' FROM ndbinfo.operations_per_fragment WHERE fq_name LIKE '%simples';
+-----------------------+-----------+-------+-------+
| Table                 | Data Node | Reads | Scans |
+-----------------------+-----------+-------+-------+
| clusterdb/def/simples |         3 |     0 |     2 |
| clusterdb/def/simples |         3 |     0 |     0 |
| clusterdb/def/simples |         4 |     0 |     0 |
| clusterdb/def/simples |         4 |     1 |     2 |
+-----------------------+-----------+-------+-------+

Note that there are two rows listed for each data node but only one row for each has non-zero values; this is because each data node holds the primary fragment for one of the partitions and the secondary fragment for the other – all operations are performed only on the active fragments. This is made clearer if the fragment number is included in the query:

mysql> SELECT fq_name AS 'Table', node_id AS 'Data Node', fragment_num AS 'Fragment', tot_key_reads AS 'Reads', tot_frag_scans AS 'Scans' FROM ndbinfo.operations_per_fragment WHERE fq_name LIKE '%simples';
+-----------------------+-----------+----------+-------+-------+
| Table                 | Data Node | Fragment | Reads | Scans |
+-----------------------+-----------+----------+-------+-------+
| clusterdb/def/simples |         3 |        0 |     0 |     2 |
| clusterdb/def/simples |         3 |        1 |     0 |     0 |
| clusterdb/def/simples |         4 |        0 |     0 |     0 |
| clusterdb/def/simples |         4 |        1 |     1 |     2 |
+-----------------------+-----------+----------+-------+-------+

Conclusion

We’re really excited about the GA of this new MySQL Cluster release; if you get a chance to try it out then please let us know how you get on – either through a comment on this blog, a MySQL bug report or a post to the MySQL Cluster Forum.





Choosing the right MySQL High Availability Solution – webinar replay

Earlier this week, I presented a webinar on High Availability options for MySQL – what they are and how to choose the most appropriate one for your application.

The replay of this webinar can now be viewed here or if you just want to look at the charts then scroll down. At the end of this post, I include a summary of the Q&A from the webinar.

How important is your data? Can you afford to lose it? What about just some of it? What would be the impact if you couldn’t access it for a minute, an hour, a day or a week?

Different applications can have very different requirements for High Availability. Some need 100% data reliability with 24x7x365 read & write access while many others are better served by a simpler approach with more modest HA ambitions.

MySQL has an array of High Availability solutions ranging from simple backups, through replication and shared storage clustering – all the way up to 99.999% available shared nothing, geographically replicated clusters. These solutions also have different ‘bonus’ features such as full InnoDB compatibility, in-memory real-time performance, linear scalability and SQL & NoSQL APIs.

The purpose of this webinar is to help you decide where your application sits in terms of HA requirements and discover which of the MySQL solutions best fit the bill. It will also cover what you need outside of the database to ensure High Availability – state of the art monitoring being a prime example.

The charts

Questions and Answers

  • What does “HA” stand for? High Availability
  • What is meant by scale-out? Scale-out is where you increase the capacity or the throughput of the system by adding extra (usually commodity) machines – this is the opposite of scale-up where you buy the biggest single server that you can find. It tends to be much more economical this way and you can add extra capacity if and when you need it.
  • Most of my applications are for small businesses where the data load is not that big – is this session worth listening to? Yes – the focus of this presentation is on keeping your data safe and accessible, not on scaling to massive volumes and throughput. Scale-out is touched on because, if you need it as well, it can influence your choice of High Availability solution
  • How does all this compare with Amazon Aurora? Aurora is something that’s offered by Amazon and is supported by them. Aurora can only be run on AWS – the solutions covered here can be taken to lots of different environments – whether cloud or ‘bare metal’
  • What amount of work would be involved in migrating actual InnoDB-based DBs into NDB? As you’d expect, it depends on the application. Changes should be minor to get it to run but you might need to make more adjustments to get the best performance. This white paper is a good place to start.
  • Is it possible to migrate InnoDB data to MySQL Cluster? Yes. The first thing to note is that to use MySQL Cluster, you have to use the mysqld (MySQL Server) process that comes with the MySQL Cluster package; once you’ve switched to that binary then you can run ALTER TABLE my_tab ENGINE=NDB; provided that your schema is compatible with MySQL Cluster. Alternatively you can perform a mysqldump and then load the data in.
  • Does MySQL Fabric solutions support “rolling upgrade”? Yes – you can perform a rolling upgrade of the managed MySQL Servers but you have to do it yourself, Fabric doesn’t automate it at present
  • What about master-master replication? You can use active-active replication but the application is responsible for avoiding conflicts between the two sites unless you use MySQL Cluster/NDB (where the functionality is built into the database)
  • What’s the maximum distance that the master and slave can be apart? For MySQL Replication, there is no maximum distance; the latency of the master isn’t impacted if you stick with the default asynchronous replication. If you use semi-synchronous replication then the latency of your transactions will be impacted by the WAN latency between the master and slave site(s).
  • Is there a monetary investment to implement this or is it free? The database technology presented in this session can be used under the GPL open source license; if you buy a commercial edition then you get access to some additional tools such as MySQL Enterprise Monitor and MySQL Cluster Manager.
  • Can we check the transaction sequence in slave side? Yes, with MySQL replication, you can check which transactions have been applied on the slave.
  • Can the slave switch over to master, in case the master needs to be shut down? Yes – that’s a very common use case
  • Do we have separate binlogs for different databases (schemas)? All of the databases (schemas) within the MySQL Server use the same binary log. When that log file fills up, it’s rotated out and a new one used.
  • How can I implement “auto-failover” in MySQL 5.6 replication? Are you talking about the “mysqlfailover” script? You can use the mysqlfailover utility or MySQL Fabric
  • What are the similarties and differences between InnoDB and the NDB/MySQL Cluster engines? We’ll cover some of this in this presentation and you can find more details in this white paper
  • With master-master replication, there were some parameters to control the auto-increment IDs so that we could avoid conflicts with active-active MySQL replication? If you ensure that the same row isn’t written to on both masters then you can avoid conflicts. One option is to store odd primary keys on one master and even on the other: set the auto_increment_increment and auto_increment_offset parameters and then use auto-increment primary keys on each MySQL Server – see the sketch after this list
  • Is there an online backup tool available in MySQL – without locking my database? Yes – if you’re using InnoDB then you can use MySQL Enterprise Backup (part of MySQL Enterprise Edition and much faster than mysqldump); MySQL Cluster has a built-in online backup tool (which can be simpler to use in conjunction with MySQL Cluster Manager, part of MySQL Cluster Carrier Grade Edition)
  • In multi-source replication, how are data conflicts handled? If more than one master modifies the same row, which one is applied? That’s left as an exercise for the application. The application is responsible for making sure that there are no conflicting updates (if you care about the conflicts). You could also use MySQL Fabric to shard the data.
  • With MySQL Group Replication, if one master is down, will transactions still be applied? With MySQL Group Replication, updates can be sent to any of the servers and so if one is unavailable just switch to any of the others
  • What is the maximum data size MySQL can handle in its latest release? The maximum size of an InnoDB table[space] is four billion pages (64TB with the default 16k page size). Beyond that you start partitioning or use MySQL Fabric
  • Can you give a short definition for sharding? Sharding is where you take the data for a table and split it across multiple MySQL Server instances. Typically you’ll choose one or more columns from the table to act as the sharding key to decide which shard a specific row should be stored in. You can also have functional sharding where you decide that the whole of table A will be in one shard and all of table B in another
  • Is there a storage engine to handle JSON documents? Here’s an interesting blog on that topic
  • What algorithms are available for sharding in MySQL? With MySQL Fabric you can use a hash or define ranges for the shard key. With MySQL Cluster, the sharding is completely transparent to the application but it uses an MD5 hash under the covers
  • In MySQL Cluster is all the data replicated to all of the data nodes? Data is synchronously replicated between the 2 data nodes forming a node group. Different node groups are responsible for different shards of any given table
  • What’s the maximum number of data nodes in MySQL Cluster? 48
  • Which is the best interconnect method between MySQL Cluster data nodes for the highest performance? Infiniband and Gigabit Ethernet have both shown great results. If you’re able to configure the behaviour then low latency will help get the best performance
  • Can I colocate MySQL Cluster data nodes with MySQL Servers in order to use all possible resources? Yes, you can co-locate data nodes and MySQL Servers. Note that the management node (ndb_mgmd) should not run on the same machine as any of the data nodes – read this post for details on how to deploy MySQL Cluster nodes for High Availability
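Following up on the auto-increment question above, a minimal my.cnf sketch for two masters (the values are illustrative):

# my.cnf on master A – interleave auto-increment values across 2 masters
auto_increment_increment = 2   # step by 2...
auto_increment_offset    = 1   # ...starting at 1 => ids 1,3,5,...

# my.cnf on master B
auto_increment_increment = 2
auto_increment_offset    = 2   # => ids 2,4,6,...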




MySQL Replication: What’s New in MySQL 5.7 and Beyond – webinar replay

Luís Soares and I recently hosted a webinar which explained the latest developments in MySQL Replication.

The webinar replay is now available to download from here.

Details:

Continuing in the footsteps of its predecessor, MySQL 5.7 is set to be a groundbreaking release. In this webinar, the engineers behind the product provide insights into what’s new for MySQL replication in the latest 5.7 Development Milestone Release and review the early access features available via labs.mysql.com. The next generation of replication features cover several technical areas such as better semi-synchronous replication, an enhanced multithreaded slave (per-transaction parallelism), improved monitoring with performance schema tables, online configuration changes, options for fine-tuning replication performance, support for more-advanced topologies with multisource replication, and much more. This is also a great chance to learn about MySQL Group Replication – the next generation of active-active, update-anywhere replication for MySQL.

Seize the opportunity to learn how you will be able to leverage MySQL 5.7 replication to grow your business.





Upcoming Webinar – MySQL Replication: What’s New in MySQL 5.7 and Beyond

On Tuesday 25th November, Luís Soares and I will be hosting a webinar which explains the latest developments in MySQL Replication. As always the webinar is free but please register here.

Details:

Continuing in the footsteps of its predecessor, MySQL 5.7 is set to be a groundbreaking release. In this webinar, the engineers behind the product provide insights into what’s new for MySQL replication in the latest 5.7 Development Milestone Release and review the early access features available via labs.mysql.com. The next generation of replication features cover several technical areas such as better semi-synchronous replication, an enhanced multithreaded slave (per-transaction parallelism), improved monitoring with performance schema tables, online configuration changes, options for fine-tuning replication performance, support for more-advanced topologies with multisource replication, and much more. This is also a great chance to learn about MySQL Group Replication – the next generation of active-active, update-anywhere replication for MySQL.

Seize the opportunity to learn how you will be able to leverage MySQL 5.7 replication to grow your business.

When:

  • Tue, Nov 25: 09:00 Pacific time (America)
  • Tue, Nov 25: 10:00 Mountain time (America)
  • Tue, Nov 25: 11:00 Central time (America)
  • Tue, Nov 25: 12:00 Eastern time (America)
  • Tue, Nov 25: 15:00 São Paulo time
  • Tue, Nov 25: 17:00 UTC
  • Tue, Nov 25: 17:00 Western European time
  • Tue, Nov 25: 18:00 Central European time
  • Tue, Nov 25: 19:00 Eastern European time
  • Tue, Nov 25: 22:30 India, Sri Lanka
  • Wed, Nov 26: 01:00 Singapore/Malaysia/Philippines time
  • Wed, Nov 26: 01:00 China time
  • Wed, Nov 26: 02:00 Japan time
  • Wed, Nov 26: 04:00 NSW, ACT, Victoria, Tasmania (Australia)

Even if you can’t join the live webinar, it’s worth registering as you’ll be emailed a link to the replay as soon as it’s available.





Active-Active Replication, Performance Improvements & Operational Enhancements – some of what’s available in the new MySQL Cluster 7.4.2 DMR


Oracle have just made available the new MySQL Cluster 7.4.2 Development Milestone Release – it can be downloaded from the development release tab here. Note that this is not a GA release and so we wouldn’t recommend using it in production.

This is the second DMR for MySQL Cluster 7.4; the delta between this DMR and 7.4.1 can be viewed in the MySQL Cluster 7.4.2 Release Notes.

There are three main focus areas for this DMR and the purpose of this post is to briefly introduce them:

  • Active-Active (Multi-Master) Replication
  • Performance
  • Operational improvements (speeding up of restarts; enhanced memory reporting)

Active-Active (Multi-Master) Replication

MySQL Cluster allows bi-directional replication between two (or more) clusters. Replication within each cluster is synchronous but between clusters it is asynchronous which means the following scenario is possible:

Conflict with asynchronous replication:

+---------+---------------+---------+
| Site A  | Replication   | Site B  |
+---------+---------------+---------+
| x == 10 |               | x == 10 |
| x = 11  |               | x = 20  |
|         | --- x=11 ---> | x == 11 |
| x == 20 | <--- x=20 --- |         |
+---------+---------------+---------+

In this example a value (a column of one row in a table) is set to 11 on site A and the change is queued for replication to site B. In the meantime, an application sets the same value to 20 on site B and that change is queued for replication to site A. Once both sites have received and applied the replicated change from the other cluster, site A contains the value 20 while site B contains 11 – in other words, the databases are now inconsistent.

How MySQL Cluster implements eventual consistency

There are two phases to establishing consistency between both clusters after an inconsistency has been introduced:

  1. Detect that a conflict has happened
  2. Resolve the inconsistency

The following animation illustrates how MySQL Cluster 7.2 detects that an inconsistency has been introduced by the asynchronous, active-active replication:

Detecting conflicts

While we typically consider the 2 clusters in an active-active replication configuration to be peers, in this case we designate one to be the primary and the other the secondary. Reads and writes can still be sent to either cluster but it is the responsibility of the primary to identify that a conflict has arisen and then remove the inconsistency.

A logical clock is used to identify (in relative terms) when a change is made on the primary – for those who know something of the MySQL Cluster internals, we use the index of the Global Checkpoint that the update is contained in. For all tables that have this feature turned on, an extra, hidden column is automatically added on the primary – this represents the value of the logical clock when the change was made.

Once the change has been applied on the primary, there is a “window of conflict” for the affected row(s), during which, if a different change is made to the same row(s) on the secondary, there will be an inconsistency. Once the slave on the secondary has applied the change from the primary, it will send a replication event back to the slave on the primary, containing the primary’s clock value associated with the changes that have just been applied on the secondary. (Remember that the clock is actually the Global Checkpoint Index and so this feature is sometimes referred to as Reflected GCI.) Once the slave on the primary has received this event, it knows that all changes tagged with a clock value no later than the reflected GCI are now safe – the window of conflict has closed.

If an application modifies this same row on the secondary before the replication event from the primary has been applied, then the secondary will send an associated replication event to the slave on the primary before it reflects the new GCI. The slave on the primary will process this replication event and compare the clock value recorded with the affected rows against the latest reflected GCI; as the clock value for the conflicting row is higher, the primary recognises that a conflict has occurred and launches the algorithm to resolve the inconsistency.

Options for MySQL Cluster replication conflict detection/resolution

After a conflict has been detected, you have the option of having the database simply report the conflict to the application or have it roll back just the conflicting row or the entire transaction and all subsequent transactions that were dependent on it.

So – what’s new in 7.4.1?

  • Detects conflicts between inserts and updates
  • Option to roll back entire transaction (and dependent transactions) rather than just the conflicting row
  • All conflicts are handled before switching primary – avoiding potential race conditions

As mentioned at the start of this post, this is pre-GA and there are some extra enhancements we plan on including in the final version:

  • Handle deletes which conflict with other operations
  • Roll back transactions that have read a row that had been rolled back due to a conflict

Performance

MySQL Cluster 7.4.1 Read-Write Performance
As a scaled-out, in-memory, real-time database, MySQL Cluster has always delivered great performance, and we continue to make it faster with each release. In particular, we want to keep pace with the trend towards more and more cores rather than faster ones. 7.4 continues along the path of better exploiting multiple cores – as can be seen from these benchmark results.
MySQL Cluster 7.4.1 Read Performance
Just make sure that you’re using the multi-threaded data node (ndbmtd rather than ndbd) and have configured how many threads it should use.

Faster Restarts

You can restart MySQL Cluster processes (nodes) without losing database service (for example if adding extra memory to a server) and so on the face of it, the speed of the restarts isn’t that important. Having said that, while the node is restarting you’ve lost some of your high-availability which for super-critical applications can make you nervous. Additionally, faster restarts mean that you can complete maintenance activities faster – for example, a software upgrade requires a rolling restart of all of the nodes – if you have 48 data nodes then you want each of the data nodes to restart as quickly as possible.

MySQL Cluster 7.4.1 includes a number of optimisations to the restart code, so if you’re already using MySQL Cluster it might be interesting to see how much faster restarts are for your application. We also have some extra optimisations in the works that you can expect to see in later 7.4 versions.

Extra Memory Reporting

MySQL Cluster presents a lot of monitoring information through the ndbinfo database and in 7.4 we’ve added some extra information on how memory is used for individual tables.

For example, to see how much memory is being used by each data node for a particular table…

mysql> CREATE DATABASE clusterdb; USE clusterdb;
mysql> CREATE TABLE simples (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=NDB;
mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1280 |         40 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1280 |         40 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4256 |        133 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4256 |        133 |
+------+------+-------------+------------+------------+

When you delete rows from a MySQL Cluster table, the memory is not actually freed up, so if you check the existing ndbinfo.memoryusage table you won’t see a change. This memory will be reused when you add new rows to that same table. In MySQL Cluster 7.4, it’s possible to see how much memory is in that state for a table…

mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1280 |         40 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1280 |         40 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4256 |        133 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4256 |        133 |
+------+------+-------------+------------+------------+
mysql> DELETE FROM clusterdb.simples LIMIT 1;
mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1312 |         41 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1312 |         41 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4288 |        134 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4288 |        134 |
+------+------+-------------+------------+------------+

As a final example, we can check whether a table is being evenly sharded across the data nodes (in this case a really bad sharding key was chosen)…

mysql> DROP TABLE IF EXISTS simples;  -- remove the earlier version of the table
mysql> CREATE TABLE simples (id INT NOT NULL AUTO_INCREMENT,
        species VARCHAR(20) DEFAULT 'Human',
        PRIMARY KEY(id, species)) ENGINE=NDB PARTITION BY KEY(species);

-- add some data (every row uses the same species value)

mysql> SELECT node_id AS node, fragment_num AS frag,
        fixed_elem_alloc_bytes AS alloc_bytes,
        fixed_elem_free_bytes AS free_bytes,
        fixed_elem_free_rows AS spare_rows
        FROM ndbinfo.memory_per_fragment
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |           0 |          0 |          0 |
|    1 |    2 |      196608 |      11732 |        419 |
|    2 |    0 |           0 |          0 |          0 |
|    2 |    2 |      196608 |      11732 |        419 |
|    3 |    1 |           0 |          0 |          0 |
|    3 |    3 |           0 |          0 |          0 |
|    4 |    1 |           0 |          0 |          0 |
|    4 |    3 |           0 |          0 |          0 |
+------+------+-------------+------------+------------+

If you get a chance to try out this new release then please let us know how you get on – either through a comment on this blog, a MySQL bug report or a post to the MySQL Cluster Forum.





Active-Active Replication, Performance Improvements & Operational Enhancements – some of what’s available in the new MySQL Cluster 7.4.1 DMR


Oracle have just made available the new MySQL Cluster 7.4.1 Development Milestone Release – it can be downloaded from the development release tab here. Note that this is not a GA release and so we wouldn’t recommend using it in production.

There are three main focus areas for this DMR and the purpose of this post is to briefly introduce them:

  • Active-Active (Multi-Master) Replication
  • Performance
  • Operational improvements (speeding up of restarts; enhanced memory reporting)

Active-Active (Multi-Master) Replication

MySQL Cluster allows bi-directional replication between two (or more) clusters. Replication within each cluster is synchronous but between clusters it is asynchronous which means the following scenario is possible:

Conflict with asynchronous replication:

+---------+---------------+---------+
| Site A  | Replication   | Site B  |
+---------+---------------+---------+
| x == 10 |               | x == 10 |
| x = 11  |               | x = 20  |
|         | --- x=11 ---> | x == 11 |
| x == 20 | <--- x=20 --- |         |
+---------+---------------+---------+

In this example a value (a column of one row in a table) is set to 11 on site A and the change is queued for replication to site B. In the meantime, an application sets the same value to 20 on site B and that change is queued for replication to site A. Once both sites have received and applied the replicated change from the other cluster, site A contains the value 20 while site B contains 11 – in other words, the databases are now inconsistent.

How MySQL Cluster implements eventual consistency

There are two phases to establishing consistency between both clusters after an inconsistency has been introduced:

  1. Detect that a conflict has happened
  2. Resolve the inconsistency

The following animation illustrates how MySQL Cluster 7.2 detects that an inconsistency has been introduced by the asynchronous, active-active replication:

Detecting conflicts

While we typically consider the 2 clusters in an active-active replication configuration to be peers, in this case we designate one to be the primary and the other the secondary. Reads and writes can still be sent to either cluster but it is the responsibility of the primary to identify that a conflict has arisen and then remove the inconsistency.

A logical clock is used to identify (in relative terms) when a change is made on the primary – for those who know something of the MySQL Cluster internals, we use the index of the Global Checkpoint that the update is contained in. For all tables that have this feature turned on, an extra, hidden column is automatically added on the primary – this represents the value of the logical clock when the change was made.

Once the change has been applied on the primary, there is a “window of conflict” for the affected row(s) during which, if a different change is made to the same row(s) on the secondary, there will be an inconsistency. Once the slave on the secondary has applied the change from the primary, it will send a replication event back to the slave on the primary, containing the primary’s clock value associated with the changes that have just been applied on the secondary. (Remember that the clock is actually the Global Checkpoint Index and so this feature is sometimes referred to as Reflected GCI). Once the slave on the primary has received this event, it knows that all changes tagged with a clock value no later than the reflected GCI are now safe – the window of conflict has closed.

If an application modifies this same row on the secondary before the replication event from the primary has been applied then the secondary will send an associated replication event to the slave on the primary before it reflects the new GCI. The slave on the primary will process this replication event and compare the clock value recorded with the affected rows against the latest reflected GCI; as the clock value for the conflicting row is higher, the primary recognises that a conflict has occurred and will launch the algorithm to resolve the inconsistency.

Options for MySQL Cluster replication conflict detection/resolution

After a conflict has been detected, you have the option of having the database simply report the conflict to the application or have it roll back just the conflicting row or the entire transaction and all subsequent transactions that were dependent on it.
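To make this concrete, here is a hedged sketch of how the behaviour is selected per table; the database and table names are invented, server_id 0 means the row applies to all servers, and the mysql.ndb_replication table must already exist (its definition appears in an older post further down this page):

mysql> -- NDB$EPOCH() rolls back just the conflicting row, while
mysql> -- NDB$EPOCH_TRANS() rolls back the whole conflicting transaction
mysql> INSERT INTO mysql.ndb_replication \
        VALUES ('clusterdb', 'mytab', 0, NULL, 'NDB$EPOCH_TRANS()');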

So – what’s new in 7.4.1?

  • Detects conflicts between inserts and updates
  • Option to roll back entire transaction (and dependent transactions) rather than just the conflicting row
  • All conflicts are handled before switching primary – avoiding potential race conditions

As mentioned at the start of this post, this is pre-GA and there are some extra enhancements we plan on including in the final version:

  • Handle deletes which conflict with other operations
  • Roll back transactions that have read a row that had been rolled back due to a conflict

Performance

MySQL Cluster 7.4.1 Read-Write Performance
Being a scaled-out, in-memory, real-time database, MySQL Cluster performance has always been great but we continue to work on making it faster each release. In particular, we want to keep pace with the trend of having more and more cores rather than faster ones. 7.4 continues along the path of better exploiting multiple cores – as can be seen from these benchmark results.
MySQL Cluster 7.4.1 Read Performance
Just make sure that you’re using the multi-threaded data node (ndbmtd rather than ndbd) and have configured how many threads it should use.
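For reference, the thread count is set per data node in config.ini; a minimal sketch (MaxNoOfExecutionThreads is the real parameter but the value of 8 is just an assumption for illustration):

[ndbd default]
# Only takes effect with the multi-threaded binary (ndbmtd); 8 is
# illustrative rather than a recommendation for your hardware
MaxNoOfExecutionThreads=8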

Faster Restarts

You can restart MySQL Cluster processes (nodes) without losing database service (for example when adding extra memory to a server) and so, on the face of it, the speed of the restarts isn’t that important. Having said that, while a node is restarting you’ve lost some of your high availability, which for super-critical applications can make you nervous. Additionally, faster restarts mean that you can complete maintenance activities more quickly – for example, a software upgrade requires a rolling restart of all of the nodes; if you have 48 data nodes then you want each of them to restart as quickly as possible.

MySQL Cluster 7.4.1 includes a number of optimisations to the restart code and so if you’re already using MySQL Cluster, it might be interesting to see how much faster restarts become for your application. We also have some extra optimisations in the works that you can expect to see in later 7.4 versions.
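As a reminder, a node restart is driven from the management client; a hedged one-liner (node id 2 is just an example):

$ ndb_mgm -e "2 RESTART"    # restart data node 2 while the cluster stays up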

Extra Memory Reporting

MySQL Cluster presents a lot of monitoring information through the ndbinfo database and in 7.4 we’ve added some extra information on how memory is used for individual tables.

For example, to see how much memory is being used by each data node for a particular table…

mysql> CREATE DATABASE clusterdb; USE clusterdb;
mysql> CREATE TABLE simples (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=NDB;
mysql> SELECT node_id AS node, fragment_num AS frag, \
        fixed_elem_alloc_bytes alloc_bytes, \
        fixed_elem_free_bytes AS free_bytes, \
        fixed_elem_free_rows AS spare_rows \
        FROM ndbinfo.memory_per_fragment \
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1280 |         40 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1280 |         40 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4256 |        133 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4256 |        133 |
+------+------+-------------+------------+------------+

When you delete rows from a MySQL Cluster table, the memory is not actually freed up and so if you check the existing ndbinfo.memoryusage table you won’t see a change (a sketch of querying that view appears after the delete example below). This memory will be reused when you add new rows to that same table. In MySQL Cluster 7.4, it’s possible to see how much memory is in that state for a table…

mysql> SELECT node_id AS node, fragment_num AS frag, \
        fixed_elem_alloc_bytes alloc_bytes, \
        fixed_elem_free_bytes AS free_bytes, \
        fixed_elem_free_rows AS spare_rows \
        FROM ndbinfo.memory_per_fragment \
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1280 |         40 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1280 |         40 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4256 |        133 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4256 |        133 |
+------+------+-------------+------------+------------+
mysql> DELETE FROM clusterdb.simples LIMIT 1;
mysql> SELECT node_id AS node, fragment_num AS frag, \
        fixed_elem_alloc_bytes alloc_bytes, \
        fixed_elem_free_bytes AS free_bytes, \
        fixed_elem_free_rows AS spare_rows \
        FROM ndbinfo.memory_per_fragment \
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |      131072 |       5504 |        172 |
|    1 |    2 |      131072 |       1312 |         41 |
|    2 |    0 |      131072 |       5504 |        172 |
|    2 |    2 |      131072 |       1312 |         41 |
|    3 |    1 |      131072 |       3104 |         97 |
|    3 |    3 |      131072 |       4288 |        134 |
|    4 |    1 |      131072 |       3104 |         97 |
|    4 |    3 |      131072 |       4288 |        134 |
+------+------+-------------+------------+------------+
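For contrast, the pre-existing ndbinfo.memoryusage view only reports node-level totals, which is why the deleted row is visible in the per-fragment numbers above but wouldn’t show up there; a quick sketch of querying it:

mysql> SELECT node_id, memory_type, used, total \
        FROM ndbinfo.memoryusage;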

As a final example, we can check whether a table is being evenly sharded across the data nodes (in this case a really bad sharding key was chosen – every row ends up with the same species value and so all rows hash to the same partition)…

mysql> DROP TABLE simples;   -- remove the earlier version of the table
mysql> CREATE TABLE simples (id INT NOT NULL AUTO_INCREMENT, \
        species VARCHAR(20) DEFAULT "Human", \
        PRIMARY KEY(id, species)) engine=ndb PARTITION BY KEY(species);

// Add some data

mysql> SELECT node_id AS node, fragment_num AS frag, \
        fixed_elem_alloc_bytes alloc_bytes, \
        fixed_elem_free_bytes AS free_bytes, \
        fixed_elem_free_rows AS spare_rows \
        FROM ndbinfo.memory_per_fragment \
        WHERE fq_name LIKE '%simples%';
+------+------+-------------+------------+------------+
| node | frag | alloc_bytes | free_bytes | spare_rows |
+------+------+-------------+------------+------------+
|    1 |    0 |           0 |          0 |          0 |
|    1 |    2 |      196608 |      11732 |        419 |
|    2 |    0 |           0 |          0 |          0 |
|    2 |    2 |      196608 |      11732 |        419 |
|    3 |    1 |           0 |          0 |          0 |
|    3 |    3 |           0 |          0 |          0 |
|    4 |    1 |           0 |          0 |          0 |
|    4 |    3 |           0 |          0 |          0 |
+------+------+-------------+------------+------------+

If you get the chance to try out this new release, please let us know how you get on – either through a comment on this blog, a MySQL bug report or a post to the MySQL Cluster Forum.





Advanced MySQL Replication Architectures and Latest Developments – free webinar

MySQL Replication Logo
This Thursday (20th March 2014) we’ll be hosting a free webinar covering advanced MySQL Replication topics as well as the latest developments. As always, the webinar is free but you need to register here – even if you can’t join live, you’ll then be sent a link to the replay.

More details on what to expect…

The biggest Web sites in the world rely on MySQL Replication to scale-out and provide High Availability for their data. Extend your knowledge of how MySQL Replication works and what you can achieve with it; join us for this technical webinar to explore some of the more advanced replication architectures as well as some of the latest product developments:

  • Replication topologies, including master-slave, circular and multi-master
  • Load balancing and query splitting
  • Data aggregation with multi-source replication
  • Global Transaction IDs and auto-failover with recovery
  • Getting the best replication throughput
  • Heterogeneous replication with the Binlog API

WHO:

  • Andrew Morgan, Principal MySQL Product Manager
  • Lars Thalmann, Director, MySQL Replication, Backup and Connectors

WHEN:

  • Thu, Mar 20: 09:00 Pacific time (America)
  • Thu, Mar 20: 10:00 Mountain time (America)
  • Thu, Mar 20: 11:00 Central time (America)
  • Thu, Mar 20: 12:00 Eastern time (America)
  • Thu, Mar 20: 13:00 São Paulo time
  • Thu, Mar 20: 16:00 UTC
  • Thu, Mar 20: 16:00 Western European time
  • Thu, Mar 20: 17:00 Central European time
  • Thu, Mar 20: 18:00 Eastern European time
  • Thu, Mar 20: 21:30 India, Sri Lanka
  • Fri, Mar 21: 00:00 Singapore/Malaysia/Philippines time
  • Fri, Mar 21: 00:00 China time
  • Fri, Mar 21: 01:00 Japan
  • Fri, Mar 21: 03:00 NSW, ACT, Victoria, Tasmania (Australia)




MySQL Cluster Asynchronous Replication – conflict detection & resolution

I was rooting through past blog entries and stumbled across a draft post on setting up multi-master (update anywhere) asynchronous replication for MySQL Cluster. The post never quite got finished and published; while the material is now 4 years old it may still be helpful to some, so I’m posting it now. Note that a lot has happened with MySQL Cluster in the last 4 years and in this area the most notable change has been the Enhanced conflict resolution with MySQL Cluster active-active replication feature introduced in MySQL Cluster 7.2 – if you’re only dealing with a pair of Clusters, that’s your best option as it removes the need for you to maintain the timestamp columns and backs out entire transactions rather than just the conflicting rows. So when would you use this “legacy” method? The main use case is when you want conflict detection/resolution among a ring of more than 2 Clusters. Note also that MySQL 5.6 (and so MySQL Cluster 7.3) added microsecond precision to timestamps and so you may not need the custom plugin that this post refers to.

Anyway, here’s the original post…

————————————————————
MySQL Cluster asynchronous replication allows you to run in a multi-master mode with the application making changes to both sites (or more than 2 sites using a replication ring). As the replication is asynchronous, if the application(s) modified the same row on both sites at ‘about the same time’ then there is a potential for a collision. Left to their own devices, each site would store (and provide to the application) different data indefinitely. This article explains how to use MySQL Cluster collision detection and resolution to cope with this.

Fig. 1 Multi-master replication leading to inconsistencies

Fig. 1 shows the timeline that can result in a conflict. The same or two different applications make a change to the same row in the table but through the 2 different instances of MySQL Cluster. Each cluster synchronously replicates the data among its local data nodes in order to provide local High Availability (everything there is safe!). At some point later (normally a fraction of a second), the changes are replicated to the remote site asynchronously – this delay opens a window for a conflict where Cluster 2 is updated by the application just before it receives the earlier update from Cluster 1. Cluster 2 will overwrite its row with the value (5) it has received, but only after its earlier change (made directly by the application) has been written to the binary log ready for replication to Cluster 1 – which in turn causes that value (15) to be stored by Cluster 1. Each Cluster instance replicates what it believes to be the correct data to the other site – overwriting what that site had previously stored. In our example, that leaves one database holding the value 15 for key ‘A’ while the other stores 5.

It’s often the case that the application will tend to go to the same site during a particular time when accessing the same data and so the chances of a conflict are reduced but the application may still want to guard against (even rare) race conditions. If replication slows down (for example due to a backlog of updates to be applied) or stops temporarily (for example due to network failure to the geographically remote site) then the chances of a collision greatly increase.

For information on setting up multi-master asynchronous replication with MySQL Cluster, please take a look at Setting up MySQL Asynchronous Replication for High Availability.

Conflict Detection & Resolution using MySQL Cluster

MySQL Cluster provides two different schemes to handle these collisions/conflicts. The first scheme (referred to as “greatest timestamp wins”) detects that a conflict occurs and automatically resolves it (the change most recently received from the application is stored on both Clusters). The second scheme (referred to as “same timestamp wins”) detects that a conflict has occurred but does not fix it – instead the conflict is recorded in such a way that the application (or user) can figure out how best to resolve it based on a full understanding of the schema, what the data means and how it’s used. It is up to the developer which approach they use (if any) – it is selected on a per-table basis.

Common prerequisite steps

These steps should be followed regardless of whether you want to use conflict resolution or conflict detection (where the application decides how to resolve it).

  1. Set up multi-master replication as described in Setting up MySQL Asynchronous Replication for High Availability
  2. Create the function “inttime” for use in the triggers, as described in Creating a MySQL plugin to produce an integer timestamp. Note that you will need to install inttime.so on each host; a sketch of registering the function follows this list.
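Once inttime.so is in place on a host, registering the UDF is the standard one-liner (the function and library names are the ones referred to above):

mysql> CREATE FUNCTION inttime RETURNS INTEGER SONAME 'inttime.so';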

Setting up Automatic Conflict Resolution (Greatest timestamp wins)

This is the simplest way to handle conflicts with MySQL Cluster when implementing multi-master asynchronous replication (actually, the simplest is to do nothing and accept that if your application(s) update the same row at about the same time at both Clusters then those Clusters may be left with different data until the application(s) next update that row).

Remember that this mechanism works by checking that the timestamp field of the update received by the slave is later than the one already stored. In the example that follows, the ‘ts’ column is used for the timestamp.

Create the database on either cluster (replication will make sure that it appears in both Clusters):

mysql> create database clusterdb;

Before creating the application tables, set up the ndb_replication system table (again, in either Cluster):

mysql> CREATE TABLE mysql.ndb_replication ( \
        db VARBINARY(63), \
        table_name VARBINARY(63), \
        server_id INT UNSIGNED, \
        binlog_type INT UNSIGNED, \
        conflict_fn VARBINARY(128), \
        PRIMARY KEY USING HASH (db, table_name, server_id)) \
        ENGINE=NDB PARTITION BY KEY(db, table_name);

mysql> insert into mysql.ndb_replication values ('clusterdb', 'tab1', 7, NULL, 'NDB$MAX(ts)');

After that, you can create the application table:

cluster1 mysql> use clusterdb;

cluster1 mysql> create table tab1 (NAME varchar(30) not null primary key,VALUE int, ts BIGINT UNSIGNED default NULL) engine=ndb;

To test that the basic replication is working for this table, insert a row into cluster1, check it’s there in cluster2, add a second row to cluster2 and make sure it’s visible in cluster1:

cluster1 mysql> insert into tab1 values ('Frederick', 1, 0);

cluster2 mysql> use clusterdb;
cluster2 mysql> select * from tab1;
+-----------+-------+------+
| NAME      | VALUE | ts   |
+-----------+-------+------+
| Frederick |     1 |    0 |
+-----------+-------+------+
1 row in set (0.00 sec)
cluster2 mysql> insert into tab1 values ('William',20,0);

cluster1 mysql> select * from tab1;
+-----------+-------+------+
| NAME      | VALUE | ts   |
+-----------+-------+------+
| Frederick |     1 |    0 |
| William   |    20 |    0 |
+-----------+-------+------+
2 rows in set (0.00 sec)

For both rows, the timestamp was set to 0 to represent ‘the start of time’. From this point on, whenever making a change to those rows, the timestamp should be increased. Later on in this article, I’ll show how to automate that process.

We’re now ready to test that the conflict resolution is working; to do so, replication is stopped (in both directions) to widen the window for a conflict and the same tuple is updated on each Cluster. Replication is then restarted and I’ll confirm that the last update wins on both clusters:

cluster1 mysql> slave stop;

cluster2 mysql> slave stop;

cluster1 mysql> update tab1 set VALUE=10,ts=1 where NAME='Frederick';

cluster2 mysql> update tab1 set VALUE=11,ts=2 where NAME='Frederick';

cluster1 mysql> slave start;

cluster2 mysql> slave start;

cluster1 mysql> select * from tab1;
+-----------+-------+------+
| NAME      | VALUE | ts   |
+-----------+-------+------+
| William   |    20 |    0 |
| Frederick |    11 |    2 |
+-----------+-------+------+
2 rows in set (0.00 sec)

cluster2 mysql> select * from tab1;
+-----------+-------+------+
| NAME      | VALUE | ts   |
+-----------+-------+------+
| William   |    20 |    0 |
| Frederick |    11 |    2 |
+-----------+-------+------+
2 rows in set (0.00 sec)

This confirms that the later update (timestamp of 2) is stored in both Clusters – conflict resolved!

Automating the timestamp column

Manually setting the timestamp value is convenient when testing that the mechanism is working as expected but could be a nuisance in a production environment (for example, you would need to get the clocks of all application nodes exactly in sync wherever in the world they’re located). This section describes how that can be automated using triggers (note that triggers don’t fire when using the NDB API to make changes but in that situation it should be straightforward to provide wrapper methods that manage the timestamp field). Note that the timestamp must be an integer field (and needs a high level of precision) and so you can’t use the regular MySQL TIMESTAMP type.

This mechanism assumes that you’ve built “inttime.so” and deployed it to the hosts running the mysqld processes for each cluster (refer to the prerequisite section).

cluster1 mysql> create trigger tab1_insert before insert on tab1 for each row set NEW.ts=inttime();
cluster1 mysql> create trigger tab1_update before update on tab1 for each row set NEW.ts=inttime();
cluster1 mysql> insert into tab1 (NAME,VALUE) values ('James',10),('David',20);
cluster1 mysql> select * from tab1;
+-----------+-------+------------------+
| NAME      | VALUE | ts               |
+-----------+-------+------------------+
| William   |    20 |                0 |
| David     |    20 | 1250090500370307 |
| James     |    10 | 1250090500370024 |
| Frederick |    11 |                2 |
+-----------+-------+------------------+
4 rows in set (0.00 sec)

cluster2 mysql> update tab1 set VALUE=55 where NAME='William';
cluster2 mysql> select * from tab1;
+-----------+-------+------------------+
| NAME      | VALUE | ts               |
+-----------+-------+------------------+
| James     |    10 | 1250090500370024 |
| Frederick |    11 |                2 |
| William   |    55 | 1250090607251846 |
| David     |    20 | 1250090500370307 |
+-----------+-------+------------------+
4 rows in set (0.00 sec)

Setting up Conflict Detection (Same timestamp wins)

With this method, conflicts are detected and recorded but not automatically resolved. The intent is to allow the application to decide how to handle the conflict based on an understanding of what the data means.

Create the database on either cluster (replication will make sure that it appears in both Clusters):

mysql> create database clusterdb;

Before creating the application tables, set up the ndb_replication system table (again, in either Cluster):

mysql> CREATE TABLE mysql.ndb_replication ( \
        db VARBINARY(63), \
        table_name VARBINARY(63), \
        server_id INT UNSIGNED, \
        binlog_type INT UNSIGNED, \
        conflict_fn VARBINARY(128), \
        PRIMARY KEY USING HASH (db, table_name, server_id)) \
        ENGINE=NDB PARTITION BY KEY(db, table_name);

mysql> insert into mysql.ndb_replication values ('clusterdb', 'SubStatus', 7, NULL, 'NDB$OLD(ts)');

After that, you can create the application table and its associated exception table:

cluster1 mysql> use clusterdb;
cluster1 mysql> create table SubStatus$EX ( \
        server_id INT UNSIGNED, \
        master_server_id INT UNSIGNED, \
        master_epoch BIGINT UNSIGNED, \
        count INT UNSIGNED, \
        sub_id int not null, \
        notes VARCHAR(30) DEFAULT 'To be resolved', \
        PRIMARY KEY (server_id, master_server_id, master_epoch, count)) engine=ndb;
cluster1 mysql> create table SubStatus (sub_id int not null primary key, ActivationStatus varchar(20), ts BIGINT default 0) engine=ndb;

To test that the exception table gets filled in, add some rows to the table and then cause an update conflict (in a similar way to the conflict resolution example but after setting up the timestamp automation):

cluster1 mysql> create trigger SubStatus_insert before insert on SubStatus for each row set NEW.ts=inttime();
cluster1 mysql> insert into SubStatus (sub_id, ActivationStatus) values (1,'Active'),(2,'Deactivated');
cluster1 mysql> select * from SubStatus;
+--------+------------------+------------------+
| sub_id | ActivationStatus | ts               |
+--------+------------------+------------------+
|      1 | Active           | 1250094170589948 |
|      2 | Deactivated      | 1250094170590250 |
+--------+------------------+------------------+
2 rows in set (0.00 sec)

cluster2 mysql> use clusterdb;
cluster2 mysql> select * from SubStatus;
+--------+------------------+------------------+
| sub_id | ActivationStatus | ts               |
+--------+------------------+------------------+
|      1 | Active           | 1250094170589948 |
|      2 | Deactivated      | 1250094170590250 |
+--------+------------------+------------------+
2 rows in set (0.00 sec)

cluster1 mysql> slave stop;

cluster2 mysql> slave stop;

...

(At this point, just go on to test as with the conflict resolution example but in this case expect to see that the conflict is not resolved; instead an entry is added to the exceptions table.)

Of course, you can always add a trigger on the conflict table and use that to spur the application into initiating its own conflict resolution algorithm.
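As a hedged sketch of that idea (the alert table and trigger are invented for illustration, and whether a trigger fires for rows written by the replication slave depends on how they are applied, so treat this as a starting point only):

cluster1 mysql> create table ConflictAlerts (sub_id int, noted DATETIME) engine=ndb;
cluster1 mysql> create trigger SubStatusEX_ai after insert on SubStatus$EX for each row insert into ConflictAlerts values (NEW.sub_id, NOW());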





MySQL Utilities Webinar – Q&A + replay now available

MySQL Utilities logo

Dr Charles Bell and I recently presented a webinar on MySQL Utilities; there was a heavy focus on what you can achieve with them and how you should use them. In case you couldn’t attend or want to listen to some of the details again, the replay from that webinar is available here.

Abstract:

MySQL Utilities provide a collection of command-line utilities that are used for maintaining and administering MySQL databases, including:

  • Admin Utilities (Clone, Copy, Compare, Diff, Export, Import)
  • Replication Utilities (Setup, Configuration, Automated Slave Promotion)
  • General Utilities (Disk Usage, Redundant Indexes, Search Meta Data)

Andrew Morgan and Chuck Bell will guide you through these utilities, and many more, explaining what you can gain from these tools and how to use them (for example how a single command repeatedly checks your master server and in the event it fails, automatically promotes one of the slaves to be the new master).

Question & Answer Summary:

  • Using these utilities, can we export/import routines’ source as well? Yes.
  • I generally run multi-master behind a proxy (haproxy). I was wondering if the replication check would work. But I guess mysql still does not support multi-master or ring replication. It depends on what you intend to do exactly. There are some limitations, but we intend to improve the utilities over time. So, just test the utilities in your environment and you are welcome to share your feedback with us.
  • Do you need the python connector if you have the utilities on workbench? No, it is bundled as part of WB.
  • What does “mysqluserclone” do that “SHOW GRANTS FOR USER” doesn’t do? It actually creates and clones the users rather than simply showing their privileges like SHOW GRANTS. You can find more details here.
  • With mysqlfailover can I set a sequence to switch in a specific order if a failure occurs with the master? Yes, you can. It will follow the sequence of the defined candidates (see the example command after this list).
  • Can I run mysqlfailover on an independent server that checks the master and its slave servers, or must it run on the master or one of the slaves? Running on an independent server is ideal. After all, you don’t want it to disappear when the master’s host fails.
  • How many MySQL instances can I manage with mysqlfailover – I mean, is it possible to check more than one topology with the same independent server? With one mysqlfailover instance you can monitor one topology, i.e. one master and all its slaves. If you have more than one topology (using a different master) you can use a mysqlfailover instance to monitor each one.
  • If a master goes down and replication failover is successful, how hard is it to bring that old master back into the fold, assuming it was just a network disruption? You can use mysqlreplicate to set the old master up as a slave of the new master resulting from the failover; that will bring the old master up to date. Then you can use mysqlrpladmin to switch over to the newly added slave, making it the master again and restoring your original replication setup.
  • Can mysqlprocgrep log what it kills? There is no option to log what it has done, but you can redirect the utility output if it contains the information you need.
  • How does mysqldbcompare handle auto-increment columns? It will find differences between auto_increment values, so if they differ it will identify them. Changed or missing rows are shown in a standard format of GRID, CSV, TAB, or VERTICAL – you can choose which format you want for the differences.
  • I use Pacemaker/Corosync to achieve failover. Is mysqlfailover suitable for use in that context? Do I need to add extra logic around it, some wrapper script? If so, does Oracle provide ready-made scripts for Pacemaker/Corosync? mysqlfailover was designed to work independently, and provides some functionality that may help with integration with other systems (such as the ability to execute external pre- and post-failover scripts). We do not have a ready-made script for Pacemaker/Corosync. For more information check here.
  • How long are the .frm files kept if the database is corrupt or was deleted? The .frm files are not removed in any way; they are an integral part of the server and stored in the datadir.
  • Can mysqlserverclone clone instances between separate servers? Using SSH? mysqlserverclone starts a new instance that mimics a running server or an offline server by looking at its configuration, but to copy objects such as tables to another instance you need to use mysqldbcopy. We do not currently support SSH. If you see this as a much-needed feature, please open a feature request via bugs.mysql.com. That will help us understand the need and urgency.
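For reference, here is a hedged example of launching mysqlfailover with an explicit candidate order; the host names and credentials are invented, while --master, --discover-slaves-login, --candidates and --log are standard mysqlfailover options:

$ mysqlfailover --master=root:pass@master1:3306 \
      --discover-slaves-login=root:pass \
      --candidates=root:pass@slave1:3306,root:pass@slave2:3306 \
      --log=failover.log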






New white paper – MySQL Replication – Enhancing Scalability and Availability with MySQL 5.5

MySQL Replication from 1 Master to Multiple Slaves

A new white paper has been published that covers MySQL Replication – background information, how it works, how to use it and what’s new in MySQL 5.5. Simply register for the white paper at mysql.com and you’ll be sent your free copy.

The paper starts by covering the fundamental concepts behind replication such as the difference between synchronous and asynchronous replication and the idea behind semisynchronous replication.

It goes on to describe the common use-cases for replication – scaling out, high availability, geographic redundancy and offloading backups or analytics.

Various replication topologies are discussed from simple master-slave to multi-master rings.

As the title of the paper suggests, the paper covers the significant replication changes introduced in MySQL 5.5:

  • Semisynchronous replication: Improved resilience by having master wait for slave to persist events.
  • Slave fsync tuning & Automatic relay log recovery: Options to dictate when relay logs are written to disk rather than relying on default operating system behavior; set sync_relay_log=1 to ensure that no more than 1 statement or transaction is missing from the relay log after a crash. The slave can now recover from corrupted relay logs by requesting corrupt entries to be resent from the master. Three new options are introduced (sync-master-info, sync-relay-log and sync-relay-log-info); a sample configuration fragment follows this list.
  • Replication Heartbeat: Automatically checks the status of the connection between the master and the slave(s), allowing a more precise failure detection mechanism. Can detect loss of connection within milliseconds (configurable). Avoid unnecessary relay log rotation when the master is idle.
  • Per server replication filtering: When a server is removed from a replication ring, a surviving server can be selected to remove its outstanding replication messages once they’ve been applied by all servers.
  • Precise Slave Type Conversions: Allows different types to be used on the master and slave, with automatic type promotion and demotion when using row-based replication (already possible with statement-based replication)
  • Individual Log Flushing: Selectively flush server logs when using ‘FLUSH LOGS’ for greater control
  • Safe logging of mixed transactions: Replicate transactions containing both InnoDB and MyISAM changes
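As an illustration of the slave fsync and recovery options called out above, here is a hedged my.cnf fragment; the variable names are the MySQL 5.5 ones and the values are simply the most conservative settings:

[mysqld]
# Flush the relay log and replication metadata to disk after every event
# group: safest, at some cost in write throughput
sync_relay_log=1
sync_master_info=1
sync_relay_log_info=1
# Discard possibly-corrupt relay logs after a crash and refetch the
# missing events from the master
relay_log_recovery=1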

To be better able to use replication, it helps to understand the basics of how MySQL Replication has been implemented – for example, the roles of the binary and relay logs – and so this is covered.

Perhaps the most useful sections are those that actually walk through using MySQL Replication – specifically these activities:

  • Configuring, running and testing replication
  • Migrating from traditional asynchronous replication to semisynchronous replication
  • Administering & monitoring replication (including MySQL Enterprise Monitor)
  • Failover and recovery

Replication is a little different when using MySQL Cluster; most significantly because you can have multiple MySQL Servers and other application nodes modifying the same database but also because the domains where MySQL Cluster has been used have required extreme High Availability and so there are Cluster-unique features such as active-active replication with conflict detection and resolution. This paper isn’t focussed on MySQL Cluster but a summary of these differences is included.