Choosing the right MySQL High Availability Solution – webinar replay

Earlier this week, I presented a webinar on High Availability options for MySQL – what they are and how to choose the most appropriate one for your application.

The replay of this webinar can now be viewed here or if you just want to look at the charts then scroll down. At the end of this post, I include a summary of the Q&A from the webinar.

How important is your data? Can you afford to lose it? What about just some of it? What would be the impact if you couldn’t access it for a minute, an hour, a day or a week?

Different applications can have very different requirements for High Availability. Some need 100% data reliability with 24x7x365 read & write access while many others are better served by a simpler approach with more modest HA ambitions.

MySQL has an array of High Availability solutions ranging from simple backups, through replication and shared storage clustering – all the way up to 99.999% available shared nothing, geographically replicated clusters. These solutions also have different ‘bonus’ features such as full InnoDB compatibility, in-memory real-time performance, linear scalability and SQL & NoSQL APIs.

The purpose of this webinar is to help you decide where your application sits in terms of HA requirements and discover which of the MySQL solutions best fit the bill. It will also cover what you need outside of the database to ensure High Availability – state of the art monitoring being a prime example.

The charts

 

Questions and Answers

  • What does “HA” stand for? High Availability
  • What is meant by scale-out? Scale-out is where you increase the capacity or the throughput of the system by adding extra (usually commodity) machines – this is the opposite of scale-up where you buy the biggest single server that you can find. It tends to be much more economical this way and you can add extra capacity if and when you need it.
  • Most of my applications are for small businesses where the data load is not that big; is this session worth listening to? Yes – the focus of this presentation is on keeping your data safe and accessible, not on scaling to massive volumes and throughput. Scale-out is touched on because, if you need that as well, it can influence your choice of High Availability solution
  • How does all this compare with Amazon Aurora? Aurora is something that’s offered by Amazon and is supported by them. Aurora can only be run on AWS – the solutions covered here can be taken to lots of different environments – whether cloud or ‘bare metal’
  • What amount of work would be involved in migrating existing InnoDB-based databases to NDB? As you’d expect, it depends on the application. Changes should be minor to get it to run but you might need to make more adjustments to get the best performance. This white paper is a good place to start.
  • Is it possible to migrate InnoDB data to MySQL Cluster? Yes. The first thing to note is that to use MySQL Cluster, you have to use the mysqld (MySQL Server) process that comes with the MySQL Cluster package; once you’ve switched to that binary then you can run ALTER TABLE my_tab ENGINE=NDB; provided that your schema is compatible with MySQL Cluster. Alternatively you can perform a mysqldump and then load the data in.
  • Do MySQL Fabric solutions support “rolling upgrade”? Yes – you can perform a rolling upgrade of the managed MySQL Servers but you have to do it yourself; Fabric doesn’t automate it at present
  • What about master-master replication? You can use active-active replication but the application is responsible for avoiding conflicts between the two sites unless you use MySQL Cluster/NDB (where the functionality is built into the database)
  • What’s the maximum distance that the master and slave can be apart? For MySQL Replication, there is no maximum distance; the latency of the master isn’t impacted if you stick with the default asynchronous replication. If you use semi-synchronous replication then the latency of your transactions will be impacted by the WAN latency between the master and slave site(s).
  • Is there a monetary investment to implement this or is it free? The database technology presented in this session can be used under the GPL open source license; if you buy a commercial edition then you get access to some additional tools such as MySQL Enterprise Monitor and MySQL Cluster Manager.
  • Can we check the transaction sequence on the slave side? Yes, with MySQL replication, you can check which transactions have been applied on the slave.
  • Can the slave switch over to master, in case the master needs to be shut down? Yes – that’s a very common use case
  • Do we have separate binlogs for different databases (schemas)? All of the databases (schemas) within the MySQL Server use the same binary log. When that log file fills up, it’s rotated out and a new one used.
  • How can I implement “auto-failover” in MySQL 5.6 replication? Are you talking about the “mysqlfailover” script? You can use the mysqlfailover utility or MySQL Fabric
  • What are the similarities and differences between the InnoDB and NDB/MySQL Cluster engines? We’ll cover some of this in this presentation and you can find more details in this white paper
  • With master-master replication, weren’t there some parameters to control the auto-increment IDs so that we could avoid conflicts with active-active MySQL replication? If you ensure that the same row isn’t written to on both masters then you can avoid conflicts. One option could be to store odd primary keys on one and even on the other. You can then set the auto_increment_increment and auto_increment_offset parameters and then use auto-increment primary keys on each MySQL Server
  • Is there an online backup tool available in MySQL – without locking my database? Yes – if you’re using InnoDB then you can use MySQL Enterprise Backup (part of MySQL Enterprise Edition, and also much faster than mysqldump); MySQL Cluster has a built-in online backup tool (which can be simpler to use in conjunction with MySQL Cluster Manager, which is part of MySQL Cluster Carrier Grade Edition)
  • In multi-source replication, how are data conflicts handled? If more than one master modifies the same row, which one is applied? That’s left as an exercise for the application. The application is responsible for making sure that there are no conflicting updates (if you care about the conflicts). You could also use MySQL Fabric to shard the data.
  • With MySQL Group Replication, if one master is down, will transactions still be applied? With MySQL Group Replication, updates can be sent to any of the servers and so if one is unavailable just switch to any of the others
  • What is the maximum data size MySQL can handle in its latest release? The maximum size of an InnoDB table[space] is four billion pages (64TB with the default 16k page size). Beyond that you start partitioning or use MySQL Fabric
  • Can you give a short definition for sharding? Sharding is where you take the data for a table and split it across multiple MySQL Server instances. Typically you’ll choose one or more columns from the table to act as the sharding key to decide which shard a specific row should be stored in. You can also have functional sharding where you decide that the whole of table A will be in one shard and all of table B in another
  • Is there a storage engine to handle JSON documents? Here’s an interesting blog on that topic
  • What algorithms are available for sharding in MySQL? With MySQL Fabric you can use a hash or define ranges for the shard key. With MySQL Cluster, the sharding is completely transparent to the application but it uses MD5 hashing under the covers
  • In MySQL Cluster, is all the data replicated to all of the data nodes? Data is synchronously replicated between the 2 data nodes forming a node group. Different node groups are responsible for different shards for any given table
  • What’s the maximum number of data nodes in MySQL Cluster? 48
  • Which is the best interconnect method between MySQL Cluster data nodes for highest performance? Infiniband and Gigabit Ethernet have both shown great results. If you’re able to configure the behaviour then low latency will help get the best performance
  • Can I co-locate MySQL Cluster data nodes with MySQL Servers in order to use all possible resources? Yes, you can co-locate data nodes and MySQL Servers. Note that the management node (ndb_mgmd) should not run on the same machine as any of the data nodes – read this post for details on how to deploy MySQL Cluster nodes for High Availability
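
To make the odd/even primary key idea from the auto-increment answer above a little more concrete, here is a rough Python sketch. It is only a simplified model of the arithmetic, not MySQL code: it assumes two masters that both use auto_increment_increment = 2, with auto_increment_offset set to 1 on one and 2 on the other, and it ignores what happens when rows are inserted with explicit key values. The function and variable names are made up for illustration.

```python
from itertools import islice


def auto_increment_values(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ...

    Simplified model of the key sequence for a server configured with
    auto_increment_offset=offset and auto_increment_increment=increment.
    """
    value = offset
    while True:
        yield value
        value += increment


# Hypothetical two-master setup: same increment, different offsets.
server_a = list(islice(auto_increment_values(offset=1, increment=2), 5))
server_b = list(islice(auto_increment_values(offset=2, increment=2), 5))

print(server_a)  # [1, 3, 5, 7, 9]
print(server_b)  # [2, 4, 6, 8, 10]

# The two servers never hand out the same key.
assert set(server_a).isdisjoint(server_b)
```

With more than two masters the same pattern would apply: set the increment to the number of masters and give each one its own offset. Note that this only stops the servers generating the same new key; the application still has to avoid updating the same existing row on both masters.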