Archive for MySQL

Dramatically Increased MySQL Cluster JOIN performance with Adaptive Query Localization

Regular readers of this Blog or attendees at the 2010 O’Reilly MySQL Conference may recall a feature that the MySQL Cluster team were working to speed up JOINs (see Jonas’s slides from that conference here). The good news is that work has continued since then and it is now part of the new MySQL Cluster 7.2 Milestone Development Release. This post will step through where to get hold of the binaries and see the benefits for yourself. The examples I try here result in a 25x speedup just by turning the feature on – I’ve also seen a 50x speedup on other queries!

We’re anxious to get feedback on what benefits you see with your application’s JOINs, please respond to this post.

What’s in a name?

If some of this seems familiar but you don’t remember hearing the term “Adaptive Query Localization” before then you’re not going crazy – previous internal names were SPJ (Select Project Join) and Pushed-Down Joins. We just figured that Adaptive Query Localization was more descriptive.

Classic Nested-Loop-Join

What does it do?

Traditionally, joins have been implemented in the MySQL Server where the query was executed. This is implemented as a nested-loop join; for every row from the first part of the join, a request has to be sent to the data nodes in order to fetch the data for the next level of the join and for every row in that level…. This method can result in a lot of network messages which slows down the query (as well as wasting resources).

When turned on, Adaptive Query Localization results in the hard work being pushed down to the data nodes where the data is locally accessible. As a bonus, the work is divided amongst the pool of data nodes and so you get parallel execution.

NDB API

How is it implemented?

I’ll leave the real deep and dirty details to others but cover the basic concepts here. All API nodes access the data nodes using the native C++ NDB API, the MySQL Server is one example of an API node (the new Memcached Cluster API is another). This API has been expanded to allowed parameterised or linked queries where the input from one query is dependent on the previous one.

To borrow an example from an excellent post by Frazer Clement on the topic, the classic way to implement a join would be…

SQL > select t1.b, t2.c from t1,t2 where t1.pk=22 and t1.b=t2.pk;
  ndbapi > read column b from t1 where pk = 22;
              [round trip]
           (b = 15)
  ndbapi > read column c from t2 where pk = 15;
              [round trip]
           (c = 30)
           [ return b = 15, c = 30 ]

Using the new functionality this can be performed with a single network round trip where the second read operation is dependent on the results of the first…

  ndbapi > read column @b:=b from t1 where pk = 22;
           read column c from t2 where pk=@b;
              [round trip]
           (b = 15, c = 30)
           [ return b = 15, c = 30 ]

Effects of Adaptive Query Localization

Note that if your application is using the NDB API directly then you can use this same linked query functionality to speed up your queries.

Where do I get it?

Download the source or binaries from http://dev.mysql.com/downloads/cluster/ (select the sub-tab for the Development Milestone Release).

How do I use it?

The first step is to get you Cluster up and running. As the focus of this feature is to reduce the amount of network messaging, it makes sense to perform your tests on multiple machines; if you need pointers on setting Cluster up then check this post on running Cluster over multiple machines (or the Windows version).

System Configuration

For my test, I used 2 PCs, each running Fedora 14 with Quad-Core CPU, 8 GBytes of RAM and Gigabit Ethernet connectivity. Each PC was running 2 data nodes (ndbd rather than ndbmtd) and one of the PCs was also running the management node and the MySQL Server running the queries (note that this configuration is not suitable for a HA deployment – for that run the management node on a third machine and run a MySQL Server on 192.168.1.11).

I’d hoped to go a step further and have the MySQL Server run on a third machine but hardware problems put paid to that – the reason that this would have been interesting is that it would have meant more of the messaging would be over the network and so would give a more realistic performance comparison (the speedup factor should have been higher). Another couple of steps that could further improve the speedup:

  1. Use multi-threaded data nodes (as more of the work is being done in the data nodes, this should help)
  2. Use machines with more cores
  3. Tune the configuration parameters (I’m keeping it as simple as possible here)

For reference, here is the configuration file used (config.ini):

 [ndb_mgmd]
 hostname=192.168.1.7
 datadir=/home/billy/mysql/spj/my_cluster/ndb_data
 NodeId=1

 [ndbd default]
 noofreplicas=2
 DiskPageBufferMemory=4M

 [ndbd]
 hostname=192.168.1.7
 datadir=/home/billy/mysql/spj/my_cluster/ndb_data
 NodeId=3

 [ndbd]
 hostname=192.168.1.11
 datadir=/home/billy/mysql/spj/my_cluster/ndb_data
 NodeId=4

 [ndbd]
 hostname=192.168.1.7
 datadir=/home/billy/mysql/spj/my_cluster/ndb_data2
 NodeId=5

 [ndbd]
 hostname=192.168.1.11
 datadir=/home/billy/mysql/spj/my_cluster/ndb_data2
 NodeId=6

 [mysqld]
 NodeId=50

And for the MySQL Server (my.cnf):

[mysqld]
ndbcluster
datadir=/home/billy/mysql/spj/my_cluster/mysqld_data

As a reminder – here is how you start up such a Cluster:

[billy@ws2 my_cluster]$ ndb_mgmd -f conf/config.ini --initial 
  --configdir=/home/billy/mysql/spj/my_cluster/conf/
[billy@ws2 my_cluster]$ ndbd -c 192.168.1.7:1186
[billy@ws1 my_cluster]$ ndbd -c 192.168.1.7:1186
[billy@ws2 my_cluster]$ ndbd -c 192.168.1.7:1186
[billy@ws1 my_cluster]$ ndbd -c 192.168.1.7:1186
[billy@ws2 my_cluster]$ ndb_mgm -e show # Wait until data nodes are running [billy@ws2 my_cluster]$ mysqld --defaults-file=conf/my.cnf& [billy@ws2 my_cluster]$ mysql -h 127.0.0.1 -P 3306 -u root

Three tables that are to be used for the queries and these are created as follows:

mysql> CREATE DATABASE clusterdb; USE clusterdb;
mysql> CREATE TABLE residents (id INT NOT NULL PRIMARY KEY, name VARCHAR(20),
  postcode VARCHAR(20)) ENGINE=ndb;
mysql> CREATE TABLE postcodes (postcode VARCHAR(20) NOT NULL PRIMARY KEY, 
  town VARCHAR(20)) ENGINE=ndb;
mysql> CREATE TABLE towns (town VARCHAR(20) NOT NULL PRIMARY KEY,
  county VARCHAR(20)) ENGINE=ndb;

I then added 100K rows to each of these tables; if you want to recreate this then you can download the data files here.

mysql> LOAD DATA LOCAL INFILE  "/home/billy/Dropbox/LINUX/projects/SPJ/residents.csv"
   REPLACE INTO TABLE residents FIELDS TERMINATED BY ',' ENCLOSED BY '"';
mysql> LOAD DATA LOCAL INFILE  "/home/billy/Dropbox/LINUX/projects/SPJ/postcodes.csv"
  REPLACE INTO TABLE postcodes FIELDS TERMINATED BY ',' ENCLOSED BY '"';
mysql> LOAD DATA LOCAL INFILE  "/home/billy/Dropbox/LINUX/projects/SPJ/towns.csv"
  REPLACE INTO TABLE towns FIELDS TERMINATED BY ',' ENCLOSED BY '"'; 

Now everything is set up to actually perform our tests! First of all two queries are run with the adaptive query localization turned off i.e. this is the “before” picture:

mysql> set ndb_join_pushdown=off;
mysql> SELECT COUNT(*)  FROM residents,postcodes WHERE
  residents.postcode=postcodes.postcode AND postcodes.town="MAIDENHEAD";
 +----------+
 | COUNT(*) |
 +----------+
 |    20000 |
 +----------+
 1 row in set (27.65 sec)
mysql> SELECT COUNT(*)  FROM residents,postcodes,towns WHERE
  residents.postcode=postcodes.postcode AND
  postcodes.town=towns.town AND towns.county="Berkshire";
 +----------+
 | COUNT(*) |
 +----------+
 |    40001 |
 +----------+
 1 row in set (48.68 sec)

and then the test is repeated with adaptive query localization turned on:

mysql> set ndb_join_pushdown=on;
mysql> SELECT COUNT(*)  FROM residents,postcodes WHERE
  residents.postcode=postcodes.postcode AND postcodes.town="MAIDENHEAD";
 +----------+
 | COUNT(*) |
 +----------+
 |    20000 |
 +----------+
 1 row in set (1.07 sec)
mysql> SELECT COUNT(*)  FROM residents,postcodes,towns WHERE
  residents.postcode=postcodes.postcode AND postcodes.town=towns.town
  AND towns.county="Berkshire";
 +----------+
 | COUNT(*) |
 +----------+
 |    40001 |
 +----------+
 1 row in set (2.02 sec)

For those 2 queries it represents a 25.8x and 24.1x speedup.

It’s important to note that not every join can currently be pushed down to the data nodes; here are the current rules (we hope to relax them overtime) for a query to be suitable:

  • JOINed columns must have the same data type
  • Queries should not reference BLOBs
  • Explicit locking is not supported
  • Only supports fully or partially qualified primary keys or plain indexes as access method for child tables (first part of JOIN can be a full table scan)

You can check whether your query is fitting these rules using EXPLAIN, for example:

mysql> set ndb_join_pushdown=on;
mysql> EXPLAIN SELECT COUNT(*)  FROM residents,postcodes WHERE residents.postcode=postcodes.postcode AND postcodes.town="MAIDENHEAD";
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+--------------------------------------------------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                          | rows   | Extra                                                                    |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+--------------------------------------------------------------------------+
|  1 | SIMPLE      | residents | ALL    | NULL          | NULL    | NULL    | NULL                         | 100000 | Parent of 2 pushed join@1                                                |
|  1 | SIMPLE      | postcodes | eq_ref | PRIMARY       | PRIMARY | 22      | clusterdb.residents.postcode |      1 | Child of 'residents' in pushed join@1; Using where with pushed condition |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+--------------------------------------------------------------------------+
mysql> EXPLAIN EXTENDED SELECT COUNT(*)  FROM residents,postcodes,towns WHERE residents.postcode=postcodes.postcode AND postcodes.town=towns.town AND towns.county="Berkshire";
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                          | rows   | filtered | Extra                                                                                                                  |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | residents | ALL    | NULL          | NULL    | NULL    | NULL                         | 100000 |   100.00 | Parent of 3 pushed join@1                                                                                              |
|  1 | SIMPLE      | postcodes | eq_ref | PRIMARY       | PRIMARY | 22      | clusterdb.residents.postcode |      1 |   100.00 | Child of 'residents' in pushed join@1                                                                                  |
|  1 | SIMPLE      | towns     | eq_ref | PRIMARY       | PRIMARY | 22      | clusterdb.postcodes.town     |      1 |   100.00 | Child of 'postcodes' in pushed join@1; Using where with pushed condition: (`clusterdb`.`towns`.`county` = 'Berkshire') |
+----+-------------+-----------+--------+---------------+---------+---------+------------------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------+

Note that if you want to check for more details why your join isn’t currently being pushed down to the data node then you can use “EXPLAIN EXTENDED” and then “SHOW WARNINGS” to get more hints. Hopefully that will allow you to tweak your queries to get the best improvements.

PLEASE let us know your experiences and give us examples of queries that worked well and (just as importantly) those that didn’t so that we can improve the feature – just leave a comment on this Blog with your table schemas, your query and your before/after timings.





Are you using NoSQL databases?

We’re interested in finding out what NoSQL databases you might be using (and we include MySQL Cluster in that list when using one of its NoSQL interfaces such as the NDB API or ClusterJ).

To figure this out we’ve posted a quick poll on the home page of dev.mysql.com (go straight to the bottom-right corner of the page) – please take 30 seconds to complete it (it shouldn’t take any longer than that) if you can. In return, you can also see the latest results from the poll.





Submit your MySQL Proposals for Oracle OpenWorld 2011

Once again, MySQL gets special treatment and has a dedicated track at this year’s Oracle OpenWorld (October 2-6 in San Francisco). If you think that you’d have something interesting to present (for example if you’re using MySQL in an interesting way) then why not submit a session – the call for papers is now open! Time’s a little tight to get your proposals in – the deadline is 27th March.

Alternatively, if you want to attend but not present then there’s super-save discount until 1st April – register now!

 





Free webinar – Scaling web apps with MySQL (an alternative to the MEMORY storage engine)

Mat Keep and I will be presenting this free webinar on Wednesday 14 July.

The MEMORY storage engine has been widely adopted by MySQL users to provide near-instant responsiveness with use cases such as caching and web session management. As these services evolve to support more users, so the scalability and availability demands can start to exceed the capabilities of the MEMORY storage engine.

The MySQL Cluster database, which itself can be implemented as a MySQL storage engine, is a viable alternative to address these evolving web service demands. MySQL Cluster can be configured and run in the same way as the MEMORY storage engine (ie on a single host with no replication and no persistence). As web services evolve, any of these attributes can then be added in any combination to deliver higher levels of scalability, availability and database functionality, especially for those workloads which predominately access data by the primary key.

As always, the webinar is free of charge but you will need to register here.

Time:

  • Wed, Jul 14: 06:00 Hawaii time
  • Wed, Jul 14:  09:00 Pacific time (America)
  • Wed, Jul 14: 10:00 Mountain time (America)
  • Wed, Jul 14: 11:00 Central time (America)
  • Wed, Jul 14: 12:00 Eastern time (America)
  • Wed, Jul 14: 16:00 UTC
  • Wed, Jul 14: 17:00 Western European time
  • Wed, Jul 14: 18:00 Central European time
  • Wed, Jul 14: 19:00 Eastern European time

If you can’t make the live webinar then register anyway and you’ll get sent a link to the recording after the event.





MySQL Workbench 5.2 goes GA – partial support for MySQL Cluster

Configure MySQL Server nodes for MySQL Cluster

The new version of MySQL Workbench (5.2.25) has just gone GA – see the Workbench BLOG for details.

So what’s the relevance to MySQL Cluster? If you have a Cluster that uses MySQL Servers to provide SQL access then you can now use MySQL Workbench to manage those nodes:

  • Start & stop the mysqld processes
  • Configure the per-mysqld configuration data held in my.cnf or my.ini

The reason that I describe the support as ‘partial’ is that these MySQL Servers are treated as independent entities (no concept of them being part of a Cluster) and there is currently no way to use it to configure or manage the other Cluster processes (data and management nodes). Having said that, what is there provides a lot of value and Workbench is designed to be very extensible  and so hopefully there can be further MySQL Cluster support in the future.

View MySQL Cluster status variables

In addition to MySQL Cluster-specific configuration parameters, you can also access the Cluster-specific status variables (these are the ones starting with ndb).

While I’ve focussed on what’s unique to MySQL Cluster, you can of course use the other Workbench features with MySQL Cluster – for example:

  • Creating (or reverse-engineering) your data model
  • Define your schema
  • View/write data to your tables
  • Create your SQL queries




Using Syslog with MySQL Cluster

By default, MySQL Cluster sends log data to a file but you can also send it to the console or to Syslog; this article explains how to send it to Syslog. The example given here is for LINUX.

In this example, I’ll use the “user” syslog facility name and so the first step is to make sure that syslog is configured to route those messages. If this hasn’t already been configured then add the following lines to /etc/rsyslog.conf:

# Log user messages to local files
user.*    /var/log/user

For the changes to take effect, restart the syslog service:

[root@ws1 etc]# service rsyslog restart
Shutting down system logger:                               [  OK  ]
Starting system logger:                                    [  OK  ]

Note that you should make those changes as root.

Still as root, start up a stream of  any additions to the new log file:

[root@ws1 etc]# tail -f /var/log/user

To tell Cluster to use Syslog, add this line into the [ndb_mgmd] section in config.ini:

LogDestination=SYSLOG:facility=user

and then start up your Cluster as normal.

You should now be able to see that MySQL Cluster information is being logged to /var/log/user.

You can adjust how much information is logged either through the config file or from the ndb_mgm tool, for example – to see when global checkpoints are written:

ndb_mgm> all clusterlog checkpoint=15
Executing CLUSTERLOG CHECKPOINT=15 on node 3 OK!
Executing CLUSTERLOG CHECKPOINT=15 on node 4 OK!
Note that a log-level of 15 will show all logs and 0 will show none. Other log categories besides CHECKPOINT are STARTUP, SHUTDOWN, STATISTICS, NODERESTART, CONNECTION, INFO, ERROR, CONGESTION, DEBUG and BACKUP.




MySQL Cluster presentation at Oracle Open World 2010

As part of “MySQL Sunday” at this year’s Oracle Open World, Mat Keep and I will be presenting on the latest MySQL Cluster features. We’ll be presenting at 15:30 (Pacific Time) on 19th September (the event starts with a key note at 12:30).

If you’re attending Oracle Open World then please indicate that you’d like to attend the MySQL Sunday when you register. If you aren’t planning to go to Oracle Open World but will be in the San Francisco area then buying a Discover pass (only $50 if you register by 16 July) will get you into the MySQL Sunday sessions. Register here.

For details on the presentations and speakers, check here.





Download, install, configure, run and test MySQL Cluster in under 15 minutes

Single host Cluster

A series of quick-start guides are now available to get you up and running with MySQL Cluster in as little time as possible; they are available for LINUX/Mac OS X, Windows and Solaris. The configuration is intentionally a simple one – 2 data nodes, 1 management node and 1 MySQL Server. Once you have this up and running, your next experiment may be to extend this over multiple hosts.

Download the Quick Start Guide for your platform below:

These links are also available from the MySQL Cluster download page.

The intent is that these guides should be simple to follow even if you have never tried MySQL Cluster (or even MySQL) before. Any comments or suggested improvements would be appreciated.





MySQL Cluster 7.1.4b binaries released

The binary version for MySQL Cluster 7.1.4b has now been made available at http://www.mysql.com/downloads/cluster/

A description of all of the changes (fixes) that have gone into MySQL Cluster 7.1.4b (compared to 7.1.3) can be found in the MySQL Cluster 7.1.4b Change Log.





Breakfast seminar on what’s new with MySQL – London

If you’re in London on Thursday 24th June then there’s a great chance to find out what’s new in MySQL.

Join us for an Oracle MySQL Breakfast Seminar to better understand Oracle’s MySQL strategy and what’s new with MySQL!
Agenda:
09:00 a.m.    Welcome Coffee/Tea
09:30 a.m.    Oracle’s MySQL Strategy
10:00 a.m.    What’s New – The MySQL Server & MySQL Cluster
10.45 a.m.    Coffee/Tea Break
11:00 a.m.    What’s New – MySQL Enterprise & MySQL Workbench
11:45 a.m.    Q&A
12:00 noon    End of the Breakfast Seminar

Cost?
None, it’s a free event! But places are limited and the seminar is held on a first come first served basis, so register quickly!

Location:

Sun Microsystem’s Customer Briefing Center
Regis House
45 King William Street
London EC4R 9AN
Tel: (020) 7628 3000

Image courtesy of Anirudh Koul.

Join us for an Oracle MySQL Breakfast Seminar in London, Thursday June 24th 2010, to better understand Oracle’s MySQL strategy and what’s new with MySQL!

Agenda:
09:00 a.m. Welcome Coffee/Tea
09:30 a.m. Oracle’s MySQL Strategy
10:00 a.m. What’s New – The MySQL Server & MySQL Cluster
10.45 a.m. Coffee/Tea Break
11:00 a.m. What’s New – MySQL Enterprise & MyQL Workbench
11:45 a.m. Q&A
12:00 noon End of the Breakfast Seminar

* Agenda subject to change

Cost?
None, it’s a free event! But places are limited and the seminar is held on a first come first served basis, so register quickly!