MySQL face-off: Amazon Aurora outscales Google Cloud SQL

Google Cloud SQL performance may beat Amazon Aurora at low thread counts, but Aurora owns the high end

Many web applications have been built on an open source stack that included MySQL. Despite its limitations, MySQL managed to become the world’s most widely used open source RDBMS. What limitations, you ask? Out of the box, MySQL does not scale all that well and, in particular, cannot handle a lot of simultaneous clients compared to commercial databases.

Amazon Aurora and Google Cloud SQL were both developed to offer customers high-performance, high-scalability MySQL databases as a service. Each works best as part of an application stack residing not only in the same cloud provider, but also in the same availability zone, to minimize the latency between services and maximize the network throughput within the stack.

I benchmarked and reviewed Amazon Aurora about a year ago. More recently I previewed Google Cloud SQL. In this article, I’ll tell you what happened when I benchmarked Google Cloud SQL against Amazon Aurora using a transactional load with a varying number of client threads.

Benchmarking SQL in the cloud 

Benchmarks are really hard to perform correctly. They are easy to do wrong, leading to the expression “lies, damned lies, and benchmarks.” And they can be done in ways that are meaningless but sound impressive, leading to the portmanteau “benchmarketing.”

For my review of Amazon Aurora, I more or less reproduced Amazon’s own benchmarks, with some difficulty. I initially recorded numbers half of what Amazon reported for its read-only and write-only Sysbench tests. After working with one of the Amazon engineers to diagnose the differences between my configuration and theirs, I changed the availability zone of the clients to match the availability zone of the database and actually recorded higher write numbers than Amazon did. As it turned out, Aurora write rates tend to start high, then level off, and I was running a shorter test than Amazon had. There were several other configuration subtleties that I worried about, but actually got right, principally an enhanced networking driver and modifications to the Linux network routing settings.

In my first look at Google Cloud SQL, I felt I didn’t have the time and resources to reproduce Google’s transactional Sysbench tests of its own Cloud SQL database, Amazon RDS for MySQL, and Amazon RDS for Aurora, so I printed Google’s results with the proviso that they had not yet been replicated by InfoWorld. I also qualified any conclusions I drew about comparative performance by saying “according to Google’s results.”

I have now tried to replicate Google’s results. I have also tried a different configuration that I felt would be more representative of the way people might try to use high-performance databases such as Google Cloud SQL and Amazon Aurora. I went into the benchmark process thinking it would be a week’s work; it actually took two weeks, and I needed and got help from both Google and Amazon engineers to make sure I was testing each of their databases properly.

Along the way, I learned a great deal about how to manage both database services. Compared to wrestling with a large database on a physical local server, both Google Cloud SQL and Amazon Aurora are dreams come true, at least once the data has been loaded.

As you’ll see, some of the conclusions have changed since my first look, and some have not.

Figure 1. Disk-bound transaction rates per thread by concurrent Sysbench threads for two runs against the largest available Google Cloud SQL Second Generation instance type (db-n1-highmem-16) and the two largest Amazon Aurora instance types, 4xlarge (comparable to Google’s db-n1-highmem-16) and 8xlarge (twice as powerful and expensive). Higher transaction rates are better.

The benchmarks that Google and I ran measure online transaction processing (OLTP) using Sysbench, a favorite benchmarking tool of the MySQL community, and varying the number of client threads as a proxy for varying the number of actual clients. In both cases we created a single failover replica. Note that when you use the Amazon Aurora cluster endpoint, the primary database takes the writer role and the replica takes a reader role.

It took me several tries and help from Google and Amazon engineers to get the full benchmarks to run. (Note that my first run, coded in blue, ends with 256 threads.) Loading the database took about five hours; running the OLTP tests took a half-hour per thread count.

As you can see by comparing the figure above with the first figure in Google’s blog post comparing Google Cloud SQL to Amazon RDS for MySQL and Amazon RDS for Aurora, a different conclusion about the relative performance of Google Cloud SQL and Amazon Aurora may now be warranted.

In Google’s figure, Google Cloud SQL Second Generation outperforms Amazon RDS for Aurora as well as Amazon RDS for MySQL for disk-bound transactions per thread below 16 threads. In one of my runs (coded in blue), that is true; in the other (coded in yellow), Amazon Aurora 4xlarge (comparable to Google Cloud SQL’s largest instance) and 8xlarge (twice the power) both outperform Google Cloud SQL for all thread counts.

I know what happened to crash the first run at 512 threads, thanks to help from a Google engineer: the client VM ran out of file handles. I fixed that before the second run, but apparently something else was going on to increase the latency and decrease the transaction rate early in the second run. I suspect the read replica was still catching up to the primary database, based on my snapshots of the console taken during the run, but that isn’t a firm conclusion.
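The article does not say how the file-handle limit was raised on the client VM; the usual shell route is `ulimit -n`. As an illustration only, here is a minimal Python sketch of how a client process on a Unix-like system could inspect and raise its own soft limit on open file descriptors. The function name and the 4096 target are hypothetical, not values from the benchmark.

```python
import resource
from typing import Tuple


def raise_open_file_limit(wanted: int) -> Tuple[int, int]:
    """Try to raise the soft open-file limit toward `wanted`.

    Returns (old_soft, new_soft). An unprivileged process cannot go
    above the hard limit, so the target is capped there.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Cap the request at the hard limit unless the hard limit is unlimited.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)

    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some platforms reject values the kernel cannot honor;
            # in that case the limit is left unchanged.
            pass

    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, new_soft


if __name__ == "__main__":
    # 4096 is an arbitrary illustrative target.
    old, new = raise_open_file_limit(4096)
    print("soft limit on open files:", old, "->", new)
```

Each client thread holds at least one socket to the database, so a run with hundreds of threads can exhaust a low default limit well before the database itself is saturated.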

I didn’t benchmark Amazon RDS for MySQL -- I thought it irrelevant. Google engineers didn’t benchmark an Amazon Aurora 8xlarge instance because Google doesn’t have a comparable instance size.

Variations on a theme

Using the same benchmark script as the Google engineers, my measurements of Amazon Aurora 4xlarge were higher than theirs for all thread counts. This makes me wonder whether the Google engineers missed one or more of the recommended settings for benchmarking Aurora, perhaps the Amazon enhanced networking driver, which was installed by default in the Amazon (Red Hat) Linux client AMI I ran, but not installed by default in the Ubuntu Trusty client AMI that I suspect Google ran.

My evidence for this suspicion is that Google’s published script is for Ubuntu Trusty and uses the MySQL client version shipped with late versions of Trusty. I had to modify the setup part of the script first to run on Ubuntu Xenial in the Google cloud and again to run on Red Hat in the Amazon cloud.

The need for using the enhanced networking driver, and the other recommended conditions for benchmarking Amazon Aurora, are discussed in Amazon’s benchmarking white paper.

For completeness, let’s also look at the actual transaction rates:

Figure 2. Disk-bound transaction rates by concurrent Sysbench threads for two runs against the largest available Google Cloud SQL Second Generation instance type (db-n1-highmem-16) and the two largest Amazon Aurora instance types, 4xlarge (comparable to Google’s db-n1-highmem-16) and 8xlarge (twice as powerful and expensive). Higher transaction rates are better.

and the measured latencies:

Figure 3. Disk-bound latencies by concurrent Sysbench threads for two runs against the largest available Google Cloud SQL Second Generation instance type (db-n1-highmem-16) and the two largest Amazon Aurora instance types, 4xlarge (comparable to Google’s db-n1-highmem-16) and 8xlarge (twice as powerful and expensive). Lower latency is better.

The transaction rates tell the same story as the transaction rates per thread, as they should, since they are based on the same raw numbers. The latencies are consistent with what Google measured, although the charts look different because of the huge latency variation in Amazon RDS for MySQL shown in Google’s chart.
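To make the relationship between the charts concrete, here is a small sketch of the arithmetic. The per-thread rate is simply the total rate divided by the thread count. The latency estimate uses Little’s law for a closed loop with no think time (threads = rate × latency), which is an assumption about how the load behaves and yields a mean, not the percentile a tool may report. The numbers in the example are placeholders, not measurements from these runs.

```python
def tps_per_thread(tps: float, threads: int) -> float:
    """Total transactions per second divided by the number of client threads."""
    return tps / threads


def implied_mean_latency_ms(tps: float, threads: int) -> float:
    """Mean latency implied by Little's law for a closed loop with no think time.

    threads = tps * latency_seconds  =>  latency_seconds = threads / tps
    """
    return threads / tps * 1000.0


if __name__ == "__main__":
    # Placeholder values for illustration only.
    threads, tps = 64, 3200.0
    print(tps_per_thread(tps, threads))           # transactions/s per thread
    print(implied_mean_latency_ms(tps, threads))  # milliseconds
```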

Note that at the highest client thread count, the Amazon Aurora 8xlarge database still had at least 20 percent spare CPU capacity: This particular test was network-limited at the client, which is consistent with the need to use multiple client VMs with their own network connections to saturate Aurora, as Amazon has always mentioned in its benchmarking advice.

In my first look at Google Cloud SQL, I said (quoting Google’s advice as well as drawing from my own experience) that if you want your database to be fast, you will want the maximum table size to fit into memory. In the tests above, the tables had 20 million rows, making them (deliberately) five times larger than the memory capacity of the Google Cloud SQL and Amazon Aurora 4xlarge databases, and two times larger than the memory capacity of the Amazon Aurora 8xlarge database.

Following the “fit the largest table into memory” principle, I reran the benchmarks using tables with 4 million rows, which fit into memory for all of the databases I tested.
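The sizing arithmetic behind that choice can be sketched as follows, assuming table size grows linearly with row count and taking the article’s rounded ratios (five times and two times memory at 20 million rows) at face value. The function name is hypothetical.

```python
def max_rows_in_memory(rows_tested: int, size_to_memory_ratio: float) -> int:
    """Estimate how many rows fit in memory, given that `rows_tested` rows
    occupied `size_to_memory_ratio` times the instance's memory.

    Assumes table size scales linearly with the number of rows.
    """
    return int(rows_tested / size_to_memory_ratio)


if __name__ == "__main__":
    rows = 20_000_000
    # Ratios as stated in the article (rounded figures).
    print(max_rows_in_memory(rows, 5))  # Google Cloud SQL and Aurora 4xlarge
    print(max_rows_in_memory(rows, 2))  # Aurora 8xlarge
```

Under those assumptions, the smaller instances top out at roughly 4 million rows, which matches the table size used for the rerun.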

Let’s look at the transaction rates, along with the transaction rates per thread:

Figure 4. Small-table (tables fit in memory) transaction rates by concurrent Sysbench threads for the largest available Google Cloud SQL Second Generation instance type (db-n1-highmem-16) and the two largest Amazon Aurora instance types, 4xlarge (comparable to Google’s db-n1-highmem-16) and 8xlarge (twice as powerful and expensive). Higher transaction rates are better. At the highest thread count for Amazon Aurora 8xlarge, the performance was network-limited at the client VM.

Figure 5. Small-table (tables fit in memory) transaction rates per thread by concurrent Sysbench threads for the largest available Google Cloud SQL Second Generation instance type (db-n1-highmem-16) and the two largest Amazon Aurora instance types, 4xlarge (comparable to Google’s db-n1-highmem-16) and 8xlarge (twice as powerful and expensive). Higher transaction rates are better. At the highest thread count for Amazon Aurora 8xlarge, the performance was network-limited at the client VM.

and the measured latencies:

Figure 6. Small-table (tables fit in memory) latencies by concurrent Sysbench threads for the largest available Google Cloud SQL Second Generation instance type (db-n1-highmem-16) and the two largest Amazon Aurora instance types, 4xlarge (comparable to Google’s db-n1-highmem-16) and 8xlarge (twice as powerful and expensive). Lower latency is better.

We have no outside measurements against which to compare these, so let’s compare them to the disk-bound measurements. Basically, they show somewhat higher transaction rates and lower latencies across the board, reflecting the fact that most of the read operations were cached in memory, although writes had to be committed to the solid-state disks.

If you are planning to make your maximum table size fit into memory, these are the kinds of numbers you should expect to see. In particular, if you need the database latency to be less than 500ms (to pick a common but loose constraint), then any of these databases will do the trick for you at up to 256 threads. Amazon Aurora will serve for 512 and higher thread counts, but Google Cloud SQL’s latency might not be up to snuff. If your constraint on latency is different, draw your own line on the chart and come to your own conclusions. You can refer to the TPS chart (figure 4) to find the expected transaction rate for the thread count that meets the latency constraint you choose.
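If you have the raw numbers behind charts like these, drawing your own line can be automated. The sketch below, with a hypothetical function name and placeholder data rather than the measured results, filters the thread counts that meet a latency ceiling and returns the one with the highest transaction rate.

```python
from typing import Dict, Optional, Tuple

# Maps thread count -> (transactions per second, latency in milliseconds).
Results = Dict[int, Tuple[float, float]]


def best_under_latency(results: Results, max_latency_ms: float) -> Optional[Tuple[int, float]]:
    """Return (threads, tps) for the highest-throughput run whose latency
    is at or below `max_latency_ms`, or None if no run qualifies."""
    qualifying = [
        (threads, tps)
        for threads, (tps, latency_ms) in results.items()
        if latency_ms <= max_latency_ms
    ]
    if not qualifying:
        return None
    return max(qualifying, key=lambda row: row[1])


if __name__ == "__main__":
    # Placeholder numbers for illustration; NOT the benchmark results.
    example: Results = {
        64: (3000.0, 21.0),
        256: (5000.0, 51.0),
        512: (5200.0, 98.0),
        1024: (5100.0, 600.0),
    }
    print(best_under_latency(example, 500.0))
```

Substitute your own measurements and your own ceiling; the 500ms figure is only the loose constraint used as an example above.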

Drawing conclusions

As we’ve seen, my attempts at benchmarking Google Cloud SQL and Amazon Aurora don’t completely support Google’s contention of superiority at low numbers of threads -- the claim only seems true for disk-bound databases and perhaps not always. However, that doesn’t mean that Amazon Aurora is necessarily a better choice than Google Cloud SQL.

The conclusions that haven’t changed from my first look at Google Cloud SQL bear repeating. First, these two aren’t the only choices for high-performance, high-scalability MySQL-compatible databases. Both DeepSQL and ClustrixDB also deserve your consideration. Second, Amazon Aurora can currently scale up and out beyond Google Cloud SQL, not only in memory and number of CPUs, but also in storage capacity (64TB versus 10TB) and number of failover targets (15 versus 1). For some applications, that may matter. For others, it may not be material.

Third, databases work best when they are “near” the applications using them. If your apps are already in the Google or Amazon clouds, then it makes sense to keep your databases not only in the same cloud as the apps, but also in the same availability zone. In my experience, you can easily lose a factor of two in peak database performance and more than that in latency by putting the client and database in different zones, even if they are in the same region.

Finally, let me repeat the most important fact about benchmarks: They aren’t your application, and they may or may not bear any resemblance to your loads. Do your own tests using your actual load profiles for a period of hours, not minutes, before committing yourself to one database as a service or another.

InfoWorld Scorecard

                                    Performance  Scalability  Management  Availability  Value  Overall Score
                                    (25%)        (20%)        (25%)       (20%)         (10%)  (100%)
Amazon RDS for Aurora               9            9            9           10            10     9.3
Google Cloud SQL Second Generation  8            8            9           9             10     8.7
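The overall scores are consistent with a weighted average of the five category scores using the percentages shown. A minimal check, assuming the result is rounded half-up to one decimal place (the rounding rule is an assumption; the scorecard does not state it):

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

# Category weights in percent, as listed in the scorecard.
WEIGHTS_PCT: Dict[str, int] = {
    "Performance": 25,
    "Scalability": 20,
    "Management": 25,
    "Availability": 20,
    "Value": 10,
}


def overall_score(scores: Dict[str, int]) -> Decimal:
    """Weighted average of category scores, rounded half-up to one decimal."""
    weighted_total = sum(WEIGHTS_PCT[name] * scores[name] for name in WEIGHTS_PCT)
    return (Decimal(weighted_total) / Decimal(100)).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )


if __name__ == "__main__":
    aurora = {"Performance": 9, "Scalability": 9, "Management": 9,
              "Availability": 10, "Value": 10}
    cloud_sql = {"Performance": 8, "Scalability": 8, "Management": 9,
                 "Availability": 9, "Value": 10}
    print(overall_score(aurora))     # 9.3
    print(overall_score(cloud_sql))  # 8.7
```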

Copyright © 2016 IDG Communications, Inc.