The Gluster Blog

Gluster blog stories provide high-level spotlights on our users all over the world

Testing on Amazon’s New SSD Instances

Gluster
July 19, 2012

The new hi1.4xlarge instances in EC2 are pretty exciting, not only because they’re equipped with SSDs but because they’re also equipped with 10GbE and placement groups allow you to create server clusters that are closely colocated with full bandwidth among them. I was about ready to do another round of GlusterFS testing to see the effects of some recent changes (specifically the multi-threaded SSL transport and Avati’s delayed post-op in AFR) so it seemed like a good time to try out the new instances as well.

After firing up my two server instances, the first thing I did was check my local I/O performance. Each volume seemed to top out at approximately 30K IOPS, same as I’d seen at Storm on Demand when I was testing my replication code there, but the Amazon instances have two of those so they should be able to do 60K IOPS per instance (the 100K everyone else keeps quoting is just a marketing number). I couldn’t immediately fire up a third instance in the same placement group because of resource limits so I fired up a plain old m1.xlarge for the client. I’ve applied for a resource-limit increase so I can do the test I wanted to do, but for now these results should at least be directly comparable to Storm on Demand. All of these tests were run on a four-brick two-way-replicated GlusterFS volume to take full advantage of the hardware in the servers. Please bear in mind that these are random synchronous writes over a (slow) network, so the numbers will seem very low compared to those you’d get if you were testing async I/O locally. This is all about a worst case; the best case just wasn’t interesting enough to report on.

Amazon High I/O Performance

If you compare to the Storm on Demand graph (link above) a few things immediately become apparent. One is that the highest valid number (the unsafe “fast-path” number doesn’t count) has gone up from about 3000 to about 4000. That’s nice, but also bear in mind that the Amazon instances cost $3.10 per hour and the Storm on Demand instances are only $0.41 per hour. Even if the IOPS numbers had doubled, that still doesn’t seem like such a great deal.

The second obvious result is that the same number for “plain old AFR” has gone up from ~1500 IOPS to well over 4000, quite handily overtaking my own hsrepl. I’m not entirely sure why hsrepl actually managed to get worse, but my working theory is that the new handling of “xdata” (where we put the version numbers necessary for correct operation) is considerably less efficient than the handling I’d implemented on my own before. I don’t have hard evidence of that, but the new code will definitely go around in a much longer code path issuing more reads for the same data, and the sudden drop-off for hsrepl in my own local testing corresponds exactly with that change. In the end we seem to be even further from that theoretical maximum, even though the absolute IOPS number has increased.

The other mystery for me is why the multi-threading also seems to make things worse. This isn’t actually doing SSL, even though the two features were inextricably tied together in the same patch, so there’s not a lot more total computational load. These machines have plenty of cores to spare, so it shouldn’t be a thread-thrashing issue either. I expected the multi-threaded numbers to get a bit better, and in all of my other tests that has been the case. Maybe when I get my resource limit increased I’ll see something different in the all-10GbE environment.

That’s pretty much all I have to say about the new instances or GlusterFS running on them. They’re certainly a welcome improvement for this worst-case kind of workload, but I’ve seen their ilk before so the only thing that’s really new to me is the high price tag.

BLOG

  • 26 Apr 2019
    Gluster Monthly Newsletter, April 2...

    Upcoming Community Happy Hour at Red Hat Summit! Tue, May 7, 2019, 6:30 PM – 7:30 PM EDT https://cephandglusterhappyhour_rhsummit.eventbrite.com has all the details. Gluster 7 Roadmap Discussion kicked off for our 7 roadmap on the mailing lists, see [Gluster-users] GlusterFS v7.0 (and v8.0) roadmap discussion https://lists.gluster.org/pipermail/gluster-users/2019-March/036139.html for more details. Community...

    Read more
  • 24 Apr 2019
    Community Survey Feedback, 2019

    In this year’s survey, we asked quite a few questions about how people are using Gluster, how much storage they’re managing, their primary use for Gluster, and what they’d like to see added. Here’s some of the highlights from this year!

    Read more
  • 24 Apr 2019
    How to Deploy the OpenVPN Encryptio...

    This is part of a new series on using Gluster! OpenVPN is open source software that serves as the basis for a Virtual Private Network capable of supporting a point-to-point or site-to-site connection. Along with the fact that it’s free to use, it also has the benefit of being one...

    Read more