The new hi1.4xlarge instances in EC2 are pretty exciting, not only because they’re equipped with SSDs but also because they come with 10GbE, and placement groups let you create clusters of closely colocated servers with full bandwidth among them. I was about ready to do another round of GlusterFS testing to see the effects of some recent changes (specifically the multi-threaded SSL transport and Avati’s delayed post-op in AFR), so it seemed like a good time to try out the new instances as well.
After firing up my two server instances, the first thing I did was check my local I/O performance. Each volume seemed to top out at approximately 30K IOPS, same as I’d seen at Storm on Demand when I was testing my replication code there, but each Amazon instance has two such volumes, so it should be able to do 60K IOPS (the 100K everyone else keeps quoting is just a marketing number). I couldn’t immediately fire up a third instance in the same placement group because of resource limits, so I fired up a plain old m1.xlarge for the client. I’ve applied for a resource-limit increase so I can do the test I wanted to do, but for now these results should at least be directly comparable to the ones from Storm on Demand. All of these tests were run on a four-brick, two-way-replicated GlusterFS volume to take full advantage of the hardware in the servers. Please bear in mind that these are random synchronous writes over a (slow) network, so the numbers will seem very low compared to those you’d get if you were testing async I/O locally. This is all about the worst case; the best case just wasn’t interesting enough to report on.
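For anyone who wants a feel for what “random synchronous writes” means in practice, here is a minimal Python sketch of that kind of workload. It is not the tool used for these tests; the function name, block size, file size and write count are all made up for illustration. Each write goes to a random block-aligned offset and is followed by an fsync, so the next write cannot start until the previous one is on stable storage.

```python
import os
import random
import tempfile
import time


def sync_random_write_iops(path, file_size=4 * 1024 * 1024, block_size=4096,
                           num_writes=200, seed=0):
    """Hypothetical micro-benchmark: issue num_writes block-aligned random
    writes, flushing each one before starting the next, and return the
    observed writes per second."""
    rng = random.Random(seed)
    block = b"\xa5" * block_size
    num_blocks = file_size // block_size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)          # pre-size the file
        start = time.perf_counter()
        for _ in range(num_writes):
            offset = rng.randrange(num_blocks) * block_size
            os.lseek(fd, offset, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)                     # wait for stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero interval on very fast (or cached) storage.
    return num_writes / max(elapsed, 1e-9)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rate = sync_random_write_iops(os.path.join(tmp, "testfile"))
        print(f"{rate:.0f} synchronous random writes per second")
```

Pointing the path at a file on a network mount instead of a local temporary directory is what makes every write pay a full round trip, which is why the numbers here look so small next to local async results.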
If you compare to the Storm on Demand graph (link above) a few things immediately become apparent. One is that the highest valid number (the unsafe “fast-path” number doesn’t count) has gone up from about 3000 to about 4000. That’s nice, but also bear in mind that the Amazon instances cost $3.10 per hour and the Storm on Demand instances are only $0.41 per hour. Even if the IOPS numbers had doubled, that still doesn’t seem like such a great deal.
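The price/performance point can be put into numbers with a quick back-of-the-envelope calculation. This sketch uses only the approximate figures quoted above, treats the hourly prices as per-instance costs for otherwise identical setups (an assumption), and reads “doubled” as twice the Storm on Demand number.

```python
def iops_per_dollar_hour(iops, dollars_per_hour):
    """IOPS obtained per dollar of hourly instance cost."""
    return iops / dollars_per_hour


STORM_IOPS, STORM_PRICE = 3000, 0.41   # approximate figures from the text
EC2_IOPS, EC2_PRICE = 4000, 3.10

storm = iops_per_dollar_hour(STORM_IOPS, STORM_PRICE)
ec2 = iops_per_dollar_hour(EC2_IOPS, EC2_PRICE)
ec2_doubled = iops_per_dollar_hour(2 * STORM_IOPS, EC2_PRICE)

# IOPS the EC2 instances would need just to match Storm's price/performance.
break_even_iops = STORM_IOPS * EC2_PRICE / STORM_PRICE

if __name__ == "__main__":
    print(f"Storm on Demand:        {storm:8.0f} IOPS per $/hour")
    print(f"EC2 hi1.4xlarge:        {ec2:8.0f} IOPS per $/hour")
    print(f"EC2 if IOPS had doubled:{ec2_doubled:8.0f} IOPS per $/hour")
    print(f"EC2 break-even:         {break_even_iops:8.0f} IOPS")
```

With these inputs Storm on Demand comes out at roughly 7,300 IOPS per dollar-hour against roughly 1,300 for EC2, and the EC2 setup would need well over 20,000 IOPS on this workload just to break even.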
The second obvious result is that the same number for “plain old AFR” has gone up from ~1500 IOPS to well over 4000, quite handily overtaking my own hsrepl. I’m not entirely sure why hsrepl actually managed to get worse, but my working theory is that the new handling of “xdata” (where we put the version numbers necessary for correct operation) is considerably less efficient than the handling I’d implemented on my own before. I don’t have hard evidence of that, but the new code definitely takes a much longer path and issues more reads for the same data, and the sudden drop-off for hsrepl in my own local testing corresponds exactly with that change. In the end we seem to be even further from that theoretical maximum, even though the absolute IOPS number has increased.
The other mystery for me is why the multi-threading also seems to make things worse. This isn’t actually doing SSL, even though the two features were inextricably tied together in the same patch, so there’s not a lot more total computational load. These machines have plenty of cores to spare, so it shouldn’t be a thread-thrashing issue either. I expected the multi-threaded numbers to get a bit better, and in all of my other tests that has been the case. Maybe when I get my resource limit increased I’ll see something different in the all-10GbE environment.
That’s pretty much all I have to say about the new instances or GlusterFS running on them. They’re certainly a welcome improvement for this worst-case kind of workload, but I’ve seen their ilk before so the only thing that’s really new to me is the high price tag.