In addition to watching the election coverage last night, I spent some time giving MooseFS another try. It’s a project I once had high hopes for; I even considered it as an alternative to GlusterFS as a basis for what was then CloudFS, but I was put off by several things: a single metadata server, a lack of modularity, and an inactive/hostile community. When I tested it a couple of years ago it couldn’t even survive a simple test (write ten files concurrently and then read them all back concurrently) without hanging, so I pretty much forgot about it. When somebody recently said it was “better than GlusterFS” I decided it was time to put that to the test. Here are some results. First, let’s look at performance.
OK, so it looks way faster than GlusterFS. That’s a bit misleading, though. For one thing, this is with replication turned on. GlusterFS replication goes from the client to both servers; MooseFS goes from the client to one server and then from that server to the others (so the first server can use both halves of a full-duplex link). That means GlusterFS should scale better as the client:server ratio increases, but also that its single-client performance will be half that of MooseFS. That’s what the “limit” lines in the chart above show, but those lines reveal something even more interesting. The MooseFS numbers are higher than is actually possible on the single GigE link that client had. These numbers are supposed to include fsync time, and the GlusterFS numbers reflect that, but the MooseFS numbers keep climbing as data continues to be buffered in memory. That’s neither correct nor sustainable.
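For anyone who wants to reproduce this kind of check, here’s a minimal sketch of the idea (not the exact benchmark I ran): time a stream of buffered writes followed by an fsync(), and compare the resulting throughput to what the network link could physically carry. The mount point and file size are just placeholder assumptions.

```c
/* Minimal sketch: time buffered writes plus the final fsync().  If a
 * filesystem acknowledges fsync() while the data is still only buffered
 * in memory, the reported throughput can exceed what the client's
 * network link could actually deliver. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK (1 << 20)          /* 1 MiB per write */
#define TOTAL (256UL * CHUNK)    /* 256 MiB total -- arbitrary test size */

int main(int argc, char **argv)
{
    /* hypothetical mount point; pass your own path as argv[1] */
    const char *path = argc > 1 ? argv[1] : "/mnt/testfs/syncfile";
    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (size_t done = 0; done < TOTAL; done += CHUNK) {
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }
    }
    if (fsync(fd) < 0) { perror("fsync"); return 1; }   /* flush to stable storage */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f MB/s including fsync\n", (TOTAL / 1048576.0) / secs);

    close(fd);
    free(buf);
    return 0;
}
```

If the number this prints is higher than the wire speed of the client’s NIC, the data can’t all be where fsync() claims it is.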
The way that MooseFS ignores fsync reminds me of another thing I noticed a while ago: it ignores O_SYNC too. I verified this by looking at the code and seeing where O_SYNC got stripped out, and now my tests show the same effect. Whereas the GlusterFS IOPS numbers with O_SYNC take the expected huge hit relative to those above, the MooseFS numbers actually improve somewhat. POSIX compliant, eh? Nope, not even trying. As a storage guy who cares about user data, I find that totally unacceptable and sufficient to disqualify MooseFS for serious use. The question is: how hard is it to fix? Honoring O_SYNC isn’t just a matter of passing it to the server, which would be easy. It’s also a matter of fixing the fsync behavior to make sure the O_SYNC request actually gets to the server, and – a little more arguably – to all servers that are supposed to hold the data. Those parts might be more difficult.
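A quick way to see whether O_SYNC is being honored, similar in spirit to the tests above though not the exact one I used, is to compare per-write latency with and without the flag. The path below is a placeholder; on a filesystem that actually honors O_SYNC, the synchronous writes should be dramatically slower, because each one has to reach stable storage before returning.

```c
/* Sketch: compare per-write latency with and without O_SYNC.  If the two
 * numbers come out nearly identical, the flag is almost certainly being
 * dropped somewhere between the client and stable storage. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define WRITES 1000
#define BLOCK  4096

static double timed_writes(const char *path, int flags)
{
    char buf[BLOCK];
    memset(buf, 'x', BLOCK);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | flags, 0644);
    if (fd < 0) { perror("open"); return -1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < WRITES; ++i) {
        if (write(fd, buf, BLOCK) != BLOCK) { perror("write"); break; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs * 1e6 / WRITES;   /* microseconds per write */
}

int main(int argc, char **argv)
{
    /* hypothetical mount point; pass your own path as argv[1] */
    const char *path = argc > 1 ? argv[1] : "/mnt/testfs/osyncfile";
    printf("buffered: %.1f us/write\n", timed_writes(path, 0));
    printf("O_SYNC:   %.1f us/write\n", timed_writes(path, O_SYNC));
    return 0;
}
```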
In any case, let’s take a more detailed look at how MooseFS compares to GlusterFS.
Basically, if performance matters to you more than data integrity (e.g. the data already exists elsewhere), and/or if you really, really need snapshots right now, then I don’t see MooseFS as an invalid choice. Go ahead, knock yourself out. You can even post your success stories here if you want. On the other hand, if you have any other needs whatsoever, I’d warn you to be careful. You might get yourself into deep trouble, and I’ve never seen anyone say the developers or the rest of the community were any help at all.