NFS mount for GlusterFS gives better read performance for small files?

Gluster

2012-09-21

This concept is thrown around a lot. People frequently say that “GlusterFS is slow with small files”, or “how can I increase small file performance” without really understanding what they mean by “small files” or even “slow”.

“Small files” is sort of a misconception on its own. Initial file ops include a small amount of overhead, with a lookup, the filename is hashed, the dht subvolume is selected and the request is sent to that subvolume. If it’s a replica, the request is sent to each replica in that subvolume set (usually 2). If it is a replica, all the replicas have to respond. If one or more have pending flags or there’s an attribute mismatch, either some self heal action has to take place, or a split-brain is determined. If the file doesn’t exist on the dht subvolume that the hash predicted (due to the file being renamed usually), the same steps must be done to all the dht subvolumes. If the file is found, a link file is made on the predicted dht subvolume pointing to the place we actually found the file. This will make finding it faster the next time. Once the file is found and is determined to be clean, the file system can move on to the next file operation.

PHP applications, specifically, normally have a lot of small files that are opened for every page query so per-page, that overhead adds up. PHP also queries a lot of files that just don’t exist. Your single page might query 200 files that just aren’t there. They’re in a different portion of the search path, or they’re a plugin that’s not used, etc. These negative lookups take longer the more dht subvolumes you have as it has to query every subvolume brick to see if the file exists, but not where we predict.

NFS mitigates that affect by using FScache in the kernel. It stores directories and stats, preventing the call to the actual filesystem. This also means, of course, that the image that was just uploaded through a different server isn’t going to exist on this one until the cache times out. Stale data in a multi-client system is going to have to be expected in a cached client.

Jeff Darcy created a test translator that caches negative lookups which he said also mitigated the PHP problem pretty nicely.

If you have control over your app, things like absolute pathing for PHP includes or leaving file descriptors open can also avoid overhead. Also, optimizing the number of times you open a file or the number of files to open can help.

So “small files” refers to the percent of total file op time that’s spent on overhead vs actual data retrieval.

Tags
glusterfs

BLOG

06 Dec 2020
Looking back at 2020 – with g...

2020 has not been a year we would have been able to predict. With a worldwide pandemic and lives thrown out of gear, as we head into 2021, we are thankful that our community and project continued to receive new developers, users and make small gains. For that and a...

Read more
27 Apr 2020
Update from the team

It has been a while since we provided an update to the Gluster community. Across the world various nations, states and localities have put together sets of guidelines around shelter-in-place and quarantine. We request our community members to stay safe, to care for their loved ones, to continue to be...

Read more
03 Feb 2020
Building a longer term focus for Gl...

The initial rounds of conversation around the planning of content for release 8 has helped the project identify one key thing – the need to stagger out features and enhancements over multiple releases. Thus, while release 8 is unlikely to be feature heavy as previous releases, it will be the...

Read more

NFS mount for GlusterFS gives better read performance for small files?

BLOG

Looking back at 2020 – with g...

Update from the team

Building a longer term focus for Gl...