My recent work on High Speed Replication is not the only thing I’ve been doing to improve GlusterFS performance. In addition to that 2x improvement in synchronous replicated-write performance, here are some of the other changes in the pipeline.
Patch 3005 provides a more reliable way to make sure we use a local copy of a file when there is one. The current “first to respond” method doesn’t always work, because the first to respond is usually the first to be asked, and we always query the servers in the same order regardless of where we are. Thus, if the bricks are on nodes A and B, everyone will query A first, and most of the time everyone – including B – will decide to read from A. The new method should work a lot better in “peer to peer” deployments where every machine is both client and server.
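The idea is simple enough to sketch in a few lines of Python. This is just an illustration of the concept, not the actual AFR code; the `"host:/path"` brick representation and the function name are mine:

```python
import socket

def pick_read_child(bricks):
    """Prefer a replica brick that lives on this host; otherwise
    fall back to the first brick in the list.

    bricks: list of "host:/path" strings (hypothetical representation).
    """
    local = socket.gethostname()
    for i, brick in enumerate(bricks):
        host = brick.split(":", 1)[0]
        if host in (local, "localhost"):
            return i
    return 0
```

With this, node B in the example above would read from its own brick instead of converging on A along with everyone else.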
Patch 2926 handles the opposite case, where servers and clients are separate. The same convergence behavior mentioned above was causing strange inconsistencies in performance runs. Sometimes (if clients happened to divide themselves evenly across replicas) performance was really good; other times (if clients happened to converge on the same – usually first – replica) it was really bad. The new method uses hashing to ensure that both one client accessing multiple files and multiple clients accessing the same file will distribute that access across replicas. If hash-based random distribution is good enough for DHT, it’s good enough for AFR as well.
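Here’s roughly what hash-based selection looks like, again as an illustrative sketch rather than the real patch (the hash function and identifiers are stand-ins for whatever AFR actually uses):

```python
import zlib

def pick_read_child(client_id, file_id, n_replicas):
    """Spread reads across replicas by hashing client and file identity.

    One client reading many files, or many clients reading one file,
    will land on different replicas instead of all converging on the
    first one to respond.
    """
    key = f"{client_id}:{file_id}".encode()
    return zlib.crc32(key) % n_replicas
```

Because the choice is a pure function of client and file identity, it is stable for any given pair but statistically even across the cluster – the same property DHT relies on for file placement.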
Patch 3004 deals with distribution rather than replication, and it might be the most interesting of the three, but by itself it doesn’t tell the whole story of what I’m up to. All the patch does is support a “layout” type (see my article about distribution) that the built-in rebalance infrastructure will leave alone. Thus, if you want to create a carefully hand-crafted layout for a directory so that all files go to bricks in a certain rack, you’ll be able to do that safely. More importantly, it provides the essential “hook” for experimenting with different rebalancing algorithms. I already have a Python script that can calculate a new layout based on an old one, with two twists.
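To make the “leave it alone” behavior concrete, here’s a toy version in Python. The dict-of-ranges representation and the `"fixed"`/`"auto"` type names are hypothetical stand-ins for the on-disk layout xattrs; the point is just that rebalance skips anything marked as hand-crafted:

```python
MAX_HASH = 2**32 - 1  # DHT layouts divide a 32-bit hash space

def rebalance_layout(layout):
    """Recompute equal hash ranges, but leave 'fixed' layouts untouched.

    layout: dict mapping brick -> {"start": int, "end": int,
    "type": "auto" | "fixed"} (an in-memory stand-in for the real
    layout extended attributes).
    """
    if any(ent["type"] == "fixed" for ent in layout.values()):
        return layout  # hand-crafted layout: rebalance leaves it alone
    bricks = sorted(layout)
    n = len(bricks)
    step = (MAX_HASH + 1) // n
    new = {}
    for i, brick in enumerate(bricks):
        start = i * step
        end = MAX_HASH if i == n - 1 else (i + 1) * step - 1
        new[brick] = {"start": start, "end": end, "type": "auto"}
    return new
```

Anything that computes a set of ranges like this – equal-sized, weighted, or hand-tuned per rack – can be plugged in where the equal-split logic sits, which is exactly the experimentation hook the patch provides.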
The one thing that was missing was a way to apply these changes and make them stick. Once we have that, what I envision is a three-step approach to adding bricks. Sometimes you might not migrate any data at all; you just adjust the layouts to prefer placement of new data on the new bricks (a very small tweak to the total-size-based weighting I’ve already implemented). If you want to be a bit more aggressive about rebalancing, you can do the low-data-movement kind of rebalancing which will get you pretty close to an optimal layout. Then, perhaps during planned downtime or predictable low-usage periods, you do the full rebalance to ensure fully optimal placement of data across your cluster.
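The size-based weighting behind that first, zero-migration step can be sketched like this. Again this is an illustration, not my actual script; the function name and the choice of “size” as the weight are assumptions for the example:

```python
MAX_HASH = 2**32 - 1  # DHT layouts divide a 32-bit hash space

def weighted_layout(brick_sizes):
    """Assign each brick a hash range proportional to its size.

    brick_sizes: dict mapping brick -> size (e.g. total bytes).
    A hypothetical helper illustrating size-based weighting, not
    the actual GlusterFS rebalance code.
    """
    total = sum(brick_sizes.values())
    layout, start = {}, 0
    items = sorted(brick_sizes.items())
    for i, (brick, size) in enumerate(items):
        if i == len(items) - 1:
            end = MAX_HASH  # last brick absorbs rounding slack
        else:
            end = start + round((MAX_HASH + 1) * size / total) - 1
        layout[brick] = (start, end)
        start = end + 1
    return layout
```

A new, empty brick naturally gets a large share of the hash space under free-space weighting, so new files prefer it even though no existing data has moved.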
Those aren’t the only patches I have in the works, but I think they constitute a coherent group with the common goal of distributing load across the cluster more evenly than before. That’s what “scale out” is supposed to mean, so I’m glad that we’ll finally be capturing more of the advantages associated with that approach.