The Gluster Blog

Planning ahead for Gluster releases

Amar Tumballi
2019-11-28

In order to plan the content for upcoming releases, it is good to pause, step back, and look at how GlusterFS is consumed within large enterprises. With enterprise architecture taking large strides towards the cloud, and more specifically the hybrid cloud, continued work on capabilities should serve this emerging base of customers. At the same time, our existing users need reassurance that we will keep tagging good releases that can be deployed after a smaller testing window. These are not contradictory goals. Rather, they are complementary topics, perfectly aligned to let the GlusterFS project become the preferred choice when deploying storage filesystems.

The best way to propose, assess and scope capabilities is to create a reference frame which allows both transparency and clarity in how we are going to address these goals. We think the higher-order objectives should include:

  • Increasing observability patterns
  • Enforcing higher quality and predictability in releases
  • Enabling larger scale
  • Focusing on a workbench to demonstrate performance characteristics


With the need to make GlusterFS more native to Kubernetes, we will need better adoption of design patterns that reduce friction for administrators choosing GlusterFS as the persistent storage for their applications. The KaDalu project is attempting to accomplish this objective. Given the nature of Gluster’s modular design, there can be more such projects, suiting different use cases in the container world. To get there, we will have to create methods to:

  • Address the topic of migration of data (application data)
  • Handle integration tests with k8s builds so that a software build pipeline can run uninterrupted
  • Enable hooks which allow for integration of monitoring and observability into popular frameworks used as part of a standard k8s deployment
  • Provide resilient storage services when clusters are created for scale

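To make the friction-reduction goal concrete, here is a minimal sketch of what requesting Gluster-backed persistent storage from Kubernetes could look like once an operator such as KaDalu is in place: the application simply asks for a PersistentVolumeClaim against a Gluster-backed storage class. The storage class name (`kadalu.replica3`) and claim details below are illustrative assumptions, not a definitive API; the manifest is built as a plain Python dict so it can be serialized and applied.

```python
import json

def make_pvc(name: str, storage_class: str, size: str) -> dict:
    """Build a Kubernetes PersistentVolumeClaim manifest as a dict.

    The storage class name (e.g. "kadalu.replica3") is an assumption;
    the actual name depends on how the operator was configured.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # Gluster volumes can be mounted by many pods at once,
            # hence ReadWriteMany rather than ReadWriteOnce.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
        },
    }

pvc = make_pvc("app-data", "kadalu.replica3", "10Gi")
print(json.dumps(pvc, indent=2))
```

Serialized to YAML and applied with `kubectl`, a claim like this is all an application author would need to write; provisioning the backing Gluster volume becomes the operator’s job.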

Along with the above, we would need to continue to work on a workbench which allows users to create performance datasets prior to deploying a release. This would allow for closed-loop feedback and also create opportunities to collaborate with users whose workloads are not easily reproduced in traditional test-bench configurations.

The discussion around switching to GitHub for the end-to-end project flow is aimed at reducing the number of tools and systems the project has to maintain for the code-to-build flow and release management. We hope to gain efficient development habits once the foundations are in place and hooks are added that allow better testing of builds against a variety of configurations. Increasing confidence in the quality of builds is directly linked to increasing participation from the community of users and developers.

Over the years we have focused on innovation, often to the detriment of continued investment in existing capabilities. The feedback we have been receiving through the newly formed Slack channel has been eye-opening with regard to the topics we need to address. We see every reason to formulate a principle of innovation AND stability.

An approach like this would mean that we reassess the state of gNFS and work towards enabling more involvement from users who see it as an important component of their deployed infrastructure. There are a couple of known issues as of now, and if others come up, we will review them in terms of critical production impact. This also implies that all patches would be reviewed for whether they introduce a regression in gNFS or other non-native access methods. Geo-replication is another topic with a long-pending item around “Path Based Geo-Replication”. Not much code has been written for this proposal, but it is a capability that improves the product experience and thus needs to be assessed in terms of the effort required to make it available in an upcoming major release.

There are opportunities to work on the bottlenecks we have already observed in various deployments. Such plans hinge on the future of the perf xlator and, more specifically, on how to improve read/write/seek speeds for small-file but transaction-heavy workloads.

It is also a viable path to create a detailed specification around better instrumentation for stand-alone deployments, using frameworks like SystemTap or Performance Co-Pilot for a more structured analysis of questions that often begin as “my cluster has slowed down”. We have previously done some work in this direction along with the extended team. I believe it can be integrated into the Gluster Health Report or the dashboard to allow better insight into the state of the cluster.
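
A lightweight starting point for that kind of structured analysis, sketched below under stated assumptions, is to parse the per-FOP latency table that `gluster volume profile <vol> info` already emits and flag the operations dominating latency. The sample text and exact column layout here are assumptions modelled on typical profile dumps, and `dominant_fops` is a hypothetical helper, not an existing Gluster tool:

```python
import re

# A fragment of `gluster volume profile <vol> info` output; the exact
# column layout is an assumption based on typical profile dumps.
SAMPLE = """\
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.10      12.00 us       3.00 us      80.00 us           1024      LOOKUP
     63.47     134.00 us      23.00 us    5230.00 us           4512       WRITE
     36.43      98.00 us      11.00 us    2100.00 us           3970        READ
"""

# One data row: %-latency, three latencies in microseconds, call count, FOP name.
ROW = re.compile(
    r"^\s*([\d.]+)\s+([\d.]+) us\s+([\d.]+) us\s+([\d.]+) us\s+(\d+)\s+(\S+)\s*$"
)

def dominant_fops(profile_text: str, threshold: float = 30.0) -> list:
    """Return (fop, %-latency) pairs at or above `threshold`, worst first."""
    rows = []
    for line in profile_text.splitlines():
        m = ROW.match(line)
        if m:
            pct, fop = float(m.group(1)), m.group(6)
            if pct >= threshold:
                rows.append((fop, pct))
    return sorted(rows, key=lambda r: r[1], reverse=True)

print(dominant_fops(SAMPLE))  # -> [('WRITE', 63.47), ('READ', 36.43)]
```

A health report or dashboard could run a check like this periodically and surface “WRITE dominates latency on volume X” rather than raw tables, which is exactly the kind of first answer a “my cluster has slowed down” question needs.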

Some of these would require full-time engineering involvement from developers familiar with the internals of Gluster’s design. For others, we should look at building better ways to include outside expertise. As an example, if there are tapsets already built around debugging Gluster, we would want to include them in the project repository and make them available by default as part of a deployment. There are a number of deployment-experience choices we can identify as candidates for improvement, including how to adapt to methods other than Ansible, e.g. Terraform-based infrastructure design patterns that include storage systems.


All said, what we want to establish is a clearly understood set of priorities and focus areas that will form the basis of upcoming releases. This clarity helps our users make a case for features or fixes that are important to them but were not assigned the right priority. Improving how we make well-managed releases would contribute to the happiness of the community. That is where we would like to go. The next step from here is a first draft of a content plan for Gluster releases 8 and 9. Stay tuned!
