“Operating Version” support in 'glusterd' (GlusterFS's management daemon)
“operating-version” support in 'glusterd' is required to ensure that different versions of the gluster binaries can interoperate without problems.
Amar Tumballi <email@example.com>
Kaushal Madappa <firstname.lastname@example.org>
Currently 'glusterd' has no proper 'handshake' mechanism for bringing another 'peer' into the cluster. It only looks at the peer's RPC programs and versions, and if they match the local ones, it assumes it can understand the behavior of the peer 'glusterd'. This is not sufficient, because not every glusterfs feature results in a change to the management RPC procedures (and hence the RPC program versions may not change at all between releases). For example, enabling an option through the 'gluster volume set' interface only changes how the option is handled while writing the 'volume file'; it does not depend on what is transferred on the wire. So the current handshake mechanism does not stop a user from forming a cluster of mismatched glusterds, in which a 'volume create' can produce a different configuration on some machines. That leaves the volume inconsistent and puts the user's data at risk.
Operating-version (hereafter referred to interchangeably as 'op-version') is a number that indicates the feature set being run by a given 'glusterd' process.
The op-version implementation follows the points below.
- On each 'glusterd' process:
  - The op-version will be persistent across reboots. It will be a global option similar to the UUID of the 'glusterd' process, and hence will be stored in the 'glusterd.info' file.
  - Every 'glusterd' process will know the min and max op-version it is going to support (GD_OP_VERSION_MIN and GD_OP_VERSION_MAX).
  - Each 'glusterd' will look for a stored 'op-version' while starting; if none is present, it will be initialized with the minimum op-version the process supports (GD_OP_VERSION_MIN).
- As a cluster:
  - The 'cluster' (the pool of all peers) as a whole will operate at exactly the same 'op-version'.
  - Each glusterd will keep track of 'cluster_min_op_version', which will be the max of the min op-versions supported by all glusterds.
- Incrementing and decrementing the 'op-version':
  - An increment of the 'op-version' on a given glusterd process will happen in only two cases:
    - When a 'peer probe' request is received, and the handshake determines that incrementing is safe.
    - When a 'volume set' operation happens on a newer feature, which belongs to a higher op-version.
  - Decrementing the 'op-version': the current plan is to have the 'op-version' decremented automatically when the last option belonging to a higher version gets reset. So, ideally, a 'volume reset force' command should bring all the glusterd processes in the cluster to their minimum op-version.
More details regarding the implementation are given in the appendix.
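The startup rule and the 'cluster_min_op_version' rule above can be sketched as follows. This is only an illustration in Python, not glusterd code: the function names are hypothetical, the numeric values of GD_OP_VERSION_MIN and GD_OP_VERSION_MAX are made up, and the sample file content is assumed to be simple key=value lines using the 'operating-version' key described under manageability.

```python
# Illustrative values only; the real constants live in glusterd.
GD_OP_VERSION_MIN = 1
GD_OP_VERSION_MAX = 3


def load_op_version(info_text, default=GD_OP_VERSION_MIN):
    """Return the op-version stored in glusterd.info-style text.

    Falls back to the minimum supported op-version when no
    'operating-version' entry is present (a first start).
    """
    for line in info_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "operating-version":
            return int(value.strip())
    return default


def cluster_min_op_version(peer_min_versions):
    """Max of the min op-versions supported by all glusterds in the pool."""
    return max(peer_min_versions)


# A first start has no stored entry, so the process begins at the minimum.
print(load_op_version("UUID=example\n"))
print(load_op_version("UUID=example\noperating-version=2\n"))
# Peers supporting minimums 1, 2 and 1 cannot all run below 2.
print(cluster_min_op_version([1, 2, 1]))
```

Taking the max of the minimums gives the lowest op-version that every peer in the pool is still able to run.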
Benefit to GlusterFS
With 'op-version' implemented in the glusterd process, supporting upgrades becomes much easier. It will also let gluster run smoothly on a cluster that is heterogeneous with respect to binary version.
Nature of proposed change
The following code changes will be required:
- Probe to have errstr passed along to the CLI (errstr to be included in deprobe as well, for future-friendliness). Already present in v3.3.0.
- Op version handshake to be implemented as 3-way handshake.
- VME table to get one more column with corresponding op-version (glusterd-volgen.c)
- Option to store and retrieve op-version (using glusterd.info file).
- API to check the max op-version used across all volumes (used for decrementing op-version).
- Check the validity of the op-version for a given 'volume set' key/value at volume_set_handle itself on the glusterd that receives the CLI request, and at volume_set_stage() on peer glusterds.
Implications on manageability
After this feature gets into the code, it will break 'on-wire' compatibility with previous versions (errstr in the probe/detach XDR procedures). This change is already present in v3.3.0, where compatibility with previous versions is already broken.
The configuration entry 'operating-version=<N>' will start appearing in the 'glusterd.info' file.
The whole handshaking behavior of glusterd with another glusterd changes, hence earlier versions will not be compatible with it. (Note: For now, compatibility with v3.3.0 will be maintained by keeping the present dump_version routine. The new op_version handshake will be performed after the dump_version if the peer supports it.)
Implications on presentation layer
Implications on persistence layer
Implications on 'GlusterFS' backend
Modification to GlusterFS metadata
Implications on 'glusterd'
Look at manageability
How To Test
Testing can be done by creating a cluster with different versions of gluster and checking that everything works as expected.
No changes are anticipated for the user experience. Users will continue to use the same set of commands provided in the documentation.
With this feature, backward compatibility with older versions will break, i.e., the peer probe and the regular handshake will fail between glusterd processes with this feature and glusterd processes without it.
Comments and Discussion
This feature page is not yet complete. It will be kept updated as regularly as possible. -Kaushal
Mgmt daemon - Mgmt daemon interactions
The proposed solution involves the usage of an operating version (op_version).
Every glusterd in a cluster will operate at the same current_op_version, i.e., the cluster as a whole will operate at current_op_version.
The current_op_version of a glusterd can be increased in the following cases.
- Peer probe
- Activating a feature of a higher version
Only the current_op_version of the peer being probed is able to change during a peer probe. The exchange and validation of op_versions will occur during the handshake process of a peer probe.
The new op_version handshake will be built upon the existing handshake procedure, which just performs an RPC version dump before establishing a connection. The new handshake routine will be run based on the progver returned. The minimum supported progver will be the progver for 3.3.0 (PROGVER_MIN). A connection will be established only with peers with progver >= PROGVER_MIN.
Assume we have an existing cluster with peers A & B, and we probe peer C from A. The cluster is operating at version N, and peer C has MIN_OP_VERSION N-1 and MAX_OP_VERSION N+1. Assuming peer C is a fresh peer and hasn't been part of any other cluster, it will be operating with current_op_version N-1. The handshake process will be as follows:
- A initiates the handshake with C
- C replies with its MAX_OP_VERSION and current_op_version (here N+1, N-1)
- A validates that C can support the op_version of the cluster.
- A sends an ACK to C with its current_op_version (N).
- C raises its current_op_version to N
- Peer state-machine takes over from this point.
One restriction for the handshake is that the current_op_version of C can only be increased but not decreased.
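The exchange above can be modelled in a few lines. The following is a minimal sketch in Python rather than glusterd code: the `Peer` class and `handshake` function are hypothetical names, and rejecting a peer whose current_op_version is already above the cluster's is an assumed reading of the "only increased" restriction.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical stand-in for the probed glusterd (peer C)."""
    min_op_version: int
    max_op_version: int
    current_op_version: int


def handshake(cluster_op_version, probed):
    """Model A (at cluster_op_version) probing C. Returns True on success."""
    # C replies with its MAX_OP_VERSION and current_op_version.
    reply_max = probed.max_op_version
    reply_current = probed.current_op_version

    # A validates that C can support the op_version of the cluster.
    if reply_max < cluster_op_version:
        return False

    # Assumption: since C's op_version may only be raised, a peer that
    # is already running above the cluster's version is rejected.
    if reply_current > cluster_op_version:
        return False

    # A sends an ACK carrying N, and C raises its current_op_version to N.
    probed.current_op_version = cluster_op_version
    return True


# The example from the text with N = 5: C supports 4..6 and starts at 4.
n = 5
c = Peer(min_op_version=n - 1, max_op_version=n + 1, current_op_version=n - 1)
print(handshake(n, c), c.current_op_version)
```

After a successful handshake the peer state machine would take over; that part is outside this sketch.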
Activating a higher version feature:
In the future, all features will be activated using the "volume set" interface. (By default, only the features of the MIN_OP_VERSION will be activated.) To achieve this, the following changes will be needed.
- volopt_map_entry (vme) will contain a new field, key_op_version, which signifies the op_version which supports the feature. This is the tagging of features with op_version described above.
- every entry in the glusterd_volopt_map will be updated to include the key_op_version.
- the stage_op of volume set will check the key_op_version of the requested key and increase the current_op_version of the cluster if required.
The volume set operation will be as follows:
- glusterd receives a volume set request from the cli with a key and value, and enters the stage phase.
- During stage,
  - glusterd will look up the key in the volopt_map.
  - If the key is present, glusterd will check the op_version of the key (key_op_version) against its current_op_version.
  - If the key_op_version > current_op_version and <= MAX_OP_VERSION, the current_op_version will be changed to key_op_version.
  - glusterd will send out a request to raise current_op_version to key_op_version to the other peers.
  - The other peers will check against their supported op_versions and raise or not raise their current_op_version.
  - If even one peer fails to raise its current_op_version, stage will fail and current_op_version will be reverted to the original value.
  - If all peers succeed, stage succeeds and the operation proceeds to commit state.
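The stage logic above might look roughly like this. It is a sketch under stated assumptions, written in Python for brevity: the option keys, the `Glusterd` class and `stage_volume_set` are all hypothetical, an unknown key is assumed to fail the stage, and the revert is applied to every node that took part.

```python
# Hypothetical option keys mapped to their key_op_version.
VOLOPT_MAP = {"feature.old": 1, "feature.new": 2}


class Glusterd:
    """Hypothetical model of one glusterd's op_version state."""

    def __init__(self, max_op_version, current_op_version):
        self.max_op_version = max_op_version
        self.current_op_version = current_op_version

    def try_raise(self, target):
        """Raise current_op_version to target if this node supports it."""
        if target > self.max_op_version:
            return False
        self.current_op_version = max(self.current_op_version, target)
        return True


def stage_volume_set(origin, peers, key):
    """Stage phase of 'volume set'; returns True if commit may proceed."""
    key_op_version = VOLOPT_MAP.get(key)
    if key_op_version is None:
        return False  # assumption: unknown keys fail the stage
    if key_op_version <= origin.current_op_version:
        return True  # no raise needed
    nodes = [origin] + list(peers)
    saved = [node.current_op_version for node in nodes]
    for node in nodes:
        if not node.try_raise(key_op_version):
            # One failure fails the stage; restore the original versions.
            for other, old in zip(nodes, saved):
                other.current_op_version = old
            return False
    return True


a, b = Glusterd(2, 1), Glusterd(2, 1)
print(stage_volume_set(a, [b], "feature.new"), a.current_op_version)
```

Saving the versions before the first raise keeps the revert simple: the cluster either moves to key_op_version as a whole or stays where it was.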
The current_op_version of the cluster cannot be decreased by the direct action of any gluster command. The decrease will occur automatically, based on the highest-versioned feature that is still activated.
To achieve this, each volume will have its own current op_version (volume_op_version).
The volume_op_version will be the op_version of the highest versioned feature in use by that volume.
The cluster current_op_version will be the highest volume_op_version.
The volume_op_version need not be stored persistently. It can be calculated during runtime, based on the features activated.
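Since both values are derived, they can be recomputed whenever an option is set or reset. A minimal sketch in Python, with hypothetical option keys and function names; falling back to MIN_OP_VERSION when no versioned feature is active is an assumption based on the default described earlier.

```python
MIN_OP_VERSION = 1  # illustrative value

# Hypothetical option keys mapped to their key_op_version.
KEY_OP_VERSION = {"feature.old": 1, "feature.new": 2, "feature.newer": 3}


def volume_op_version(active_keys):
    """op_version of the highest-versioned feature in use by one volume."""
    return max((KEY_OP_VERSION[k] for k in active_keys), default=MIN_OP_VERSION)


def cluster_op_version(volumes):
    """Highest volume_op_version across all volumes in the cluster."""
    return max(
        (volume_op_version(keys) for keys in volumes.values()),
        default=MIN_OP_VERSION,
    )


volumes = {
    "vol0": {"feature.old"},
    "vol1": {"feature.old", "feature.newer"},
}
print(cluster_op_version(volumes))

# Resetting the last higher-versioned option lowers the cluster version.
volumes["vol1"].discard("feature.newer")
print(cluster_op_version(volumes))
```

Nothing needs to be stored for this: the values follow entirely from the set of options currently active on each volume.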