The Gluster Blog

Linux scale out NFSv4 using NFS-Ganesha and GlusterFS — one step at a time

Gluster
2015-10-12

NFS-Ganesha 2.3 is rapidly winding down to release and it has a bunch of new things in it that make it fairly compelling. A lot of people are also starting to use Red Hat Gluster Storage with the NFS-Ganesha NFS server that is part of that package. Setting up a highly available NFS-Ganesha system using GlusterFS is not exactly trivial. This blog post will “eat the elephant” one bite at a time.

Some people might wonder why you would use NFS-Ganesha, a user-space NFS server, when kernel NFS (knfs) already supports NFSv4. The answer is simple: NFSv4 in the kernel doesn’t scale out, and a single kernel NFS server is a single point of failure. This blog post will show how to set up a resilient, highly available system with no single point of failure.

Crawl

Let’s start small and simple. We’ll set up a single NFS-Ganesha server on CentOS 7, serving a single disk volume.

Start by setting up a CentOS 7 machine. You may want to create a separate volume for the NFS export; we’ll leave this as an exercise for the reader. Do not install any other NFS server.

1. Install EPEL, NFS-Ganesha and GlusterFS. Use the yum repos on download.gluster.org. The repo files are nfs-ganesha.repo and glusterfs-epel.repo; copy them to /etc/yum.repos.d.

    % yum -y install epel-release
    % yum -y install glusterfs-server glusterfs-fuse glusterfs-cli glusterfs-ganesha
    % yum -y install nfs-ganesha-xfs

2. Create a directory to mount the export volume, make a file system on the export volume, and finally mount it:

    % mkdir -p /bricks/demo
    % mkfs.xfs /dev/sdb
    % mount /dev/sdb /bricks/demo

3. Gluster recommends not creating volumes on the root directory of the brick. If something goes wrong, it’s easier to rm -rf the directory than it is to try to clean it or remake the file system. Create a couple of subdirectories on the brick:

    % mkdir /bricks/demo/vol
    % mkdir /bricks/demo/scratch

4. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /bricks/demo/scratch;

	# Pseudo Path (required for NFS v4)
	Pseudo = /bricks/demo/scratch;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = XFS;
	}
}
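A quick sanity check before starting the server can catch a missing key. The sketch below is a hypothetical helper, not part of NFS-Ganesha: the name check_export_conf is invented here, and it only greps for the four keys used in the config above rather than parsing the file.

```shell
# Hypothetical helper (not shipped with NFS-Ganesha): list which of the
# keys used in the EXPORT block above are absent from a config file.
# It matches "Key =" at the start of a line (after optional whitespace),
# so commented-out keys are not counted. It does not validate values.
check_export_conf() {
    conf=$1
    missing=0
    for key in Export_Id Path Pseudo Access_Type; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run here):
# check_export_conf /etc/ganesha/ganesha.conf
```

The function returns non-zero if any key is missing, so it can gate the service start in a script.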

5. Start ganesha:

    % systemctl start nfs-ganesha

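6. Wait one minute for NFS grace to end, then mount the volume. The export’s Pseudo path in the config above is /bricks/demo/scratch, so that is the path to mount:

    % mount localhost:/bricks/demo/scratch /mnt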
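Rather than timing the wait by hand, one option is to retry the mount until it succeeds. The sketch below is a generic retry helper; the name retry and its argument order are invented for this example. Whether a mount attempted during the grace period fails or simply blocks depends on the client and server, so treat the retry-on-failure behavior as an assumption.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, giving up after
# ATTEMPTS failed runs. Sleeps DELAY_SECONDS between runs.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example (not run here): up to 12 tries, 10 seconds apart.
# retry 12 10 mount <server>:<export> /mnt
```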

Walk

7. Now we’ll create a simple Gluster volume and use NFS-Ganesha to serve it. We also need to disable Gluster’s own NFS server (gnfs).

    % gluster volume create simple $(hostname):/bricks/demo/simple
    % gluster volume set simple nfs.disable on
    % gluster volume start simple

8. Edit the Ganesha config file at /etc/ganesha/ganesha.conf. Here’s what mine looks like:

EXPORT
{
	# Export Id (mandatory, each EXPORT must have a unique Export_Id)
	Export_Id = 1;

	# Exported path (mandatory)
	Path = /simple;

	# Pseudo Path (required for NFS v4)
	Pseudo = /simple;

	# Required for access (default is None)
	# Could use CLIENT blocks instead
	Access_Type = RW;

	# Exporting FSAL
	FSAL {
		Name = GLUSTER;
		Hostname = localhost;
		Volume = simple;
	}
}

9. Restart ganesha:


    % systemctl stop nfs-ganesha
    % systemctl start nfs-ganesha

10. Wait one minute for NFS grace to end, then mount the volume:


    % mount localhost:/simple /mnt

Copy a file to the NFS volume. You’ll see it on the gluster brick in /bricks/demo/simple.

Run

Now for the part you’ve been waiting for. For this we’ll start from scratch. This will be a four-node cluster: node0, node1, node2, and node3.

1. Tear down anything left over from the above.

2. Ensure that all nodes are resolvable either in DNS or /etc/hosts:


    node0% cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    172.16.3.130 node0
    172.16.3.131 node1
    172.16.3.132 node2
    172.16.3.133 node3

    172.16.3.140 node0v
    172.16.3.141 node1v
    172.16.3.142 node2v
    172.16.3.143 node3v
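Name resolution is worth checking on every node before going further. The sketch below uses getent hosts, which consults both /etc/hosts and DNS on glibc systems such as CentOS; the function name check_resolvable is invented for this example.

```shell
# Hypothetical helper: print each name that does not resolve through
# /etc/hosts or DNS, and return non-zero if any name fails.
check_resolvable() {
    rc=0
    for name in "$@"; do
        if ! getent hosts "$name" > /dev/null; then
            echo "unresolvable: $name"
            rc=1
        fi
    done
    return "$rc"
}

# Example (not run here), using the names from the hosts file above:
# check_resolvable node0 node1 node2 node3 node0v node1v node2v node3v
```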

3. Set up passwordless ssh among the four nodes. On node0, create a keypair and deploy it to all the nodes:


    node0% ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node0
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node1
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node2
    node0% ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node3
    node0% scp /var/lib/glusterd/nfs/secret.* node1:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node2:/var/lib/glusterd/nfs/
    node0% scp /var/lib/glusterd/nfs/secret.* node3:/var/lib/glusterd/nfs/

You can confirm that it works with:

    node0% ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/nfs/secret.pem root@node1

4. Start glusterd on all nodes:

    node0% systemctl enable glusterd && systemctl start glusterd
    node1% systemctl enable glusterd && systemctl start glusterd
    node2% systemctl enable glusterd && systemctl start glusterd
    node3% systemctl enable glusterd && systemctl start glusterd

5. From node0, peer probe the other nodes:

    node0% gluster peer probe node1
    peer probe: success
    node0% gluster peer probe node2
    peer probe: success
    node0% gluster peer probe node3
    peer probe: success

You can confirm their status with:

    node0% gluster peer status
    Number of Peers: 3

    Hostname: node1
    Uuid: ca8e1489-0f1b-4814-964d-563e67eded24
    State: Peer in Cluster (Connected)

    Hostname: node2
    Uuid: 37ea06ff-53c2-42eb-aff5-a1afb7a6bb59
    State: Peer in Cluster (Connected)

    Hostname: node3
    Uuid: e1fb733f-8e4e-40e4-8933-e215a183866f
    State: Peer in Cluster (Connected)
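In a script, the same check can be reduced to counting the peers reported as connected. This is a minimal sketch that matches the exact state string shown in the output above; the function name peers_connected is invented here.

```shell
# peers_connected EXPECTED
# Hypothetical helper: read `gluster peer status` output on stdin and
# succeed only if exactly EXPECTED peers are in the connected state.
peers_connected() {
    expected=$1
    # grep -c prints the number of matching lines (0 if none).
    actual=$(grep -cF 'State: Peer in Cluster (Connected)')
    [ "$actual" -eq "$expected" ]
}

# Example (not run here):
# gluster peer status | peers_connected 3
```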

6. Create the /etc/ganesha/ganesha-ha.conf file on node0. Here’s what mine looks like:

# Name of the HA cluster created.
# must be unique within the subnet
HA_NAME="demo-cluster"
#
# The gluster server from which to mount the shared data volume.
HA_VOL_SERVER="node0"
#
# You may use short names or long names; you may not use IP addresses.
# Once you select one, stay with it as it will be mildly unpleasant to clean up if you switch later on. Ensure that all names - short and/or long - are in DNS or /etc/hosts on all machines in the cluster.
#
# The subset of nodes of the Gluster Trusted Pool that form the ganesha HA cluster. Hostname is specified.
HA_CLUSTER_NODES="node0,node1,node2,node3"
#
# Virtual IPs for each of the nodes specified above.
VIP_node0="172.16.3.140"
VIP_node1="172.16.3.141"
VIP_node2="172.16.3.142"
VIP_node3="172.16.3.143"
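Every node listed in HA_CLUSTER_NODES needs a matching VIP_ entry, and a typo here is easy to miss. The following sketch checks that pairing with plain text matching; check_ha_conf is a hypothetical name, and the check assumes the file uses the same KEY="value" layout as the example above.

```shell
# check_ha_conf FILE
# Hypothetical helper: verify that every node named in HA_CLUSTER_NODES
# has a VIP_<node>= line in the same file. Text matching only; the file
# is not sourced or otherwise interpreted.
check_ha_conf() {
    conf=$1
    nodes=$(sed -n 's/^HA_CLUSTER_NODES="\(.*\)"$/\1/p' "$conf")
    [ -n "$nodes" ] || { echo "HA_CLUSTER_NODES not found"; return 1; }
    rc=0
    for node in $(echo "$nodes" | tr ',' ' '); do
        if ! grep -q "^VIP_${node}=" "$conf"; then
            echo "no VIP for $node"
            rc=1
        fi
    done
    return "$rc"
}

# Example (not run here):
# check_ha_conf /etc/ganesha/ganesha-ha.conf
```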

7. Enable the Gluster shared state volume:

    node0% gluster volume set all cluster.enable-shared-storage enable

Wait a few moments for it to be mounted everywhere. You can check that it’s mounted at /run/gluster/shared_storage (or /var/run/gluster/shared_storage) on all the nodes.
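One way to script that check is to look for the mount point in the kernel's mount table. The helper below is a sketch; the name is_mounted is invented here, and the optional second argument exists only so the function can be pointed at a file other than /proc/mounts.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeed if MOUNTPOINT appears as the second field
# of a line in MOUNTS_FILE (default: /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example (not run here): either path may be the one listed.
# is_mounted /run/gluster/shared_storage || is_mounted /var/run/gluster/shared_storage
```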

8. Enable and start the Pacemaker pcsd on all nodes:

    node0% systemctl enable pcsd && systemctl start pcsd
    node1% systemctl enable pcsd && systemctl start pcsd
    node2% systemctl enable pcsd && systemctl start pcsd
    node3% systemctl enable pcsd && systemctl start pcsd

9. Set a password for the user ‘hacluster’ on all nodes. Use the same password for all nodes:

    node0% echo demopass | passwd --stdin hacluster
    node1% echo demopass | passwd --stdin hacluster
    node2% echo demopass | passwd --stdin hacluster
    node3% echo demopass | passwd --stdin hacluster

10. Perform cluster auth between the nodes. Username is ‘hacluster’, Password is the one you used in step 9:

    node0% pcs cluster auth node0
    node0% pcs cluster auth node1
    node0% pcs cluster auth node2
    node0% pcs cluster auth node3

11. Create the Gluster volume to export, a 2×2 distribute-replicate volume, and start it:

    node0% gluster volume create cluster-demo replica 2 node0:/home/bricks/demo node1:/home/bricks/demo node2:/home/bricks/demo node3:/home/bricks/demo
    node0% gluster volume start cluster-demo
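The order of the bricks on the command line matters: with replica 2, Gluster groups consecutive bricks into replica sets, so here node0 and node1 mirror each other, node2 and node3 mirror each other, and files are distributed across the two pairs. The sketch below only prints that grouping for a given brick list; replica_sets is a name invented for illustration and does not call gluster.

```shell
# replica_sets REPLICA_COUNT BRICK...
# Hypothetical helper: print one line per replica set, grouping
# consecutive bricks in the order given, REPLICA_COUNT bricks per line.
replica_sets() {
    count=$1
    shift
    i=0
    line=""
    for brick in "$@"; do
        line="${line:+$line }$brick"
        i=$((i + 1))
        if [ $((i % count)) -eq 0 ]; then
            echo "$line"
            line=""
        fi
    done
}

# Example (not run here):
# replica_sets 2 node0:/home/bricks/demo node1:/home/bricks/demo \
#                node2:/home/bricks/demo node3:/home/bricks/demo
```

Listing the bricks so that each pair sits on different nodes, as the command above does, keeps both copies of a file off the same machine.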

12. Enable ganesha, i.e. start the ganesha.nfsd:

    node0% gluster nfs-ganesha enable

13. Export the volume:

    node0% gluster vol set cluster-demo ganesha.enable on

14. And finally mount the NFS volume from a client using one of the virtual IP addresses:

    nfs-client% mount node0v:/cluster-demo /mnt
