High-availability storage using server-side AFR

Introduction

In this howto we will set up an HA cluster using two storage nodes and two clients; however, because this configuration scales easily, the number of storage and client nodes can be increased with little effort.

A basic knowledge of DNS, Linux administration, and network topology is assumed.

High-Availability

The idea of a high-availability cluster is simple: as long as at least one of the storage nodes is functional, the data should be accessible by the clients. In a Gluster-based environment, this can be accomplished using a combination of server-side data replication and round-robin DNS addressing. For performance purposes, storage-related traffic can be moved to a physically separate high-speed network - an obvious enhancement to any HA environment.

Network

Starting from the bottom and working our way up is the best approach to understanding this configuration; thus, we shall begin with a description of the physical network. In a typical environment, the storage cluster will be used solely for storing and serving data to clients, while the clients may engage in any number of diverse activities. Commonly, clients are themselves responsible for delivering other network services, such as web, email, or FTP.

Given this premise, it is easy to imagine a scenario in which a number of web servers, for example, need to ensure that:

  • Requests to and from the storage cluster are fast and responsive
  • Normal web-serving traffic is not affected by the storage requests

Storage Network

One natural solution to this problem is to segregate the different types of traffic onto different physical networks. Each of the web servers would therefore have two network interfaces: one for the "storage network", and another for regular traffic (HTTP, SSH for administration, etc.). Consider the following graphic:

Image:network1.png

In the above scenario, regular traffic is carried via an (unspecified) "regular network" on eth0 using the 10.0.0.0 address space, and storage traffic is carried via a dedicated storage network on eth1 using the 192.168.0.0 address space. None of the machines are set up as bridges, so there is no way for the two networks to talk to each other and they remain segregated. The storage network can be set up using commodity Gigabit Ethernet hardware, or Fibre Channel for those with the budget and inclination.

In order to keep things tidy, the addressing scheme of the storage network mimics that of the regular network: for example, the first web server (www1) is assigned 10.0.0.10 on the regular network, and 192.168.0.10 on the storage network.
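The mirrored addressing scheme can be expressed as a small helper. This is a minimal sketch in Python, assuming that both networks are /24 subnets (the howto only gives the 10.0.0.0 and 192.168.0.0 address spaces, so the prefix length is an assumption); the function name storage_address is hypothetical.

```python
import ipaddress

# Assumption: both networks are /24; the howto does not state a prefix length.
REGULAR_NET = ipaddress.ip_network("10.0.0.0/24")
STORAGE_NET = ipaddress.ip_network("192.168.0.0/24")


def storage_address(regular_ip):
    """Return the storage-network address that mirrors a regular-network address."""
    addr = ipaddress.ip_address(regular_ip)
    if addr not in REGULAR_NET:
        raise ValueError(f"{regular_ip} is not on the regular network {REGULAR_NET}")
    # Keep the host part, swap the network part.
    host_part = int(addr) - int(REGULAR_NET.network_address)
    return str(STORAGE_NET.network_address + host_part)


print(storage_address("10.0.0.25"))  # prints 192.168.0.25
```

Keeping the host part identical on both networks means an administrator only has to remember one number per machine.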

DNS

The storage network requires basic DNS resolution, and thus, a private zone would be set up for just this purpose. The DNS server responsible for the zone needs to be accessible by the members of the storage network (obviously), but need not be attached to the storage network itself. In this howto, the storage network zone is called "storagenet.gfs", with each member of the storage network assigned the same hostname as on the general network; for example, the www1(.general.net) server on the general network remains www1(.storagenet.gfs) on the storage network.

Thus, when querying the properly configured DNS server, one would see the following results:

$ host www1.general.net
www1.general.net has address 10.0.0.10

$ host www1.storagenet.gfs
www1.storagenet.gfs has address 192.168.0.10

Further discussion on the setup and configuration of a basic DNS server is outside of the scope of this document, and left as an exercise to the reader.

Round-Robin DNS

A key component of the HA setup described by this document is round-robin DNS (RRDNS). Though it is used only in one instance, it is a critical function - one which helps to ensure that the data can be served continuously even in the event that one of the storage servers becomes inaccessible. In a basic Gluster configuration the clients are told to access servers via their IP addresses; while functional, this has the drawback of causing the data to become inaccessible if the IP address cannot be reached (for example, because the server dies). This problem is mitigated by using a single hostname for both of the servers, as in the diagram to the right.

Consider the following results:

$ host storage1.storagenet.gfs
storage1.storagenet.gfs has address 192.168.0.110

$ host storage2.storagenet.gfs
storage2.storagenet.gfs has address 192.168.0.111

$ host cluster.storagenet.gfs
cluster.storagenet.gfs has address 192.168.0.110
cluster.storagenet.gfs has address 192.168.0.111

$ dig cluster.storagenet.gfs | grep -A 2 "ANSWER SECTION"
;; ANSWER SECTION:
cluster.storagenet.gfs. 3600   IN     A      192.168.0.110
cluster.storagenet.gfs. 3600   IN     A      192.168.0.111
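The same check can be scripted from a client. The following is a minimal sketch in Python using the standard library's socket.getaddrinfo; the helper name resolve_all is hypothetical, and the resolver parameter exists only so that the function can be exercised without a live DNS server. The order of the returned addresses depends on the local resolver and should not be relied upon.

```python
import socket


def resolve_all(hostname, resolver=socket.getaddrinfo):
    """Return every distinct IPv4 address that the resolver reports for hostname."""
    records = resolver(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in records:
        ip = sockaddr[0]
        if ip not in addresses:  # keep first-seen order, drop duplicates
            addresses.append(ip)
    return addresses


# On a machine that can query the storagenet.gfs zone, one would expect
# resolve_all("cluster.storagenet.gfs") to list both storage nodes.
```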

Briefly stated, the Gluster clients will be aware of multiple servers (in this case, two) instead of just one. In this fashion, when one of the storage nodes becomes inaccessible, the clients will use the other automatically; exactly how this works is explored in the following section. For now, consider the following diagram, which shows the network with some additional DNS-level information:

Image:Network2.png

Gluster

Now that the network and DNS architectures are well understood, we can move on to the Gluster configuration. As in the previous sections, the material is relatively straightforward; the key element to be aware of is the Automatic File Replication (or "AFR") translator, which is discussed below.

The basic premise is that the storage servers look after replication themselves: file replication and related functions are assigned to the storage servers rather than to the clients. This is important to note, as many of the examples available on the wiki at large put these functions on the clients (about which much discussion has been generated on the mailing list).

AFR

The AFR translator is used to replicate files and directories automatically, thus creating identical copies of the same data - or "subvolumes" in the Gluster vernacular - across multiple servers. In this scenario, AFR is used to ensure that both of the storage servers contain the same subvolumes at all times.
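As a rough mental model only (this is not how the AFR translator is implemented), replication can be pictured as every write being applied to each subvolume in turn. The toy sketch below mirrors a file into several local directories that stand in for the storage servers; the function name replicated_write and its arguments are hypothetical.

```python
import os


def replicated_write(relative_path, data, subvolume_dirs):
    """Write the same bytes under every directory in subvolume_dirs (toy mirror)."""
    for root in subvolume_dirs:
        target = os.path.join(root, relative_path)
        # Create intermediate directories so each copy has the same layout.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(data)
```

After a call such as replicated_write("site/index.html", b"hello", [dir_a, dir_b]), both directories hold an identical copy, which is the property that lets either storage server answer for the other.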

Server Config

The server configuration files on storage1 and storage2 are nearly identical to each other; they differ only in which peer each one points to.

TODO : discuss transport-timeout

storage1

[user@storage1 ~]$ cat /etc/glusterfs/glusterfs-server.vol


##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on storage1
volume gfs-ds
  type storage/posix
  option directory /opt/gfs-ds
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on storage2
volume gfs-storage2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.111      # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-storage2-ds         # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 192.168.0.*,127.0.0.1
  option auth.ip.gfs.allow 192.168.0.*
end-volume

storage2

[user@storage2 ~]$ cat /etc/glusterfs/glusterfs-server.vol


##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on storage2
volume gfs-ds
  type storage/posix
  option directory /opt/gfs-ds
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on storage1
volume gfs-storage1-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.110      # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-storage1-ds         # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 192.168.0.*,127.0.0.1
  option auth.ip.gfs.allow 192.168.0.*
end-volume
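Because the two server files differ only in the peer they refer to, they can be produced from a single template. The following is a minimal sketch in Python: the template text is copied from the listings above, while the function name render_server_volfile and its parameters are hypothetical.

```python
# Template taken from the two listings above; only the marked fields vary.
SERVER_TEMPLATE = """\
# dataspace on {local}
volume gfs-ds
  type storage/posix
  option directory /opt/gfs-ds
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on {peer}
volume gfs-{peer}-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host {peer_ip}      # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-{peer}-ds         # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 192.168.0.*,127.0.0.1
  option auth.ip.gfs.allow 192.168.0.*
end-volume
"""


def render_server_volfile(local, peer, peer_ip):
    """Fill in the three values that differ between the two storage nodes."""
    return SERVER_TEMPLATE.format(local=local, peer=peer, peer_ip=peer_ip)


storage1_spec = render_server_volfile("storage1", "storage2", "192.168.0.111")
storage2_spec = render_server_volfile("storage2", "storage1", "192.168.0.110")
```

Generating both files from one source makes it harder for the two nodes to drift apart, for instance if thread-count were changed on one server but not the other.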

Client Config

The client configuration is very simple and, in fact, identical on each client. It is in this configuration that the RRDNS hostname comes into play: the remote-host is defined as cluster.storagenet.gfs. When the Gluster client process does a lookup on cluster.storagenet.gfs, it will store both responses in its cache, then randomly choose one to actually use. If that server becomes inaccessible, the Gluster client will wait for the period of time defined by transport-timeout, then automatically attempt to use the other response in the cache. See this thread from the Gluster mailing list for more information.

In this fashion, the client fails over from the non-functional server to the functional one, thus ensuring that services are not interrupted for long. Whether the storage cluster has two nodes (as in this example), or two hundred (oh my!), the failover process is identical.
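The failover behaviour described above can be sketched as follows. This is an illustration in Python of the described logic (pick one cached address at random, give up after the timeout, try the next), not the Gluster client's actual code; connect_with_failover and its parameters are hypothetical, and the connect parameter exists only so that the function can be exercised without a network.

```python
import random
import socket


def connect_with_failover(addresses, port, timeout=10, connect=socket.create_connection):
    """Try the cached addresses in random order; return (ip, connection) for the first that answers."""
    candidates = list(addresses)
    random.shuffle(candidates)  # mimics "randomly choose one to actually use"
    last_error = None
    for ip in candidates:
        try:
            # timeout plays the role of transport-timeout (in seconds)
            return ip, connect((ip, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(f"no storage node reachable: {last_error}")
```

The loop is the same for two addresses or two hundred; only the list passed in changes.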

It is worth noting that this process is totally automatic, which is a good thing when it happens at 04:00 on Sunday morning!

www1

[user@www1 ~]$ cat /etc/glusterfs/glusterfs-client.vol


#############################################
##  GlusterFS Client Volume Specification  ##
#############################################

# the exported volume to mount                    # required!
volume cluster
  type protocol/client
  option transport-type tcp/client
  option remote-host cluster.storagenet.gfs       # RRDNS
  option remote-subvolume gfs                     # exported volume
  option transport-timeout 10                     # value in seconds, should be relatively low
end-volume

# performance block for cluster                   # optional!
volume writeback
  type performance/write-behind
  option aggregate-size 131072
  subvolumes cluster
end-volume

# performance block for cluster                   # optional!
volume readahead
  type performance/read-ahead
  option page-size 65536
  option page-count 16
  subvolumes writeback
end-volume
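The way the client translators stack on top of one another can be made explicit with a small script. The following is a minimal sketch in Python that assumes only the syntax visible in the listing above (volume, type, option, subvolumes, end-volume, and # comments); it is not the parser that GlusterFS itself uses, and parse_volfile and translator_chain are hypothetical names.

```python
CLIENT_SPEC = """\
volume cluster
  type protocol/client
  option transport-type tcp/client
  option remote-host cluster.storagenet.gfs       # RRDNS
  option remote-subvolume gfs                     # exported volume
  option transport-timeout 10                     # value in seconds, should be relatively low
end-volume

volume writeback
  type performance/write-behind
  option aggregate-size 131072
  subvolumes cluster
end-volume

volume readahead
  type performance/read-ahead
  option page-size 65536
  option page-count 16
  subvolumes writeback
end-volume
"""


def parse_volfile(text):
    """Return {volume_name: {"type": str, "options": dict, "subvolumes": list}}."""
    volumes = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        words = line.split()
        keyword = words[0]
        if keyword == "volume":
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif keyword == "type":
            current["type"] = words[1]
        elif keyword == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif keyword == "subvolumes":
            current["subvolumes"] = words[1:]
        elif keyword == "end-volume":
            current = None
    return volumes


def translator_chain(volumes, top):
    """Follow the first subvolume of each volume from top down to the bottom."""
    chain = [top]
    while volumes[chain[-1]]["subvolumes"]:
        chain.append(volumes[chain[-1]]["subvolumes"][0])
    return chain


volumes = parse_volfile(CLIENT_SPEC)
print(" -> ".join(translator_chain(volumes, "readahead")))  # prints readahead -> writeback -> cluster
```

Read from left to right, the output shows that requests pass through the read-ahead and write-behind performance translators before reaching the protocol/client volume that talks to cluster.storagenet.gfs.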

Conclusion

Questions or comments should be directed to the GlusterFS mailing list.

Phrawzty 07:37, 2 May 2008 (PDT)
