GlusterFS is a powerful network/cluster filesystem written in user space which uses FUSE to hook itself with VFS layer. GlusterFS takes a layered approach to the file system, where features are added/removed as per the requirement. Though GlusterFS is a File System, it uses already tried and tested disk file systems like ext3, ext4, xfs, etc. to store the data. It can easily scale up to petabytes of storage which is available to user under a single mount point.
- The brick is the storage filesystem that has been assigned to a volume.
- The machine which mounts the volume (this may also be a server).
- The machine (virtual or bare metal) which hosts the actual filesystem in which data will be stored.
- a brick after being processed by at least one translator.
- The final share after it passes through all the translators.
A translator connects to one or more subvolumes, does something with them, and offers a subvolume connection.
The brick's first translator (or last, depending on what direction data is flowing) is the storage/posix translator that manages the direct filesystem interface for the rest of the translators.
The configuration of translators (since GlusterFS 3.1) is managed through the gluster command line interface (cli), so you don't need to know in what order to graph the translators together.
All the translators hooked together to perform a function is called a graph. A complete brick graph might look like this:
Distribute takes a list of subvolumes and distributes files across them, effectively making one single larger storage volume from a series of smaller ones.
With 4 files
Each file may (probably would) be written to a different server using the distribute translator.
The server that the files are written to is calculated by hashing the filename. If the filename changes, a pointer file is written to the server that the new hash code would point to, telling the distribute translator which server the file is actually on.
Plus and Minus
- If you lose a single server, you lose access to all the files that are hosted on that server. This is why distribute is typically graphed to the replicate translator.
- If your file is larger than the subvolume, writing your file will fail.
- If the file is not where the hash code calculates to, an extra lookup operation must be performed, adding slightly to latency.
- The more servers you add, the better this scales in terms of random file access. As long as clients aren't all retrieving the same file, their access should be spread pretty evenly across all the servers.
- Increasing volume can be done by adding a new server. Adding servers can be done on-the-fly (Since 3.1.0).
Replicate is used for providing redundancy to both storage and generally to availability.
Stripes data across bricks in the volume. For best results, you should use striped volumes only in high concurrency environments accessing very large files.