GlusterFS FAQ

From GlusterDocumentation

General FAQ

What is GlusterFS?

GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates storage bricks over an Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be built from any commodity hardware, such as x86-64 servers with SATA-II RAID and an Infiniband or GigE interconnect.

What is the problem with DAS / RAID / JBOD / NFS / SAN?

They do not scale well in either size or performance. SAN is better than the rest, but it is exorbitantly expensive, and even SAN cannot scale to hundreds of TBs for a large number of clients.

What is the problem with existing Cluster File Systems?

Cluster file systems are still not mature enough for the enterprise market. Although they are extremely scalable and cheap, and can be built entirely out of commodity OS and hardware, they are too complex to deploy and maintain.

GlusterFS hopes to solve this problem.

Why is striping bad?

  • Increased Overhead

Striping files across multiple bricks and reading/writing them at the same time causes serious disk contention, and performance suffers badly as load increases. If you avoid striping, the underlying filesystem and the I/O scheduler on each brick know best how to organize the file data into contiguous disk blocks for optimized read and write operations.

  • Increased Complexity

Striping badly complicates the design of a clustered filesystem. Instead of relying on the mature underlying filesystem to manage disk blocks, you have to implement another clustered file system across multiple underlying filesystems, which duplicates that work.

  • Increased Risk

Loss of a single node can mean loss of the entire file system. Imagine how slow it is to run fsck on hundreds of TBs of data.

In reality, striping introduces more problems than it solves, particularly when a file system scales beyond hundreds of TBs.

Alternatively, when files and folders are kept intact, they take advantage of the underlying file system to do the real block I/O management. A single file can grow from 4TB to 16TB within a single node. In practice, files are rarely TBs in size. When multiple clients access the same file, the blocks are most likely already cached in RAM and RDMA'ed to the clients. GlusterFS takes advantage of high-bandwidth, low-latency interconnects such as Infiniband.

The GlusterFS AFR (automatic file replication) translator does a striped read to improve performance on mirrored files.

You say striping is bad, but I see a 'cluster/stripe' translator in GlusterFS. Why is that?

Here are a few reasons:

  • Implementing the striping feature was only a few days' work for the GlusterFS team, thanks to its modular design.
  • A few people and companies want striping in the file system because their applications use a single large file (100GB - 2TB).
  • If a file is very big (which we don't recommend), a single server takes the complete load when it is accessed, so striping helps to distribute the load.
  • If 'cluster/afr' is used together with 'cluster/stripe', GlusterFS can provide high availability.
  • From the '1.3.0-pre5' release onwards, GlusterFS keeps the 'block-size' and 'block-index' of a file in extended attributes, so that, if taken offline, one can tell exactly where the data fits.
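
To make the 'block-size' and 'block-index' idea concrete, here is a minimal sketch of how a byte offset in a striped file could map to a brick. It assumes blocks are dealt out to bricks in simple round-robin order, which is an illustration only and not necessarily how 'cluster/stripe' lays data out. The names locate and StripeLocation are hypothetical.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    brick: int            # 0-based index of the brick holding the byte
    block_index: int      # index of the stripe block within the whole file
    offset_in_block: int  # position of the byte inside that block


def locate(offset: int, block_size: int, num_bricks: int) -> StripeLocation:
    """Map a file offset to a stripe block, assuming round-robin placement."""
    if offset < 0 or block_size <= 0 or num_bricks <= 0:
        raise ValueError("offset must be >= 0; block_size and num_bricks must be > 0")
    block_index = offset // block_size
    return StripeLocation(
        brick=block_index % num_bricks,  # assumption: blocks rotate over bricks
        block_index=block_index,
        offset_in_block=offset % block_size,
    )


if __name__ == "__main__":
    one_mib = 1024 * 1024
    # With 1 MiB blocks over 4 bricks, byte (5 MiB + 10) lies in block 5 on brick 1.
    print(locate(5 * one_mib + 10, one_mib, 4))
```

Under this assumption every large file touches every brick, which is why a busy striped volume sees the contention described earlier.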

I'm getting bad aggregate performance. How do I tune it?

Q: I see the throughput I'm getting reading and writing to glusterfs. Now how do I figure out how to improve the performance? I'm using io-threads xlator. I've set it to 8 threads. How do I know if I should increase or decrease that number? How will I know if I will get more advantage by adding another client?

A: There are many configuration details that affect throughput, so we recommend mailing gluster-devel (at) nongnu.org.

Advantages of GlusterFS

  • Feature rich and designed for O(1) scalability.
  • Aggregates on top of existing filesystems. Users can recover their files and folders even without GlusterFS.
  • GlusterFS has no single point of failure. It is completely distributed, with no centralized meta-data server of the kind Lustre uses.
  • Extensible scheduling interface with modules loaded based on the user's storage I/O access pattern.
  • Modular and extensible through powerful translator mechanism.
  • Supports Infiniband RDMA and TCP/IP.
  • Entirely implemented in user-space. Easy to port, debug and maintain.
  • Scales on demand.
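
The pluggable scheduling idea in the list above can be pictured with a small sketch. It is not GlusterFS code: the Scheduler interface, RoundRobinScheduler and place_files are hypothetical names, and round-robin is just one possible policy. The point is that each new file lands whole on a single brick chosen by whichever scheduler module is loaded.

```python
from itertools import cycle
from typing import Dict, Iterable, Protocol


class Scheduler(Protocol):
    """Hypothetical interface: decide which brick stores a newly created file."""

    def pick_brick(self, path: str) -> str: ...


class RoundRobinScheduler:
    """Hands out bricks in turn, ignoring the file path."""

    def __init__(self, bricks: Iterable[str]) -> None:
        self._bricks = list(bricks)
        if not self._bricks:
            raise ValueError("at least one brick is required")
        self._next = cycle(self._bricks)

    def pick_brick(self, path: str) -> str:
        return next(self._next)


def place_files(scheduler: Scheduler, paths: Iterable[str]) -> Dict[str, str]:
    """Return a mapping of file path to the brick chosen for it."""
    return {path: scheduler.pick_brick(path) for path in paths}


if __name__ == "__main__":
    scheduler = RoundRobinScheduler(["brick1", "brick2"])
    print(place_files(scheduler, ["a.txt", "b.txt", "c.txt"]))
```

A different access pattern would swap in another class with the same pick_brick method, leaving the rest of the code untouched.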

Minimum system requirements to use GlusterFS?

On the client side, GlusterFS requires FUSE (Filesystem in Userspace http://fuse.sf.net) kernel support. There are no minimum system requirements as such. It can even run within a single system on local loopback. However, here are a few suggestions.

Storage Cluster:

A cluster of 4 or more x86-64 servers with SATA-II RAID and Infiniband interconnect.

NAS Server:

x86-64 platform, 2GB RAM, SATA-II RAID and gigabit ethernet network interface card.

You can vary the amount of RAM, type of interconnect (say Infiniband or GigE), number of processors, number of bricks, amount of storage capacity per brick, and type of disk (say Ultra320 SCSI or SATA II), all depending upon your application needs.

What operating systems are supported by GlusterFS?

The GlusterFS server can run on any POSIX-compliant OS. The GlusterFS client requires FUSE support in your kernel. As of now, Linux, FreeBSD, OpenSolaris (work in progress) and Mac OS X kernels are known to support it. We are also planning to introduce an LD_PRELOAD'able GlusterFS client for operating systems without FUSE support. Our GNU Hurd port will use the native translator interface rather than the FUSE emulation from hurdextras.

Linux users with the PaX memory protection kernel patches need to run

# paxctl -E /path/to/glusterfsd

before they use the server on pre-TLA 636 versions.

Post-TLA 636 (i.e. 1.3.8 and above), PaX users will need to run:

# paxctl -psmer /path/to/glusterfs
# paxctl -psmer /path/to/glusterfsd

Currently GlusterFS is successfully tested on:

  • GNU/Linux - (both client and server)
  • FreeBSD - (server)
  • OpenSolaris - (server)
  • Mac OS X - (both client and server). NOTE: you need a version above 1.3.8pre1.

Will GlusterFS be upward compatible?

Q: Since GlusterFS is so easy to scale, once a large deployment is set up and new versions of GlusterFS come out, will I have to bring down my GlusterFS to upgrade, or will it work in a heterogeneous environment where some of the servers are upgraded to the newer version and some run older versions?

A: It is true that newer versions will introduce new features. We will handle situations like this by maintaining backward compatibility, but users who stay on older versions may miss the new features altogether. Hence we recommend upgrading to newer versions. In any case, we will announce compatibility breaks on the mailing list.

The GlusterFS protocol has a version checking mechanism, which checks the compatibility of server and client before a connection is established.

NOTE: It is strongly advised to use the same version of client and server.

Can GlusterFS store files redundantly?

The GlusterFS automatic-file-replication (AFR) translator does the job. NOTE: Earlier versions of AFR supported pattern-based mirroring. With versions above 1.3.7, this can be achieved with the switch scheduler in unify.
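
As a rough picture of what pattern-based placement means, the sketch below routes files to bricks by matching their names against shell-style patterns. It is a toy illustration, not the switch scheduler's actual logic or configuration syntax. The function pick_bricks, the rule list and the brick names are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple

Rule = Tuple[str, List[str]]  # (shell-style pattern, bricks that should hold matches)


def pick_bricks(path: str, rules: Sequence[Rule], default: List[str]) -> List[str]:
    """Return the bricks for the first rule whose pattern matches the path."""
    for pattern, bricks in rules:
        if fnmatchcase(path, pattern):  # case-sensitive match on every platform
            return bricks
    return default


if __name__ == "__main__":
    rules = [
        ("*.jpg", ["brick1", "brick2"]),  # keep images on two bricks
        ("*.log", ["brick3"]),            # logs go to a single brick
    ]
    for name in ("photos/a.jpg", "server.log", "notes.txt"):
        print(name, "->", pick_bricks(name, rules, default=["brick4"]))
```

Listing two bricks for a pattern stands in for mirroring files of that kind, while unmatched files fall back to the default.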

How to interpret GlusterFS versions?

MAJOR.MINOR.PATCH-LEVEL.

  • MAJOR changes when compatibility is broken. Think of NFSv2 versus NFSv3, or the Ext2, Ext3 and Ext4 filesystems.
  • MINOR changes for every significant release, which then goes into a maintenance phase.
  • PATCH-LEVEL changes within a maintenance phase for bug fixes or insignificant changes. If the version string contains RC or DOA, that release should not be used in production.
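
The scheme above can be sketched as a small parser. This is an illustration under assumptions, not GlusterFS's own version check: it assumes version strings look like '1.3.7', '1.3.0-pre5' or '1.3.8pre1', and the helper names (parse_version, is_production_ready, breaks_compatibility) and the sample string '1.3.0-rc1' are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# MAJOR.MINOR.PATCH-LEVEL, optionally followed by a suffix such as "pre5" or "rc1".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-?(.+))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch_level: int
    suffix: Optional[str]


def parse_version(text: str) -> Version:
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH-LEVEL string: {text!r}")
    major, minor, patch_level, suffix = match.groups()
    return Version(int(major), int(minor), int(patch_level), suffix)


def is_production_ready(text: str) -> bool:
    """False when the suffix marks an RC or DOA release, as the FAQ advises."""
    suffix = parse_version(text).suffix
    return suffix is None or not any(tag in suffix.upper() for tag in ("RC", "DOA"))


def breaks_compatibility(old: str, new: str) -> bool:
    """A change in MAJOR signals broken compatibility."""
    return parse_version(old).major != parse_version(new).major


if __name__ == "__main__":
    print(parse_version("1.3.0-pre5"))
    print(is_production_ready("1.3.0-rc1"))
    print(breaks_compatibility("1.3.8", "2.0.0"))
```

Only RC and DOA are treated as non-production markers here because those are the two the FAQ names. Other suffixes such as 'pre' are passed through without judgment.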

What is a DOA release?

DOA (Dead On Arrival) releases are only for developers and beta testers. The only thing we can guarantee is loss of data.

What does BENKI mean?

Benki means fire in the Kannada language. The entire 1.2.x series of GlusterFS has been codenamed BENKI. Every major series of Gluster will adopt the IRC nickname of one of our contributors. Benki is the IRC nickname of Basavana Gowda KG.

What does SUSKE mean?

SUSKE is a comic character. The entire 1.3.x series of GlusterFS has been codenamed SUSKE. Every major series of Gluster will adopt the IRC nickname of one of our contributors. SUSKE is the IRC nickname of Balamurugan.

Will a client for Windows be made?

Currently, Windows support is through CIFS (a Samba export). We do have a native port using WinFUSE in mind, and the work will be taken up once WinFUSE is stable.

Can I add my question here?

If you do not find your question answered here and if you think it is a frequently asked question, you may add your question here. One of us will fill in the answer.

Can I edit these wiki pages?

You are most welcome to contribute to GlusterFS documentation.

Note: Anonymous editing of this wiki has been suspended after widespread vandalism on this site. Please create an account for yourself before editing these pages.
