Version 3.3 introduced a new structure to the bricks, the .glusterfs directory. So what is it?
The GFID
As you’re probably aware, GlusterFS stores its metadata in extended attributes. One of these bits of metadata is the “trusted.gfid” attribute. This is, for a…
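The excerpt is truncated here; for a quick look at that attribute, it can be read straight off a brick with getfattr (the file path below is only a placeholder):

getfattr -n trusted.gfid -e hex /path/to/brick/some/file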
This sets up a GlusterFS Unified File and Object (UFO) server on a single node (single brick) Gluster server using the RPMs contained in my YUM repo at http://repos.fedorapeople.org/repos/kkeithle/glusterfs/. This repo contains RPMs for Fedora 16, Fedora 17, and RHEL 6. Alternatively you may use the glusterfs-3.4.0beta1 RPMs from the GlusterFS YUM repo at http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.4.0beta1/ …
GlusterFS spreads load using a distribute hash translation (DHT) of filenames to its subvolumes. Those subvolumes are usually replicated to provide fault tolerance as well as some load handling. The advanced file replication translator (AFR) departs f…
On Sunday, March 18th, Fan Yong committed a patch against ext4 to “return 32/64-bit dir name hash according to usage type”. Prior to that, ext2/3/4 would return a 32-bit hash value from telldir()/seekdir() as NFSv2 wasn’t designed to accommodate anything…
A GlusterFS user from IRC asked me about my puppet management of KVM in RHEL/CentOS and how it works. I started to write this post two weeks ago and had to stop because although it works great, I figured that wasn’t the answer he was looking for. I loo…
With the addition of automated self-heal in GlusterFS 3.3, a new hidden directory structure was added to each brick: “.glusterfs”. This complicates split-brain resolution as you now not only have to remove the “bad” file from the brick, but its counte…
Starting with GlusterFS 3.3, one change has been the check to see whether a directory (or any of its ancestors) is already part of a volume. This is causing many support questions in #gluster.
This was implemented because if you remove a brick from a volum…
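The workaround that usually comes up in #gluster (a sketch, assuming you really do intend to reuse the path as a brick; $brick is a placeholder for the brick directory) is to clear the marker attributes and the .glusterfs directory before re-adding it:

setfattr -x trusted.glusterfs.volume-id $brick
setfattr -x trusted.gfid $brick
rm -rf $brick/.glusterfs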
The release of GlusterFS 3.3.0 by the Gluster Community marks a major milestone in Clustered File Storage. GlusterFS is the leading open source solution for the dramatically increasing volume of unstructured data. It is a software-only, highly available, scale-out, centrally managed storage pool that can be backed by any POSIX filesystem that supports extended attributes, such as Ext3/4, XFS, BTRFS and many more.
As an example of Red Hat’s goal of building strong, independent, open source communities, GlusterFS 3.3.0 marks the first release as an “upstream” project with its own release schedule. This release addresses many of the most commonly requested features including proactive self-healing, quorum enforcement, and granular locking for self-healing, as well as many more bug fixes and enhancements.
Some of the more noteworthy features include:
Visit Gluster.org to download. Packages are available for most distributions, including Ubuntu, Debian, Fedora, RHEL, and CentOS.
Get involved! Join us on #gluster on freenode, join our mailing list, ‘like’ our Facebook page, follow us on twitter, or check out our LinkedIn group.
GlusterFS is an open source project sponsored by Red Hat®, who uses it in its line of Red Hat Storage products.
Over the last couple of days in #gluster, users have come in complaining that their application can’t open a file, but that if they try accessing the file from the shell as the same user, it works fine. This was reported with Apache’s Tomcat and mod_fcgid, and with Courier IMAP.
My first thought on this, and it still would be, is SELinux. SELinux’s role is to prevent things from doing what they’re not expected to do, and it can make an application unable to access a file that every other test says it should be able to reach. Always check this first if you’re experiencing unexpected access issues.
But in this case, it turned out to be the application itself. The users were running 32-bit apps on 64-bit platforms. As it turns out, the applications were tracking the inode numbers of files: they would call stat(), which returns a 64-bit inode, and copy the result of that stat call into their own structure. Because the apps were built for 32-bit platforms, Apache’s struct only had a 32-bit field, so the 64-bit result wouldn’t fit. Apache tested for that and would error out with the ambiguous error message:
Syntax error on line ## of {filename}
Wrapper {filename} cannot be accessed: (70008)Partial results are valid but processing is incomplete
What it really meant was that the 64-bit inode overflowed the 32-bit field allocated for storing it.
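As an illustration of the failure mode (a sketch only, not Apache’s actual code; the struct and field names are made up), copying a 64-bit st_ino into a 32-bit field silently truncates it:

#include <stdio.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical application structure with a 32-bit inode field,
 * as an app built for a 32-bit ABI might carry. Not Apache's code. */
struct app_fileinfo {
    uint32_t inode;
};

int main(int argc, char *argv[])
{
    struct stat sb;
    struct app_fileinfo fi;

    if (argc != 2 || stat(argv[1], &sb) == -1) {
        perror("stat");
        return 1;
    }

    /* The 64-bit st_ino gets truncated to 32 bits here. */
    fi.inode = (uint32_t) sb.st_ino;

    printf("st_ino = %llu, stored as %u (%s)\n",
           (unsigned long long) sb.st_ino, fi.inode,
           (uint64_t) fi.inode == (uint64_t) sb.st_ino ? "fits" : "overflowed");
    return 0;
}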
To identify whether this is the problem, run
file $FILENAME
where $FILENAME is the binary that’s producing the error. If the output contains “32-bit”, that’s a pretty good indication this might be the problem.
The best solution, of course, is to use 64 bit applications on your 64 bit clients. This wasn’t possible for this user.
To work around it, we enabled the 32-bit inode translation option on the volume:
gluster volume set $VOLUME nfs.enable-ino32 on
Then we mounted via NFS instead of using the FUSE client.
This presented 32-bit inodes to the application, eliminating the overflow. It worked for the Apache programs, but not for the 32-bit Courier IMAP: GlusterFS 3.2 doesn’t support NFS locks, and since Courier requires them, this approach wouldn’t work there.
Redundancy was maintained by installing the GlusterFS server package on the client, starting glusterd, and adding the client to the peer group for the volume. This starts the NFS daemon on the client, allowing it to do an NFS mount from localhost. That NFS daemon then handles connecting to the brick servers, maintaining redundancy.
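A rough sketch of that setup (package, service, and option names are assumptions for a RHEL/CentOS-style install; $CLIENT, $VOLUME, and the mount point are placeholders):

# on the client
yum install glusterfs-server
service glusterd start

# from a server already in the pool
gluster peer probe $CLIENT

# back on the client, mount the volume from its own NFS server
mount -t nfs -o vers=3 localhost:/$VOLUME /mnt/$VOLUME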
In a very unscientific test, I was curious about how much of an effect GlusterFS’ self-heal check has on lstat. I wrote probably the first C program I’ve written in 20 years to find out.
I looped lstat() calls for 60 seconds against three targets: a file on my local disk (which is not the same type or speed as my bricks, although that shouldn’t matter since this should all be handled in cache anyway), a raw image from within a KVM instance, and a file on a FUSE-mounted Gluster volume. This was the result:
Iterations | Calculated Latency | Store
90330916   | 0.66 microseconds  | Local
56497255   | 1.06 microseconds  | Raw VM Image
32860989   | 1.83 microseconds  | GlusterFS
Again, this is probably the worst test I could do: it’s not at all scientific, has way too many differences between the tests, was performed on a replica 3 volume with one replica down, was run on 3.1.7 (for which AFR should perform the same as in 3.2.6), and is just overall a waste of blog space, imho. But who knows, someone else might at least get inspired to do a real test.
As you can see, it’s pretty significant. An almost 64% latency hit for this dumb test over local is really to be expected, considering we’re adding network latency on top of everything, but the 41% drop from the VM image to the GlusterFS mount probably represents the latency hit from the self-heal checks a smidgen more accurately.
Here’s the C source:
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    struct stat sb;
    time_t seconds;
    uint64_t count;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <pathname>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    if (lstat(argv[1], &sb) == -1) {
        perror("stat");
        exit(EXIT_FAILURE);
    }

    seconds = time(NULL);
    count = 0;
    /* Hammer lstat() on the same path for 60 seconds and count the calls. */
    while ( seconds + 60 > time(NULL) ) {
        lstat(argv[1], &sb);
        count++;
    }

    fprintf(stdout, "Performed %" PRIu64 " lstat() calls in 60 seconds.\n", count);
    return EXIT_SUCCESS;
}
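To try it yourself, compile it and point it at a file on each store (the output name and test paths below are just examples):

gcc -o lstat-loop lstat-loop.c
./lstat-loop /tmp/localfile
./lstat-loop /mnt/gluster/testfile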
More often than I would like, someone with twenty or more web servers servicing tens of thousands of page hits per hour comes into #gluster asking how to get the highest performance out of their storage system. They’ve only just now come to the realiza…
Since GlusterFS is FUSE-based, it can be mounted by a standard user without too much difficulty.
On a server:
gluster volume set $VOLUME allow-insecure on
On the client as root:
echo user_allow_other >> /etc/fuse.conf
To mount the volume, you…
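The excerpt cuts off here; for illustration only (the server name, volume name, and mount point are placeholders, and this assumes the user can access /dev/fuse), a non-root mount generally takes a form like:

glusterfs --volfile-server=$SERVER --volfile-id=$VOLUME $HOME/mnt/$VOLUME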
Frequently I have new users come into #gluster with their first ever GlusterFS volume being a stripe volume. Why? Because they’re sure that’s the right way to get better performance.
That ain’t necessarily so. The stripe translator was designed to allo…
Nixpanic has created a Wireshark decoder for GlusterFS/Red Hat Storage. This should help immensely in debugging and tuning!
This is a quick and dirty script I threw together to list files with dirty flags from a GlusterFS brick.
#!/usr/bin/env python
#
# (C) 2011, Joe Julian
#
# License: GPLv2 http://www.gnu.org/licenses/gpl-2.0.html
#
import os,socket,xattr,sys,time
from sta…
One of the questions that I come across in IRC and other places often is how to obtain a list of gluster volumes that can be mounted from a client machine. NFS provides showmount which helps in figuring out the list of exports from a server amongst other things. GlusterFS currently does not have an […]
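The excerpt ends here, but one possibility (an assumption on my part, not necessarily the approach the post goes on to describe; $SERVER is a placeholder) is to point the gluster CLI at a remote server and list its volumes:

gluster --remote-host=$SERVER volume info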