[Gluster-users] [Gluster-devel] VM fs becomes read only when one gluster node goes down

Steve Dainard sdainard at spd1.com
Mon Nov 2 22:48:49 UTC 2015


I wouldn't think you'd need any 'arbiter' nodes (in quotes because in 3.7+
there is an actual arbiter brick type at the volume level). You have 4
nodes, and if you lose 1, you're at 3/4, or 75%.

Personally I've not had much luck with 2 nodes (with or without the fake
arbiter node) as storage for oVirt VMs. I ran into a slew of storage
domain failure issues (no data loss), hanging VMs, etc. Instead I went
with a replica 3 volume just for VM storage (3 x 1TB SSDs), and bulk
storage is distributed replica 2.
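For reference, creating a dedicated replica 3 volume is roughly this (the
hostnames and brick paths below are placeholders, not my actual layout):

  gluster volume create vmstore replica 3 \
    server1:/bricks/ssd1/vmstore \
    server2:/bricks/ssd1/vmstore \
    server3:/bricks/ssd1/vmstore
  gluster volume start vmstore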

I found that when a node in a replica pair went down and was timing out,
there was zero IO (no reads, no writes). After the timeout I ended up with
a read-only filesystem for whatever data was stored on that replica pair,
which is not very useful for something stateful like a VM. The only way to
get write access back was to bring the failed node up again, and usually
the VMs in oVirt ended up in a 'paused' state that couldn't be recovered
from.

I also tested the volume-level arbiter (replica 2 arbiter 1) with gluster
3.7.3 before going to 3.6.6 and replica 3, and found IO was too slow for my
environment. The bug report I filed is here for some write speed
references: https://bugzilla.redhat.com/show_bug.cgi?id=1255110
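If it helps, the arbiter create syntax in 3.7 was along these lines (from
memory, so check the docs for your exact version; names are placeholders):

  gluster volume create vmarb replica 3 arbiter 1 \
    server1:/bricks/b1/vmarb \
    server2:/bricks/b1/vmarb \
    arbiter1:/bricks/b1/vmarb
  gluster volume start vmarb

The last brick listed acts as the arbiter and holds only file metadata, no
actual data.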

In any case I'd stick with a stable release of Gluster, and try to get
replica 3 for VM storage if you can.


On Mon, Nov 2, 2015 at 9:54 AM, André Bauer <abauer at magix.net> wrote:

> Thanks for the hints guys :-)
>
> I think I will try to use an arbiter. As I use distributed/replicated
> volumes, I think I have to add 2 arbiters, right?
>
> My nodes have 10 GBit interfaces. Would 1 GBit be enough for the arbiter(s)?
>
> Regards
> André
>
>
> On 28.10.2015 at 14:38, Diego Remolina wrote:
> > I am running oVirt and a self-hosted engine with additional VMs on a
> > replica two gluster volume. I have an "arbiter" node and set the quorum
> > ratio to 51%. The arbiter node is just another machine with the
> > glusterfs bits installed that is part of the gluster peers but has no
> > bricks on it.
> >
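> > For example, something along these lines (the host name is a placeholder):
> >
> >   # on an existing peer, add the brick-less arbiter machine to the pool
> >   gluster peer probe arbiter-host
> >   # require a majority of the peers to be up for server-side quorum
> >   gluster volume set all cluster.server-quorum-ratio 51%
> >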
> > You will have to be very careful where you put these three machines if
> > they are going to go in separate server rooms or buildings. There are
> > pros and cons to distribution of the nodes and network topology may
> > also influence that.
> >
> > In my case this is on a campus: I have machines in 3 separate
> > buildings and all machines are on the same main campus router (we have
> > more than one main router). All machines are connected via 10 Gbps. If I
> > had one node with bricks and the arbiter in the same building and that
> > building went down (power/AC/chilled water/network), then the other node
> > with bricks would be useless. This is why I have machines in 3
> > different buildings. Oh, and this is because most of the client
> > systems are not even in the same building as the servers. If my client
> > machines and servers were in the same building, then putting one node
> > with bricks and the arbiter in that same building could make sense.
> >
> > HTH,
> >
> > Diego
> >
> >
> >
> >
> > On Wed, Oct 28, 2015 at 5:25 AM, Niels de Vos <ndevos at redhat.com> wrote:
> >> On Tue, Oct 27, 2015 at 07:21:35PM +0100, André Bauer wrote:
> >>>
> >>> Hi Niels,
> >>>
> >>> my network.ping-timeout was already set to 5 seconds.
> >>>
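> >>> For reference, that was set with something like this (volume name is a
> >>> placeholder):
> >>>
> >>>   gluster volume set <volname> network.ping-timeout 5
> >>>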
> >>> Unfortunately it seems I don't have the timeout setting in Ubuntu 14.04
> >>> for my vda disk.
> >>>
> >>> ls -al /sys/block/vda/device/ gives me only:
> >>>
> >>> drwxr-xr-x 4 root root    0 Oct 26 20:21 ./
> >>> drwxr-xr-x 5 root root    0 Oct 26 20:21 ../
> >>> drwxr-xr-x 3 root root    0 Oct 26 20:21 block/
> >>> -r--r--r-- 1 root root 4096 Oct 27 18:13 device
> >>> lrwxrwxrwx 1 root root    0 Oct 27 18:13 driver ->
> >>> ../../../../bus/virtio/drivers/virtio_blk/
> >>> -r--r--r-- 1 root root 4096 Oct 27 18:13 features
> >>> -r--r--r-- 1 root root 4096 Oct 27 18:13 modalias
> >>> drwxr-xr-x 2 root root    0 Oct 27 18:13 power/
> >>> -r--r--r-- 1 root root 4096 Oct 27 18:13 status
> >>> lrwxrwxrwx 1 root root    0 Oct 26 20:21 subsystem ->
> >>> ../../../../bus/virtio/
> >>> -rw-r--r-- 1 root root 4096 Oct 26 20:21 uevent
> >>> -r--r--r-- 1 root root 4096 Oct 26 20:21 vendor
> >>>
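> >>> For comparison, on a SCSI-backed disk (e.g. virtio-scsi exposing
> >>> /dev/sda) the timeout knob would exist; something like this is what I
> >>> was looking for (illustrative only, it does not apply to virtio_blk):
> >>>
> >>>   cat /sys/block/sda/device/timeout    # typically 30 seconds
> >>>   echo 180 > /sys/block/sda/device/timeout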
> >>>
> >>> Is the quorum setting a problem if you only have 2 replicas?
> >>>
> >>> My volume has this quorum options set:
> >>>
> >>> cluster.quorum-type: auto
> >>> cluster.server-quorum-type: server
> >>>
> >>> As I understand the documentation (
> >>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Quorum.html
> >>> ), cluster.server-quorum-ratio is set to "> 50%" by default, which can
> >>> never be satisfied if you only have 2 replicas and one node goes down, right?
> >>>
> >>> Do I need cluster.server-quorum-ratio = 50% in this case?
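> >>> I guess, if so, it would be set with something like this (untested on
> >>> my side):
> >>>
> >>>   gluster volume set all cluster.server-quorum-ratio 50%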
> >>
> >> Replica 2 for VM storage is troublesome. Sahina just responded very
> >> nicely to a very similar email:
> >>
> >>
> http://thread.gmane.org/gmane.comp.file-systems.gluster.user/22818/focus=22823
> >>
> >> HTH,
> >> Niels
> >>
> >
>
>
> --
> Kind regards
> André Bauer
>
> MAGIX Software GmbH
> André Bauer
> Administrator
> August-Bebel-Straße 48
> 01219 Dresden
> GERMANY
>
> tel.: 0351 41884875
> e-mail: abauer at magix.net
> www.magix.com
>
> Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Klaus Schmidt
> Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205
>
>

