<html><body><div style="font-family: garamond,new york,times,serif; font-size: 12pt; color: #000000"><div>Hi Dominique,<br></div><div><br></div><div>I went through the attached logs. At some point all bricks seem to have gone down, as I see<br></div><div>[2016-01-31 16:17:20.907680] E [MSGID: 108006] [afr-common.c:3999:afr_notify] 0-cluster1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.</div><div>in the client logs.<br></div><div><br></div><div>This *may* have been the reason for the VMs going offline.<br></div><div><br></div><div>Also, Steve's inputs are correct with respect to the distinction between server quorum and client quorum. It is usually recommended that you do the following when using Gluster for the VM store use case:<br></div><div><br></div><div>i) Use a replica 3 volume (as opposed to replica 2). In your case the third node should also be used to host a brick of the volume.<br></div><div>You can use the arbiter feature if you want to minimise the cost of investing in three full-sized machines.<br></div><div>Check this out: <a href="https://gluster.readthedocs.org/en/release-3.7.0/Features/afr-arbiter-volumes/">https://gluster.readthedocs.org/en/release-3.7.0/Features/afr-arbiter-volumes/</a><br></div><div><br>Also, if you plan to use arbiter, it is recommended that you do so with glusterfs-3.7.8 as it contains some critical bug fixes.<br></div><div><br></div><div>ii) Once you're done with i), enable the virt group option on the volume:<br></div><div># gluster volume set <VOLNAME> group virt<br></div><div>which will initialise, in one step, the volume configuration meant for the VM store use case (including the right quorum options).<br></div><div><br></div><div>iii) Have you tried sharding yet? If not, you could give that a try too. 
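To make the steps concrete, here is a rough sketch of i)-iii), reusing the volume name and brick paths from this thread; the third host srv03 and the shard block size are assumptions, so adjust them to your environment:

```shell
# Sketch only -- srv03 and the 512MB shard size are assumed values,
# not taken from your setup.

# i) create the volume as replica 3 with an arbiter brick on the
#    third node (arbiter syntax needs glusterfs >= 3.7):
gluster volume create cluster1 replica 3 arbiter 1 \
    srv01:/home/gluster srv02:/home/gluster srv03:/home/gluster

# ii) apply the virt group of options (quorum settings etc.) in one step:
gluster volume set cluster1 group virt

# iii) enable sharding and pick a shard block size:
gluster volume set cluster1 features.shard on
gluster volume set cluster1 features.shard-block-size 512MB
```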
It has been found to be useful for the VM store workload.<br></div><div>Check this out: <a href="http://blog.gluster.org/2015/12/introducing-shard-translator/">http://blog.gluster.org/2015/12/introducing-shard-translator/</a><br></div><div><br></div><div>Let me know if this works for you.<br></div><div><br></div><div>-Krutika<br></div><div> <br></div><div><br></div><hr id="zwchr"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Steve Dainard" <sdainard@spd1.com><br><b>To: </b>"Dominique Roux" <dominique.roux@ungleich.ch><br><b>Cc: </b>"gluster-users@gluster.org List" <gluster-users@gluster.org><br><b>Sent: </b>Thursday, February 11, 2016 3:52:18 AM<br><b>Subject: </b>Re: [Gluster-users] Fail of one brick lead to crash VMs<br><div><br></div>For what it's worth, I've never been able to lose a brick in a 2-brick<br>replica volume and still be able to write data.<br><div><br></div>I've also found the documentation confusing as to what 'Option:<br>cluster.server-quorum-type' actually means.<br>Default Value: (null)<br>Description: This feature is on the server-side i.e. in glusterd.<br>Whenever the glusterd on a machine observes that the quorum is not<br>met, it brings down the bricks to prevent data split-brains. When the<br>network connections are brought back up and the quorum is restored the<br>bricks in the volume are brought back up.<br><div><br></div>It seems to be implying a brick quorum, but I think it actually means<br>a glusterd quorum. In other words, if 2/3 glusterd processes fail,<br>take the brick offline. 
This would seem to make sense in your<br>configuration.<br><div><br></div>But<br><div><br></div>There are also two other quorum settings which seem to be more focused<br>on brick count/ratio to form quorum:<br><div><br></div>Option: cluster.quorum-type<br>Default Value: none<br>Description: If value is "fixed" only allow writes if quorum-count<br>bricks are present. If value is "auto" only allow writes if more than<br>half of bricks, or exactly half including the first, are present.<br><div><br></div>Option: cluster.quorum-count<br>Default Value: (null)<br>Description: If quorum-type is "fixed" only allow writes if this many<br>bricks are present. Other quorum types will OVERWRITE this value.<br><div><br></div>So you might be able to set type to 'fixed' and count to '1', and, with<br>cluster.server-quorum-type: server<br>already enabled, get what you want.<br><div><br></div>But again, I've never had this work properly, and always ended up with<br>split-brains, which are difficult to resolve when you're storing VM<br>images rather than files.<br><div><br></div>Your other options are: use your 3rd server as another brick and do<br>replica 3 (which I've had good success with).<br><div><br></div>Or, seeing as you're using 3.7, you could look into arbiter nodes if<br>they're stable in the current version.<br><div><br></div><br>On Mon, Feb 8, 2016 at 6:20 AM, Dominique Roux<br><dominique.roux@ungleich.ch> wrote:<br>> Hi guys,<br>><br>> I faced a problem a week ago.<br>> In our environment we have three servers in a quorum. 
The gluster volume<br>> is spread over two bricks and is of type Replicate.<br>><br>> To simulate the failure of one brick, we isolated one of the two<br>> bricks with iptables, so that communication with the other two peers<br>> was no longer possible.<br>> After that, VMs (OpenNebula) that had I/O going on at the time crashed.<br>> We stopped the glusterfsd hard (kill -9) and restarted it, which made<br>> things work again (of course we also had to restart the failed VMs). But<br>> I think this shouldn't happen, since quorum was not lost (2/3 hosts<br>> were still up and connected).<br>><br>> Here is some info about our system:<br>> OS: CentOS Linux release 7.1.1503<br>> Glusterfs version: glusterfs 3.7.3<br>><br>> gluster volume info:<br>><br>> Volume Name: cluster1<br>> Type: Replicate<br>> Volume ID:<br>> Status: Started<br>> Number of Bricks: 1 x 2 = 2<br>> Transport-type: tcp<br>> Bricks:<br>> Brick1: srv01:/home/gluster<br>> Brick2: srv02:/home/gluster<br>> Options Reconfigured:<br>> cluster.self-heal-daemon: enable<br>> cluster.server-quorum-type: server<br>> network.remote-dio: enable<br>> cluster.eager-lock: enable<br>> performance.stat-prefetch: on<br>> performance.io-cache: off<br>> performance.read-ahead: off<br>> performance.quick-read: off<br>> server.allow-insecure: on<br>> nfs.disable: 1<br>><br>> Hope you can help us.<br>><br>> Thanks a lot.<br>><br>> Best regards<br>> Dominique<br>> _______________________________________________<br>> Gluster-users mailing list<br>> Gluster-users@gluster.org<br>> http://www.gluster.org/mailman/listinfo/gluster-users<br>_______________________________________________<br>Gluster-users mailing list<br>Gluster-users@gluster.org<br>http://www.gluster.org/mailman/listinfo/gluster-users<br></blockquote><div><br></div></div></body></html>