<div dir="ltr">Aren't we talking about this patch?<div><a href="https://git.proxmox.com/?p=pve-qemu-kvm.git;a=blob;f=debian/patches/gluster-backupserver.patch;h=ad241ee1154ebbd536d7c2c7987d86a02255aba2;hb=HEAD">https://git.proxmox.com/?p=pve-qemu-kvm.git;a=blob;f=debian/patches/gluster-backupserver.patch;h=ad241ee1154ebbd536d7c2c7987d86a02255aba2;hb=HEAD</a><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-10-26 22:56 GMT+02:00 Niels de Vos <span dir="ltr"><<a href="mailto:ndevos@redhat.com" target="_blank">ndevos@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, Oct 22, 2015 at 08:45:04PM +0200, André Bauer wrote:<br>
> Hi,<br>
><br>
> I have a 4-node GlusterFS 3.5.6 cluster.<br>
><br>
> My VM images are in a distributed replicated volume, which is accessed<br>
> from kvm/qemu via libgfapi.<br>
><br>
> Mount is against storage.domain.local, which has the IPs of all 4<br>
> Gluster nodes set in DNS.<br>
><br>
> When one of the Gluster nodes goes down (accidental reboot), a lot of<br>
> the VMs get a read-only filesystem, even after the node comes back up.<br>
><br>
> How can I prevent this?<br>
> I expect the VM to just use the replica on the other node, without the<br>
> filesystem going read-only.<br>
><br>
> Any hints?<br>
<br>
</span>There are at least two timeouts that are involved in this problem:<br>
<br>
1. The filesystem in a VM can go read-only when the virtual disk where<br>
the filesystem is located does not respond for a while.<br>
<br>
2. When a storage server that holds a replica of the virtual disk<br>
becomes unreachable, the Gluster client (qemu+libgfapi) waits for<br>
max. network.ping-timeout seconds before it resumes I/O.<br>
<br>
Once a filesystem in a VM goes read-only, you might be able to fsck and<br>
re-mount it read-write again. This is not something the VM will do by<br>
itself.<br>
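Recovering such a guest by hand could look roughly like this (a sketch;
/dev/vda1 and the mount point are placeholders for your setup, and an
fsck of a mounted filesystem normally requires a rescue environment or
reboot):<br>
<br>
```shell
# Inside the affected VM, once the storage is reachable again,
# try to remount the filesystem read-write in place:
mount -o remount,rw /

# If the remount fails because of recorded journal errors, run
# fsck from a rescue environment instead (device is a placeholder):
# fsck.ext4 -f /dev/vda1
```
<br>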
<br>
<br>
The timeouts for (1) are set in sysfs:<br>
<br>
$ cat /sys/block/sda/device/timeout<br>
30<br>
<br>
30 seconds is the default for SCSI (sd) devices, and for testing you<br>
can change it with an echo:<br>
<br>
# echo 300 > /sys/block/sda/device/timeout<br>
<br>
This is not a persistent change; to apply it at bootup you can create a<br>
udev rule.<br>
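Such a udev rule might look like this (a sketch; the file name and the
300-second value are just examples, matching the echo above):<br>
<br>
```shell
# /etc/udev/rules.d/99-disk-timeout.rules  (example file name)
# Raise the SCSI command timeout for all sd devices when they appear:
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
    ATTR{device/timeout}="300"
```
<br>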
<br>
Some filesystems offer a mount option that changes the behaviour after<br>
a disk error is detected. "man mount" shows the "errors" option for<br>
ext*. Changing this to "continue" is not recommended; "remount-ro" or<br>
"panic" are the safest choices for your data.<br>
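For an ext4 filesystem, for example, the error behaviour can be set at
mount time or persistently in /etc/fstab (a sketch; the device name is a
placeholder):<br>
<br>
```shell
# Remount so that ext4 goes read-only on the first detected error,
# instead of silently continuing:
mount -o remount,errors=remount-ro /

# Or persistently, as an /etc/fstab entry (device is a placeholder):
# /dev/vda1  /  ext4  defaults,errors=remount-ro  0  1
```
<br>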
<br>
<br>
The timeout mentioned in (2) is for the Gluster volume, and checked by<br>
the client. When a client writes to a replicated volume, the write<br>
needs to be acknowledged by both/all replicas. The client (libgfapi)<br>
delays the reply to the application (qemu) until both/all replies from<br>
the replicas have been received. This delay is bounded by the volume<br>
option network.ping-timeout (42 seconds by default).<br>
<br>
<br>
Now, if the VM reports block errors after 30 seconds, but the client<br>
waits up to 42 seconds for recovery, there is an issue. So, your<br>
solution could be to increase the disk error-detection timeout inside<br>
the VMs, and/or decrease network.ping-timeout.<br>
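Lowering the ping timeout is a one-line volume option change (a sketch;
"myvol" is a placeholder volume name, and 20 seconds is just an example
value below the 30-second default disk timeout):<br>
<br>
```shell
# Lower the timeout so the Gluster client gives up on an unreachable
# brick before the guest's disk timeout (30s by default) expires:
gluster volume set myvol network.ping-timeout 20

# "gluster volume info myvol" then lists the reconfigured option.
```
<br>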
<br>
It would be interesting to know if adapting these values prevents the<br>
read-only occurrences in your environment. If you do any testing with<br>
this, please keep me informed about the results.<br>
<span class="HOEnZb"><font color="#888888"><br>
Niels<br>
</font></span><br>_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Best regards,<br>Roman.</div>
</div>