<div dir="ltr"><div>Hi,<br><br></div><div>Yeah, the FUSE mount log didn&#39;t convey much information.<br><br></div><div>One of the reasons the heal may have taken so long (and consumed so many resources) is a bug in self-heal that made it heal from both source bricks in 3-way replication. With this bug, a heal takes twice as long and consumes the same amount of resources each time.<br><br></div><div>This issue is fixed at <a href="http://review.gluster.org/#/c/14008/" target="_blank">http://review.gluster.org/#/c/14008/</a> and the fix will be available in 3.7.12.<br><br></div><div>The other thing you could do is set cluster.data-self-heal-algorithm to &#39;full&#39;, for better heal performance and more regulated resource consumption during heals:<br></div><div> # gluster volume set &lt;VOL&gt; cluster.data-self-heal-algorithm full<br><br></div><div>As far as sharding is concerned, some critical caching issues were fixed in 3.7.7 and 3.7.8.<br></div><div>My guess is that the VM crash/unbootable state could be caused by these issues, which exist in 3.7.6.<br><br></div><div>3.7.10 introduced throttled client-side heals and also moves such heals to the background, which helps even more in preventing starvation of VMs during client heals.<br></div><div><br></div><div>Considering these factors, I think it would be better to upgrade your machines to 3.7.10.<br><br></div><div>Do let me know if migrating to 3.7.10 solves your issues.<br><br></div>-Krutika<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 18, 2016 at 12:40 PM, Kevin Lemonnier <span dir="ltr">&lt;<a href="mailto:lemonnierk@ulrar.net" target="_blank">lemonnierk@ulrar.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Yes, but as I was saying I don&#39;t believe KVM is using a mount point, I think it uses<br>

the API (<a href="http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt" rel="noreferrer" target="_blank">http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt</a>).<br>

I might be mistaken, of course. Proxmox does have a mount point for convenience; I&#39;ll attach those<br>

logs, hoping they contain the information you need. They do seem to contain a lot of errors<br>

for the 15th.<br>

For reference, there was a disconnect of the first brick (10.10.0.1) in the morning and then a successful<br>

heal that caused about 40 minutes downtime of the VMs. Right after that heal finished (if my memory is<br>

correct it was about noon or close) the second brick (10.10.0.2) rebooted, and that&#39;s the one I disconnected<br>

to prevent the heal from causing another downtime.<br>

I reconnected it once at the end of the afternoon, hoping the heal would go well, but everything went down<br>

like in the morning, so I disconnected it again and waited until 11pm (23:00) to reconnect it and let it finish.<br>

<br>

Thanks for your help,<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

On Mon, Apr 18, 2016 at 12:28:28PM +0530, Krutika Dhananjay wrote:<br>

&gt; Sorry, I was referring to the glusterfs client logs.<br>

&gt;<br>

&gt; Assuming you are using FUSE mount, your log file will be in<br>

&gt; /var/log/glusterfs/&lt;hyphenated-mount-point-path&gt;.log<br>

&gt;<br>

&gt; -Krutika<br>

&gt;<br>

&gt; On Sun, Apr 17, 2016 at 9:37 PM, Kevin Lemonnier &lt;<a href="mailto:lemonnierk@ulrar.net">lemonnierk@ulrar.net</a>&gt;<br>

&gt; wrote:<br>

&gt;<br>

&gt; &gt; I believe Proxmox is just an interface to KVM that uses the lib, so if I&#39;m<br>

&gt; &gt; not mistaken there aren&#39;t any client logs?<br>

&gt; &gt;<br>

&gt; &gt; It&#39;s not the first time I&#39;ve had the issue; it happens on every heal on the<br>

&gt; &gt; 2 clusters I have.<br>

&gt; &gt;<br>

&gt; &gt; I did let the heal finish that night and the VMs are working now, but it<br>

&gt; &gt; is pretty scary for future crashes or brick replacements.<br>

&gt; &gt; Should I maybe lower the shard size? It won&#39;t solve the fact that 2 bricks<br>

&gt; &gt; out of 3 aren&#39;t keeping the filesystem usable, but it might make the healing<br>

&gt; &gt; quicker, right?<br>

&gt; &gt;<br>

&gt; &gt; Thanks<br>

&gt; &gt;<br>

&gt; &gt; On 17 April 2016 17:56:37 GMT+02:00, Krutika Dhananjay &lt;<br>

&gt; &gt; <a href="mailto:kdhananj@redhat.com">kdhananj@redhat.com</a>&gt; wrote:<br>

&gt; &gt; &gt;Could you share the client logs and information about the approx<br>

&gt; &gt; &gt;time/day<br>

&gt; &gt; &gt;when you saw this issue?<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;-Krutika<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier<br>

&gt; &gt; &gt;&lt;<a href="mailto:lemonnierk@ulrar.net">lemonnierk@ulrar.net</a>&gt;<br>

&gt; &gt; &gt;wrote:<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;&gt; Hi,<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; We have a small GlusterFS 3.7.6 cluster with 3 nodes running with<br>

&gt; &gt; &gt;Proxmox<br>

&gt; &gt; &gt;&gt; VMs on it. I did set up the various recommended options, like the<br>

&gt; &gt; &gt;virt<br>

&gt; &gt; &gt;&gt; group, but<br>

&gt; &gt; &gt;&gt; by hand since it&#39;s on Debian. The shards are 256MB, if that matters.<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; This morning the second node crashed, and as it came back up started<br>

&gt; &gt; &gt;a<br>

&gt; &gt; &gt;&gt; heal, but that basically froze all the VM&#39;s running on that volume.<br>

&gt; &gt; &gt;Since<br>

&gt; &gt; &gt;&gt; we really really<br>

&gt; &gt; &gt;&gt; can&#39;t have 40 minutes down time in the middle of the day, I just<br>

&gt; &gt; &gt;removed<br>

&gt; &gt; &gt;&gt; the node from the network and that stopped the heal, allowing the<br>

&gt; &gt; &gt;VM&#39;s to<br>

&gt; &gt; &gt;&gt; access<br>

&gt; &gt; &gt;&gt; their disks again. The plan was to reconnect the node in a couple<br>

&gt; &gt; &gt;of<br>

&gt; &gt; &gt;&gt; hours to let it heal at night.<br>

&gt; &gt; &gt;&gt; But a VM has crashed now, and it can&#39;t boot up again: it seems to freeze<br>

&gt; &gt; &gt;trying<br>

&gt; &gt; &gt;&gt; to access the disks.<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; Looking at the heal info for the volume, it has gone way up since<br>

&gt; &gt; &gt;this<br>

&gt; &gt; &gt;&gt; morning, it looks like the VM&#39;s aren&#39;t writing to both nodes, just<br>

&gt; &gt; &gt;the one<br>

&gt; &gt; &gt;&gt; they are on.<br>

&gt; &gt; &gt;&gt; It seems pretty bad, we have 2 nodes out of 3 up, I would expect the<br>

&gt; &gt; &gt;volume to<br>

&gt; &gt; &gt;&gt; work just fine since it has quorum. What am I missing ?<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; It is still too early to start the heal, is there a way to start the<br>

&gt; &gt; &gt;VM<br>

&gt; &gt; &gt;&gt; anyway right now ? I mean, it was running a moment ago so the data is<br>

&gt; &gt; &gt;&gt; there, it just needs<br>

&gt; &gt; &gt;&gt; to let the VM access it.<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; Volume Name: vm-storage<br>

&gt; &gt; &gt;&gt; Type: Replicate<br>

&gt; &gt; &gt;&gt; Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef<br>

&gt; &gt; &gt;&gt; Status: Started<br>

&gt; &gt; &gt;&gt; Number of Bricks: 1 x 3 = 3<br>

&gt; &gt; &gt;&gt; Transport-type: tcp<br>

&gt; &gt; &gt;&gt; Bricks:<br>

&gt; &gt; &gt;&gt; Brick1: first_node:/mnt/vg1-storage<br>

&gt; &gt; &gt;&gt; Brick2: second_node:/mnt/vg1-storage<br>

&gt; &gt; &gt;&gt; Brick3: third_node:/mnt/vg1-storage<br>

&gt; &gt; &gt;&gt; Options Reconfigured:<br>

&gt; &gt; &gt;&gt; cluster.quorum-type: auto<br>

&gt; &gt; &gt;&gt; cluster.server-quorum-type: server<br>

&gt; &gt; &gt;&gt; network.remote-dio: enable<br>

&gt; &gt; &gt;&gt; cluster.eager-lock: enable<br>

&gt; &gt; &gt;&gt; performance.readdir-ahead: on<br>

&gt; &gt; &gt;&gt; performance.quick-read: off<br>

&gt; &gt; &gt;&gt; performance.read-ahead: off<br>

&gt; &gt; &gt;&gt; performance.io-cache: off<br>

&gt; &gt; &gt;&gt; performance.stat-prefetch: off<br>

&gt; &gt; &gt;&gt; features.shard: on<br>

&gt; &gt; &gt;&gt; features.shard-block-size: 256MB<br>

&gt; &gt; &gt;&gt; cluster.server-quorum-ratio: 51%<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; Thanks for your help<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; --<br>

&gt; &gt; &gt;&gt; Kevin Lemonnier<br>

&gt; &gt; &gt;&gt; PGP Fingerprint : 89A5 2283 04A0 E6E9 0111<br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt; &gt;&gt; _______________________________________________<br>

&gt; &gt; &gt;&gt; Gluster-users mailing list<br>

&gt; &gt; &gt;&gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

&gt; &gt; &gt;&gt; <a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>

&gt; &gt; &gt;&gt;<br>

&gt; &gt;<br>

&gt; &gt; --<br>

&gt; &gt; Sent from my Android device with K-9 Mail. Please excuse my brevity.<br>


&gt; &gt;<br>

<br>

--<br>

Kevin Lemonnier<br>

PGP Fingerprint : 89A5 2283 04A0 E6E9 0111<br>

</div></div><br></blockquote></div><br></div>