<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Oct 2, 2016 at 5:49 AM, Lindsay Mathieson <span dir="ltr">&lt;<a href="mailto:lindsay.mathieson@gmail.com" target="_blank">lindsay.mathieson@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 2/10/2016 12:48 AM, Lindsay Mathieson wrote:<br>

</span><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Only the heal count does not change; healing just does not seem to start. It can take hours before it shifts, but once it does, it&#39;s quite rapid. Node 1 has restarted and the heal count has been static at 511 shards for 45 minutes now. Nodes 1 &amp; 2 have low CPU load, node 3 has glusterfsd pegged at 800% CPU. <br>

</blockquote>

<br></span>

OK, I tried to reproduce it systematically this morning and was actually unable to do so, which is quite weird. Testing was the same as last night: move all the VMs off a server, reboot it, and wait for the healing to finish. This time I tried it with various settings.<br>

<br>

<br>

Test 1<br>

------<br>

cluster.granular-entry-heal no<br>

cluster.locking-scheme full<br>

Shards / Min: 350 / 8<br>

<br>

<br>

Test 2<br>

------<br>

cluster.granular-entry-heal yes<br>

cluster.locking-scheme granular<br>

Shards / Min:  391 / 10<br>

<br>

Test 3<br>

------<br>

cluster.granular-entry-heal yes<br>

cluster.locking-scheme granular<br>

heal command issued<br>

Shards / Min: 358 / 11<br>

<br>

Test 4<br>

------<br>

cluster.granular-entry-heal yes<br>

cluster.locking-scheme granular<br>

heal full command issued<br>

Shards / Min: 358 / 27<br>

<br>

<br>

Best results were with cluster.granular-entry-heal=yes, cluster.locking-scheme=granular, but they were all quite good.<br>
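<br>
To make the four runs easier to compare, here is a minimal sketch that turns each result into a heal rate. It assumes each "Shards / Min" pair means shards healed followed by minutes elapsed, which the post does not state explicitly; the names RUNS, shards_per_minute and summarize are illustrative only and not part of gluster.<br>
<pre>
```python
# Results copied from the test runs above.
# ASSUMPTION: each "Shards / Min" pair is (shards healed, minutes elapsed).
RUNS = [
    ("granular-entry-heal no, locking-scheme full", 350, 8),
    ("granular-entry-heal yes, locking-scheme granular", 391, 10),
    ("yes/granular, heal command issued", 358, 11),
    ("yes/granular, heal full command issued", 358, 27),
]


def shards_per_minute(shards, minutes):
    """Average number of shards healed per minute for one run."""
    return shards / minutes


def summarize(runs):
    """Return (label, rate) pairs with the rate rounded to one decimal."""
    return [(label, round(shards_per_minute(s, m), 1)) for label, s, m in runs]


if __name__ == "__main__":
    for label, rate in summarize(RUNS):
        print(f"{label}: {rate} shards/min")
```
</pre>
Under that reading the runs work out to roughly 44, 39, 33 and 13 shards per minute. The shard counts differ between runs, so this is only a rough comparison.<br>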

<br>

<br>

Don&#39;t know why it was so much worse last night - I/O load, CPU and memory were the same. However, one thing that is different, which I can&#39;t easily reproduce, was that the cluster had been running for several weeks, but last night I rebooted all nodes. Could gluster be developing an issue after running for some time?</blockquote><div><br></div><div>From the algorithm&#39;s point of view, the only thing that matters is the amount of data that needs to be healed. It does not depend on age. So whether 100GB of data to heal accumulated in a very short time or over a few months, the time to heal should be the same.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>

<br>

<br>

-- <br>

Lindsay Mathieson<br>

<br>

_______________________________________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

<a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Pranith<br></div></div>

</div></div>