<div dir="ltr"><div>Any errors/warnings in the glustershd logs?<br><br></div>-Krutika<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Oct 1, 2016 at 8:18 PM, Lindsay Mathieson <span dir="ltr">&lt;<a href="mailto:lindsay.mathieson@gmail.com" target="_blank">lindsay.mathieson@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This was raised earlier, but I don&#39;t believe it was ever resolved, and it is becoming a serious issue for me.<br>

<br>

<br>

I&#39;m doing rolling upgrades on our three-node cluster (Replica 3, Sharded, VM Workload).<br>

<br>

<br>

I update one node, reboot it, wait for healing to complete, then do the next one.<br>

<br>

<br>

However, the heal count does not change; healing just does not seem to start. It can take hours before the count shifts, but once it does, healing is quite rapid. Node 1 has restarted, and the heal count has been static at 511 shards for 45 minutes now. Nodes 1 &amp; 2 have low CPU load, while node 3 has glusterfsd pegged at 800% CPU.<br>
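<br>
A minimal Python sketch of the &quot;wait for healing to complete&quot; step that also flags a stall like the one above. It assumes the gluster CLI is on PATH and that &quot;gluster volume heal VOLNAME statistics heal-count&quot; prints, for each brick, a &quot;Brick host:/path&quot; line followed by a &quot;Number of entries: N&quot; line. The helper names (parse_heal_count, heal_count, wait_for_heal) and the defaults (60 s poll, 45 unchanged polls, roughly 45 minutes) are hypothetical choices, not part of Gluster.<br>

```python
import re
import subprocess
import time
from typing import Callable, Dict, Optional

# Assumed output shape: a "Brick host:/path" line followed by a
# "Number of entries: N" line for each brick.
BRICK_RE = re.compile(r"^Brick\s+(\S+)")
COUNT_RE = re.compile(r"^Number of entries:\s*(\d+)")


def parse_heal_count(output: str) -> Dict[str, int]:
    """Map each brick to the number of entries it reports as pending heal."""
    counts: Dict[str, int] = {}
    brick: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        match = BRICK_RE.match(line)
        if match:
            brick = match.group(1)
            continue
        match = COUNT_RE.match(line)
        if match and brick is not None:
            counts[brick] = int(match.group(1))
    return counts


def heal_count(volume: str) -> Dict[str, int]:
    """Run the gluster CLI (assumed to be on PATH) and parse its output."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "statistics", "heal-count"],
        capture_output=True, text=True, check=True,
    )
    return parse_heal_count(result.stdout)


def wait_for_heal(volume: str, interval: float = 60.0, stall_after: int = 45,
                  fetch: Callable[[str], Dict[str, int]] = heal_count) -> bool:
    """Poll until all bricks report zero pending entries.

    Returns True when healing completes, or False if the total count is
    unchanged for `stall_after` consecutive polls (a stalled heal).
    """
    last_total: Optional[int] = None
    unchanged = 0
    while True:
        total = sum(fetch(volume).values())
        if total == 0:
            return True
        # Count consecutive polls where the total did not move.
        unchanged = unchanged + 1 if total == last_total else 0
        if unchanged >= stall_after:
            return False
        last_total = total
        time.sleep(interval)
```

For example, calling wait_for_heal(&quot;datastore4&quot;) before moving on to the next node; a False return marks the point where the glustershd logs are worth checking. The fetch parameter exists only so the polling logic can be exercised without a live cluster.<br>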

<br>

<br>

This was *not* the case in earlier versions of Gluster (3.7.11, I think); healing would start almost right away. I think it started doing this when the AFR locking improvements were made.<br>

<br>

<br>

I have experimented with both the full &amp; diff heal modes; it doesn&#39;t make any difference.<br>

<br>

Current configuration:<br>

<br>

Gluster Version 3.8.4<br>

<br>

Volume Name: datastore4<br>

Type: Replicate<br>

Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce<br>

Status: Started<br>

Snapshot Count: 0<br>

Number of Bricks: 1 x 3 = 3<br>

Transport-type: tcp<br>

Bricks:<br>

Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4<br>

Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4<br>

Brick3: vna.proxmox.softlog:/tank/vmdata/datastore4<br>

Options Reconfigured:<br>

cluster.self-heal-window-size: 1024<br>

cluster.locking-scheme: granular<br>

cluster.granular-entry-heal: on<br>

performance.readdir-ahead: on<br>

cluster.data-self-heal: on<br>

features.shard: on<br>

cluster.quorum-type: auto<br>

cluster.server-quorum-type: server<br>

nfs.disable: on<br>

nfs.addr-namelookup: off<br>

nfs.enable-ino32: off<br>

performance.strict-write-ordering: off<br>

performance.stat-prefetch: on<br>

performance.quick-read: off<br>

performance.read-ahead: off<br>

performance.io-cache: off<br>

cluster.eager-lock: enable<br>

network.remote-dio: enable<br>

features.shard-block-size: 64MB<br>

cluster.background-self-heal-count: 16<br>
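<br>
To compare these reconfigured options against a snapshot taken before an upgrade (or against another volume), a small Python sketch can pull the &quot;Options Reconfigured:&quot; block out of &quot;gluster volume info&quot; output into a dict. It assumes the CLI prints one &quot;key: value&quot; option per line after that header, with no blank lines inside the block; parse_reconfigured_options and diff_options are hypothetical helpers, not Gluster APIs.<br>

```python
from typing import Dict, Optional, Tuple


def parse_reconfigured_options(volume_info: str) -> Dict[str, str]:
    """Collect the "key: value" lines that follow "Options Reconfigured:"."""
    options: Dict[str, str] = {}
    in_options = False
    for raw in volume_info.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_options = True
            continue
        if not in_options:
            continue
        if not line:
            # A blank line ends the block (e.g. before the next volume).
            in_options = False
            continue
        # Split on the first colon only; keys never contain one.
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


def diff_options(before: Dict[str, str],
                 after: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {option: (before_value, after_value)} for options that differ."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}
```

An option missing on one side shows up with None as its value, which makes newly set or reset options easy to spot.<br>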

<br>

<br>

Thanks,<span class="HOEnZb"><font color="#888888"><br>

<br>

-- <br>

Lindsay Mathieson<br>

<br>

_______________________________________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

<a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>

</font></span></blockquote></div><br></div>