<p dir="ltr">-Atin<br>
On Dec 17, 2015 3:21 PM, "Nicolas Ecarnot" <<a href="mailto:nicolas@ecarnot.net">nicolas@ecarnot.net</a>> wrote:<br>
><br>
> Le 17/12/2015 10:10, Nicolas Ecarnot a écrit :<br>
>><br>
>> Hello,<br>
>><br>
>> Our setup: 3 CentOS 7.2 nodes with Gluster 3.7.6 in replica 3, used as<br>
>> storage+compute for an oVirt 3.5.6 DC.<br>
>><br>
>> Two days ago, we added Nagios/Centreon monitoring that checks the state<br>
>> of the heal queue every 5 minutes (something like<br>
>> "gluster volume heal some_vol info" with the appropriate grep).<br>
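A minimal Python 3.7+ sketch of such a check, assuming the heal-info output lists each brick on a "Brick host:/path" line followed by a "Number of entries: N" line. The names parse_heal_info, nagios_status and run_check, and the warn/crit thresholds, are hypothetical; the exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).<br>

```python
import re
import subprocess

# Assumed shape of the heal-info output (not guaranteed across releases):
#   Brick host:/path
#   one line per entry that needs healing
#   Number of entries: N
BRICK_RE = re.compile(r"^Brick\s+(\S+)")
ENTRIES_RE = re.compile(r"^Number of entries:\s*(\d+)")


def parse_heal_info(text):
    """Map each brick to its pending-heal entry count."""
    counts = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        m = BRICK_RE.match(line)
        if m:
            brick = m.group(1)
            continue
        m = ENTRIES_RE.match(line)
        if m and brick is not None:
            counts[brick] = int(m.group(1))
    return counts


def nagios_status(counts, warn=1, crit=10):
    """Return (exit_code, message); per-brick counts are appended after '|'."""
    total = sum(counts.values())
    detail = " ".join(f"{b}={n}" for b, n in sorted(counts.items()))
    if total >= crit:
        return 2, f"CRITICAL: {total} entries pending heal | {detail}"
    if total >= warn:
        return 1, f"WARNING: {total} entries pending heal | {detail}"
    return 0, f"OK: no entries pending heal | {detail}"


def run_check(volume):
    """Run heal info for `volume`, print the status line, return the exit code."""
    try:
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN: heal info failed: {exc}")
        return 3
    code, message = nagios_status(parse_heal_info(result.stdout))
    print(message)
    return code
```

From the Nagios/Centreon command, something like sys.exit(run_check("some_vol")) would do. Graphing the per-brick counts rather than only the total makes it easier to see which node keeps falling behind.<br>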
>><br>
>> I expected the "Number of entries" of every node to appear in the graph<br>
>> as a flat zero line most of the time, except in the rare case of a<br>
>> node reboot, after which healing is launched and takes some minutes<br>
>> (sometimes hours) but completes fine.<br>
>><br>
>> Instead, we see the heal queue holding 2 or 3 files to heal, roughly<br>
>> 4 times an hour, all day long.<br>
>><br>
>> Our DC is a small one with few VMs, so no more than 8 big files<br>
>> are stored in GlusterFS.<br>
>> I'm very surprised to see that these files constantly need healing, as I<br>
>> thought I understood that reads/writes were always synchronous, and that<br>
>> replica 3 meant every file was fully synced and committed<br>
>> at all times.<br>
>><br>
>> I've also read about the 10-minute cron-like job of the self-heal<br>
>> daemon, which we use with its default settings, but that is a second point.<br>
>><br>
>> The first point leads to these questions:<br>
>> - Why do we see such frequent desynchronizations between nodes?<br>
>> - Which logs should I read to confirm it?<br>
>> - What must I check?<br>
>><br>
><br>
> Replying to myself, since I found:<br>
> <a href="https://www.mail-archive.com/gluster-users%40gluster.org/msg20611.html">https://www.mail-archive.com/gluster-users%40gluster.org/msg20611.html</a><br>
><br>
> is it reasonable to be surprised to see this:<br>
><br>
> gluster volume get data cluster.op-version<br>
> Option Value <br>
> ------ ----- <br>
> cluster.op-version 30600<br>
><br>
> in a 3.7.6 Gluster cluster?<br>
That's normal: after an upgrade, the op-version has to be bumped explicitly. In this case, it was never raised beyond the 3.6 level (30600).<br>
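A minimal Python sketch of that bump, assuming 3.x op-versions follow the pattern major*10000 + minor*100 + patch (consistent with 30600 for 3.6.0 above). The helpers op_version_for and bump_command are hypothetical; the command they build is the standard "gluster volume set all cluster.op-version" form, but the target value should be checked against the release notes of the installed version before running it.<br>

```python
def op_version_for(release):
    """Hypothetical helper: encode a 3.x release string as an op-version,
    assuming major*10000 + minor*100 + patch (e.g. 3.6.0 maps to 30600)."""
    parts = [int(p) for p in release.split(".")]
    major, minor, patch = (parts + [0, 0])[:3]
    return major * 10000 + minor * 100 + patch


def bump_command(current, release):
    """Return the CLI command to raise the cluster op-version,
    or None when the cluster already runs at (or above) that level."""
    target = op_version_for(release)
    if current >= target:
        return None
    return f"gluster volume set all cluster.op-version {target}"
```

With the value shown above, bump_command(30600, "3.7.6") builds the command for 30706; since the op-version can only be raised, not lowered, it should be run once every node has been upgraded.<br>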
><br>
> I have absolutely no idea what this means or how it changes anything. But I see many messages like this in my logs:<br>
><br>
> Server and Client lk-version numbers are not same, reopening the fds<br>
This has nothing to do with op-version and glusterd.<br>
><br>
> and<br>
><br>
> many, many errors in etc-glusterfs-glusterd.vol.log about<br>
> missing options, other messages such as 'Unable to release lock', and very frequent vol reqs:<br>
> <a href="http://pastebin.com/e6nQfeLx">http://pastebin.com/e6nQfeLx</a><br>
Again, this is expected if concurrent commands on the same volume are executed from different CLIs.<br>
><br>
> What is op-version used for?<br>
To be precise, it indicates the version level at which the entire cluster can operate.<br>
><br>
><br>
> -- <br>
> Nicolas ECARNOT<br>
> _______________________________________________<br>
> Gluster-users mailing list<br>
> <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
> <a href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</p>