<p dir="ltr">Could you check the glusterd logs on the other nodes? They should give you a hint about the exact issue. Also, looking at .cmd_log_history will show you the interval at which the volume status commands are executed. If the gap is in milliseconds, then you are bound to hit this, and it is expected.<br></p>
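<p dir="ltr">As a rough way to check that interval, here is a minimal Python sketch that measures the gap between consecutive volume status entries. It assumes each line of .cmd_log_history looks like <code>[2015-08-03 13:45:29.974560]  : volume status : SUCCESS</code>; that line shape, the helper names <code>status_gaps</code> and <code>close_calls</code>, and the one-second threshold are my assumptions for illustration, so adjust them to what your file actually contains.</p>

```python
import re
from datetime import datetime

# Assumed line shape (verify against your own file):
#   [2015-08-03 13:45:29.974560]  : volume status : SUCCESS
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s*:\s*(.*?)\s*:\s*(SUCCESS|FAILED)"
)


def status_gaps(lines, command_prefix="volume status"):
    """Return (timestamp, gap_in_seconds) for each matching command after the first."""
    stamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2).startswith(command_prefix):
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    stamps.sort()
    return [
        (later, (later - earlier).total_seconds())
        for earlier, later in zip(stamps, stamps[1:])
    ]


def close_calls(lines, threshold_seconds=1.0):
    """Keep only the gaps shorter than the (arbitrary) threshold."""
    return [(ts, gap) for ts, gap in status_gaps(lines) if gap < threshold_seconds]
```

<p dir="ltr">To use it, open the history file on each node and pass the file object in, for example <code>close_calls(open(path))</code>, where <code>path</code> is wherever your installation keeps the file. Any entries it returns are status commands that ran almost back to back and are candidates for the lock collision.</p>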

<p dir="ltr">-Atin</p>

<div class="gmail_quote">On Aug 3, 2015 7:32 PM, &quot;Osborne, Paul (<a href="mailto:paul.osborne@canterbury.ac.uk">paul.osborne@canterbury.ac.uk</a>)&quot; &lt;<a href="mailto:paul.osborne@canterbury.ac.uk">paul.osborne@canterbury.ac.uk</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Hi,<br>

<br>

Last week I upgraded one of my gluster clusters (3 hosts with bricks as replica 3) to 3.6.4 from 3.5.4 and all seemed well.<br>

<br>

Today I am getting reports that locking has failed:<br>

<br>

<br>

gfse-cant-01:/var/log/glusterfs# gluster volume status<br>

Locking failed on <a href="http://gfse-rh-01.core.canterbury.ac.uk" rel="noreferrer" target="_blank">gfse-rh-01.core.canterbury.ac.uk</a>. Please check log file for details.<br>

Locking failed on <a href="http://gfse-isr-01.core.canterbury.ac.uk" rel="noreferrer" target="_blank">gfse-isr-01.core.canterbury.ac.uk</a>. Please check log file for details.<br>

<br>

Logs:<br>

[2015-08-03 13:45:29.974560] E [glusterd-syncop.c:1640:gd_sync_task_begin] 0-management: Locking Peers Failed.<br>

[2015-08-03 13:49:48.273159] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on <a href="http://gfse-rh-01.core.canterbury.ac.uk" rel="noreferrer" target="_blank">gfse-rh-01.core.canterbury.ac.uk</a>. Please check log file for details.<br>

[2015-08-03 13:49:48.273778] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on <a href="http://gfse-isr-01.core.canterbury.ac.uk" rel="noreferrer" target="_blank">gfse-isr-01.core.canterbury.ac.uk</a>. Please check log file for details.<br>

<br>

<br>

I am wondering if this is a new feature due to 3.6.4 or something that has gone wrong.<br>

<br>

Restarting gluster entirely (btw the restart script does not actually appear to kill the processes...) resolves the issue, but it then recurs a few minutes later, which is rather suboptimal for a running service.<br>

<br>

Googling suggests that there may be simultaneous actions going on that can cause a locking issue.<br>

<br>

I know that I have nagios running volume status &lt;volname&gt; for each of my volumes on each host every few minutes; however, this is not new and has been in place for the last 8-9 months against 3.5 without issue, so I would hope that this is not the cause.<br>

<br>

I am not sure where to look now tbh.<br>

<br>

<br>

<br>

<br>

Paul Osborne<br>

Senior Systems Engineer<br>

Canterbury Christ Church University<br>

Tel: 01227 782751<br>


</blockquote></div>