<div dir="ltr">Thanks, Atin. I'm not familiar with pulling patches from the review system, but will try :)<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 17, 2016 at 12:35 PM, Atin Mukherjee <span dir="ltr"><<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
<br>
On 06/16/2016 06:17 PM, Atin Mukherjee wrote:<br>
><br>
><br>
> On 06/16/2016 01:32 PM, B.K.Raghuram wrote:<br>
>> Thanks a lot Atin,<br>
>><br>
>> The problem is that we are using a forked version of 3.6.1 which has<br>
>> been modified to work with ZFS (for snapshots) but we do not have the<br>
>> resources to port that over to the later versions of gluster.<br>
>><br>
>> Would you know of anyone who would be willing to take this on?!<br>
><br>
> If you can cherry-pick the patches, apply them to your source, and<br>
> rebuild it, I can point you to the patches, but you'd need to give me a<br>
> day's time as I have some other items on my plate to finish.<br>
<br>
<br>
</span>Here is the list of patches that need to be applied in the following order:<br>
<br>
<a href="http://review.gluster.org/9328" rel="noreferrer" target="_blank">http://review.gluster.org/9328</a><br>
<a href="http://review.gluster.org/9393" rel="noreferrer" target="_blank">http://review.gluster.org/9393</a><br>
<a href="http://review.gluster.org/10023" rel="noreferrer" target="_blank">http://review.gluster.org/10023</a><br>
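review.gluster.org runs Gerrit, which publishes every change under a ref of the form refs/changes/&lt;last two digits of change number&gt;/&lt;change number&gt;/&lt;patchset&gt;. A rough sketch of building the fetch-and-cherry-pick commands for the three changes above, in order, might look like this (the patchset number 1 here is an assumption — check each review page for the latest patchset before applying):

```shell
#!/bin/sh
# Sketch: derive the Gerrit ref for each change and print the command that
# would fetch and cherry-pick it onto the current branch, in listed order.
# Assumption: patchset 1; the real patchset number is on each review page.
for change in 9328 9393 10023; do
  suffix=$(printf '%s' "$change" | tail -c 2)   # last two digits of the change number
  ref="refs/changes/${suffix}/${change}/1"
  echo "git fetch https://review.gluster.org/glusterfs ${ref} && git cherry-pick FETCH_HEAD"
done
```

Printing the commands first (rather than running them blind) makes it easy to verify the refs against the review pages before touching the 3.6.1 fork.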
<div class="HOEnZb"><div class="h5"><br>
><br>
> ~Atin<br>
>><br>
>> Regards,<br>
>> -Ram<br>
>><br>
>> On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a><br>
>> <mailto:<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>>> wrote:<br>
>><br>
>><br>
>><br>
>> On 06/16/2016 10:49 AM, B.K.Raghuram wrote:<br>
>> ><br>
>> ><br>
>> > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a> <mailto:<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>><br>
>> > <mailto:<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a> <mailto:<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>>>> wrote:<br>
>> ><br>
>> ><br>
>> ><br>
>> > On 06/15/2016 04:24 PM, B.K.Raghuram wrote:<br>
>> > > Hi,<br>
>> > ><br>
>> > > We're using gluster 3.6.1 and we periodically find that gluster commands<br>
>> > > fail, saying that it could not get the lock on one of the brick machines.<br>
>> > > The logs on that machine then say something like :<br>
>> > ><br>
>> > > [2016-06-15 08:17:03.076119] E<br>
>> > > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to<br>
>> > > acquire lock for vol2<br>
>> ><br>
>> > This is a possible case if concurrent volume operations are run. Do you<br>
>> > have any script which checks volume status at an interval from all the<br>
>> > nodes? If so, then this is expected behavior.<br>
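Since each gluster volume operation takes a cluster-wide lock, interval-based monitoring scripts can collide with each other and with admin commands. One way to at least keep the checks on a single node from overlapping is to serialize them through a local file lock before invoking the CLI — a minimal sketch, assuming Linux flock(1) and a made-up lock-file path:

```shell
#!/bin/sh
# run_serialized CMD...: hold an exclusive file lock while CMD runs, so two
# monitoring scripts on this node never issue gluster commands concurrently.
# The lock-file path and the 30s timeout are assumptions, not gluster defaults.
LOCKFILE="${LOCKFILE:-/tmp/gluster-monitor.lock}"
run_serialized() {
  (
    flock -w 30 9 || exit 1   # give up if another check holds the lock too long
    "$@"
  ) 9>"$LOCKFILE"
}

# In a real monitoring script the command would be e.g.:
#   run_serialized gluster volume status vol2
run_serialized echo "volume status check ran under the lock"
```

Note that this only serializes commands on one node; checks fired from multiple nodes still contend for glusterd's cluster lock, so staggering their intervals may also help.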
>> ><br>
>> ><br>
>> > Yes, I do have a couple of scripts that check on volume and quota<br>
>> > status. Given this, I do get an "Another transaction is in progress.."<br>
>> > message, which is OK. The problem is that sometimes I get the<br>
>> > volume-lock-held message, which never goes away. This sometimes results<br>
>> > in glusterd consuming a lot of memory and CPU, and the problem can only<br>
>> > be fixed with a reboot. The log files are huge, so I'm not sure if it's<br>
>> > OK to attach them to an email.<br>
>><br>
>> OK, so this is known. We have fixed lots of stale-lock issues in the 3.7<br>
>> branch, and some of them, if not all, were also backported to the 3.6<br>
>> branch. The issue is that you are using 3.6.1, which is quite old. If you<br>
>> can upgrade to the latest version of 3.7, or at worst of 3.6, I am<br>
>> confident that this will go away.<br>
>><br>
>> ~Atin<br>
>> ><br>
>> > ><br>
>> > > After some time, glusterd then seems to give up and die..<br>
>> ><br>
>> > Do you mean glusterd shuts down or segfaults? If so, I am more<br>
>> > interested in analyzing this part. Could you provide us the glusterd<br>
>> > log and cmd_history log file, along with the core (in case of SEGV),<br>
>> > from all the nodes for further analysis?<br>
>> ><br>
>> ><br>
>> > There is no segfault. glusterd just shuts down. As I said above,<br>
>> > sometimes this happens and sometimes it just continues to hog a lot of<br>
>> > memory and CPU..<br>
>> ><br>
>> ><br>
>> > ><br>
>> > > Interestingly, I also find the following line at the beginning of<br>
>> > > etc-glusterfs-glusterd.vol.log, and I don't know if this has any<br>
>> > > significance to the issue:<br>
>> > ><br>
>> > > [2016-06-14 06:48:57.282290] I<br>
>> > > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:<br>
>> > > Detected new install. Setting op-version to maximum : 30600<br>
>> > ><br>
>> ><br>
>> ><br>
>> > What does this line signify?<br>
>><br>
>><br>
</div></div></blockquote></div><br></div>