<html><body><div style="font-family: times new roman,new york,times,serif; font-size: 12pt; color: #000000"><div>Hi,<br></div><div><br></div><div>I have posted a fix for the hang in read: <a href="http://review.gluster.org/15901" data-mce-href="http://review.gluster.org/15901">http://review.gluster.org/15901</a><br></div><div>I think it will fix the issue reported here. Please check the commit message of the patch<br></div><div>for more details.<br></div><div><br></div><div>Regards,<br></div><div>Poornima<br></div><hr id="zwchr"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Nithya Balachandran" &lt;nbalacha@redhat.com&gt;<br><b>To: </b>"Raghavendra Gowdappa" &lt;rgowdapp@redhat.com&gt;<br><b>Cc: </b>"Gluster Devel" &lt;gluster-devel@gluster.org&gt;<br><b>Sent: </b>Tuesday, November 22, 2016 3:23:59 AM<br><b>Subject: </b>Re: [Gluster-devel] Upstream smoke test failures<br><div><br></div><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 22 November 2016 at 13:09, Raghavendra Gowdappa <span dir="ltr">&lt;<a href="mailto:rgowdapp@redhat.com" target="_blank" data-mce-href="mailto:rgowdapp@redhat.com">rgowdapp@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" data-mce-style="margin: 0px 0px 0px 0.8ex; border-left: 1px solid #cccccc; padding-left: 1ex;"><div class="gmail-HOEnZb"><div class="gmail-h5"><br> <br> ----- Original Message -----<br> &gt; From: "Vijay Bellur" &lt;<a href="mailto:vbellur@redhat.com" target="_blank" 
data-mce-href="mailto:vbellur@redhat.com">vbellur@redhat.com</a>&gt;<br> &gt; To: "Nithya Balachandran" &lt;<a href="mailto:nbalacha@redhat.com" target="_blank" data-mce-href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>&gt;<br> &gt; Cc: "Gluster Devel" &lt;<a href="mailto:gluster-devel@gluster.org" target="_blank" data-mce-href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>&gt;<br> &gt; Sent: Wednesday, November 16, 2016 9:41:12 AM<br> &gt; Subject: Re: [Gluster-devel] Upstream smoke test failures<br> &gt;<br> &gt; On Tue, Nov 15, 2016 at 8:40 AM, Nithya Balachandran<br> &gt; &lt;<a href="mailto:nbalacha@redhat.com" target="_blank" data-mce-href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>&gt; wrote:<br> &gt; &gt;<br> &gt; &gt;<br> &gt; &gt; On 15 November 2016 at 18:55, Vijay Bellur &lt;<a href="mailto:vbellur@redhat.com" target="_blank" data-mce-href="mailto:vbellur@redhat.com">vbellur@redhat.com</a>&gt; wrote:<br> &gt; &gt;&gt;<br> &gt; &gt;&gt; On Mon, Nov 14, 2016 at 10:34 PM, Nithya Balachandran<br> &gt; &gt;&gt; &lt;<a href="mailto:nbalacha@redhat.com" target="_blank" data-mce-href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>&gt; wrote:<br> &gt; &gt;&gt; &gt;<br> &gt; &gt;&gt; &gt;<br> &gt; &gt;&gt; &gt; On 14 November 2016 at 21:38, Vijay Bellur &lt;<a href="mailto:vbellur@redhat.com" target="_blank" data-mce-href="mailto:vbellur@redhat.com">vbellur@redhat.com</a>&gt; wrote:<br> &gt; &gt;&gt; &gt;&gt;<br> &gt; &gt;&gt; &gt;&gt; I would prefer that we disable dbench only if we have an owner for<br> &gt; &gt;&gt; &gt;&gt; fixing the problem and re-enabling it as part of smoke tests. 
Running<br> &gt; &gt;&gt; &gt;&gt; dbench seamlessly on gluster has worked for a long while and if it is<br> &gt; &gt;&gt; &gt;&gt; failing today, we need to address this regression asap.<br> &gt; &gt;&gt; &gt;&gt;<br> &gt; &gt;&gt; &gt;&gt; Does anybody have more context or clues on why dbench is failing now?<br> &gt; &gt;&gt; &gt;&gt;<br> &gt; &gt;&gt; &gt; While I agree that it needs to be looked at asap, leaving it in until we<br> &gt; &gt;&gt; &gt; get<br> &gt; &gt;&gt; &gt; an owner seems rather pointless as all it does is hold up various<br> &gt; &gt;&gt; &gt; patches<br> &gt; &gt;&gt; &gt; and waste machine time. Re-triggering it multiple times so that it<br> &gt; &gt;&gt; &gt; eventually passes does not add anything to the regression test processes<br> &gt; &gt;&gt; &gt; or<br> &gt; &gt;&gt; &gt; validate the patch as we know there is a problem.<br> &gt; &gt;&gt; &gt;<br> &gt; &gt;&gt; &gt; I would vote for removing it and assigning someone to look at it<br> &gt; &gt;&gt; &gt; immediately.<br> &gt; &gt;&gt; &gt;<br> &gt; &gt;&gt;<br> &gt; &gt;&gt; From the debugging done so far can we identify an owner to whom this<br> &gt; &gt;&gt; can be assigned? I looked around for related discussions and could<br> &gt; &gt;&gt; figure out that we are looking to get statedumps. Do we have more<br> &gt; &gt;&gt; information/context beyond this?<br> &gt; &gt;&gt;<br> &gt; &gt; I have updated the BZ (<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1379228" rel="noreferrer" target="_blank" data-mce-href="https://bugzilla.redhat.com/show_bug.cgi?id=1379228">https://bugzilla.redhat.com/show_bug.cgi?id=1379228</a>)<br> &gt; &gt; with info from the last failure - looks like hangs in write-behind and<br> &gt; &gt; read-ahead.<br> &gt; &gt;<br> &gt;<br> &gt;<br> &gt; I spent some time on this today and it does look like write-behind is<br> &gt; absorbing READs without performing any WIND/UNWIND actions. 
I have<br> &gt; attached a statedump from a slave that had the dbench problem (thanks,<br> &gt; Nigel!) to the above bug.<br> &gt;<br> &gt; Snip from statedump:<br> &gt;<br> &gt; [global.callpool.stack.2]<br> &gt; stack=0x7fd970002cdc<br> &gt; uid=0<br> &gt; gid=0<br> &gt; pid=31884<br> &gt; unique=37870<br> &gt; lk-owner=0000000000000000<br> &gt; op=READ<br> &gt; type=1<br> &gt; cnt=2<br> &gt;<br> &gt; [global.callpool.stack.2.frame.1]<br> &gt; frame=0x7fd9700036ac<br> &gt; ref_count=0<br> &gt; translator=patchy-read-ahead<br> &gt; complete=0<br> &gt; parent=patchy-readdir-ahead<br> &gt; wind_from=ra_page_fault<br> &gt; wind_to=FIRST_CHILD (fault_frame-&gt;this)-&gt;fops-&gt;readv<br> &gt; unwind_to=ra_fault_cbk<br> &gt;<br> &gt; [global.callpool.stack.2.frame.2]<br> &gt; frame=0x7fd97000346c<br> &gt; ref_count=1<br> &gt; translator=patchy-readdir-ahead<br> &gt; complete=0<br> &gt;<br> &gt;<br> &gt; Note that the frame which was wound from ra_page_fault() to<br> &gt; write-behind is not yet complete and write-behind has not progressed<br> &gt; the call. There are several callstacks with a similar signature in<br> &gt; statedump.<br> <br></div></div>I think the culprit here is read-ahead, not write-behind. If the read fop had been dropped in write-behind, we should have seen a frame associated with write-behind (complete=0 for a frame associated with an xlator indicates that the frame was not unwound from _that_ xlator), but I did not see any. The empty request queues in wb_inode also corroborate this hypothesis.</blockquote><div><br></div><div>&nbsp;</div><div>We have seen hangs in both. 
See comment #17 in&nbsp;<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1379228" rel="noreferrer" target="_blank" data-mce-href="https://bugzilla.redhat.com/show_bug.cgi?id=1379228">https://bugzilla.redhat.com/show_bug.cgi?id=1379228</a>.</div><div><br></div><div><br></div><div>Regards,</div><div>Nithya</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" data-mce-style="margin: 0px 0px 0px 0.8ex; border-left: 1px solid #cccccc; padding-left: 1ex;">Karthick Subrahmanya is working on a similar issue reported by a user. However, we have not made much progress so far.<br> <span class="gmail-im gmail-HOEnZb"><br> &gt;<br> &gt; In write-behind's readv implementation, we stub READ fops and enqueue<br> &gt; them in the relevant inode context. Once enqueued, the stub resumes<br> &gt; when the appropriate set of conditions is met in write-behind. This is not<br> &gt; happening now and I am not certain whether:<br> &gt;<br> &gt; - READ fops are languishing in a queue and not being resumed, or<br> &gt; - READ fops are prematurely dropped from a queue without being wound or<br> &gt; unwound<br> &gt;<br> &gt; When I gdb'd into the client process and examined the inode contexts<br> &gt; for write-behind, I found all queues to be empty. This seems to<br> &gt; indicate that the latter is more plausible, but I have not yet<br> &gt; found a code path to account for this possibility.<br> &gt;<br> &gt; One approach to proceed further is to add more logs in write-behind to<br> &gt; get a better understanding of the problem. I will try that out<br> &gt; sometime later this week. 
We are also considering disabling<br> &gt; write-behind for smoke tests in the interim after a trial run (with<br> &gt; write-behind disabled) later in the day.<br> &gt;<br> &gt; Thanks,<br> &gt; Vijay<br> </span><div class="gmail-HOEnZb"><div class="gmail-h5">&gt; _______________________________________________<br> &gt; Gluster-devel mailing list<br> &gt; <a href="mailto:Gluster-devel@gluster.org" target="_blank" data-mce-href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br> &gt; <a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank" data-mce-href="http://www.gluster.org/mailman/listinfo/gluster-devel">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br> &gt;<br></div></div></blockquote></div><br></div></div><br>_______________________________________________<br>Gluster-devel mailing list<br>Gluster-devel@gluster.org<br>http://www.gluster.org/mailman/listinfo/gluster-devel</blockquote><div><br></div></div></body></html>