<div dir="ltr">Hi Soumya,<div><br></div><div style="">Can you send a fix for this regression on upstream master too? The patch is merged there.</div><div style=""><br></div><div style="">regards,</div><div style="">Raghavendra</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar <span dir="ltr">&lt;<a href="mailto:khiremat@redhat.com" target="_blank">khiremat@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Soumya,<br>
<br>
I analysed the issue and found that the crash is caused by patch [1].<br>
<br>
The patch no longer sets the transport object to NULL in &#39;rpc_clnt_disable&#39;; it does so in<br>
&#39;rpc_clnt_trigger_destroy&#39; instead. So if there are pending RPC invocations on an RPC object that<br>
has been disabled (this is possible, and is what now happens in changelog), they trigger another<br>
CONNECT notification with &#39;mydata&#39; that has already been freed, causing a crash. This happens because<br>
&#39;rpc_clnt_submit&#39; tries to reconnect if the connection is not established.<br>
<br>
 rpc_clnt_submit (...) {<br>
   ...<br>
                if (conn-&gt;connected == 0) {<br>
                        ret = rpc_transport_connect (conn-&gt;trans,<br>
                                                     conn-&gt;config.remote_port);<br>
                }<br>
   ...<br>
 }<br>
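To make the sequence above concrete, here is a minimal single-threaded C model of it. Everything in it is hypothetical (model_conn, model_submit, model_disable_with_patch and the other names are not GlusterFS code); it assumes only what is described above: after [1], disable leaves conn-&gt;trans set, and submit reconnects when the connection is down. Instead of really freeing mydata it sets a flag, so the stale CONNECT notification can be counted rather than triggering undefined behaviour.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; NOT the real GlusterFS
 * rpc-clnt structures or functions. */
struct model_mydata {
        bool freed;  /* flag used instead of free() so a stale use is detectable */
};

struct model_trans { int dummy; };

struct model_conn {
        struct model_trans  *trans;
        int                  connected;
        struct model_mydata *mydata;          /* handed to the notify callback */
        int                  stale_notifies;  /* CONNECTs delivered with freed mydata */
};

/* Stand-in for the owner's notify callback (changelog_rpc_notify() in the report). */
static void model_notify_connect (struct model_conn *conn)
{
        if (conn->mydata->freed)
                conn->stale_notifies++;  /* real code would touch freed memory here */
}

/* Stand-in for rpc_transport_connect(): fails when there is no transport,
 * otherwise connects and delivers CONNECT to the owner. */
static int model_transport_connect (struct model_conn *conn)
{
        if (conn->trans == NULL)
                return -1;
        conn->connected = 1;
        model_notify_connect (conn);
        return 0;
}

/* Mirrors the rpc_clnt_submit excerpt: reconnect when not connected. */
static int model_submit (struct model_conn *conn)
{
        if (conn->connected == 0)
                return model_transport_connect (conn);
        return 0;
}

/* Behaviour after [1]: disable only marks the connection down; trans is
 * left in place and (in the real code) cleared later, at destroy time. */
static void model_disable_with_patch (struct model_conn *conn)
{
        conn->connected = 0;
}
```

In this model, a submit that is still pending after the owner has freed mydata reconnects through the non-NULL trans and reaches the notify callback with freed data. If trans had been cleared at disable time, the same call would stop at the NULL check instead.<br>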
<br>
Without your patch, conn-&gt;trans was set to NULL in &#39;rpc_clnt_disable&#39;, so the connect attempt fails and<br>
no CONNECT notification is delivered. The cleanup also happens in that failure path.<br>
<br>
So the memory leak happens only if no RPC invocation is attempted after DISCONNECT.<br>
Otherwise, it is cleaned up.<br>
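The pre-patch behaviour, and why the leak depends on whether anything is submitted after DISCONNECT, can be sketched the same way. This is again a hypothetical model, not GlusterFS code: model_cleanup() stands in for whatever the real failure path releases, which is an assumption here rather than something taken from the source tree.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the behaviour before [1]; NOT GlusterFS code. */
struct model_trans { int dummy; };

struct model_conn {
        struct model_trans *trans;
        int                 connected;
        int                 connect_notifies;  /* CONNECT notifications delivered */
        bool                cleaned_up;        /* set by the failure-path cleanup */
};

/* Before [1], disable also cleared the transport pointer. */
static void model_disable (struct model_conn *conn)
{
        conn->connected = 0;
        conn->trans     = NULL;
}

/* Fails when there is no transport, so no CONNECT is ever delivered. */
static int model_transport_connect (struct model_conn *conn)
{
        if (conn->trans == NULL)
                return -1;
        conn->connected = 1;
        conn->connect_notifies++;
        return 0;
}

/* Hypothetical placeholder for the resources released on failure. */
static void model_cleanup (struct model_conn *conn)
{
        conn->cleaned_up = true;
}

/* Submit reconnects when not connected; a failed connect takes the
 * failure path, which is where the cleanup runs. */
static int model_submit (struct model_conn *conn)
{
        int ret = 0;

        if (conn->connected == 0)
                ret = model_transport_connect (conn);
        if (ret < 0)
                model_cleanup (conn);
        return ret;
}
```

If a submit arrives after DISCONNECT, the connect fails without a notification and the failure path cleans up. If none arrives, nothing in this model ever runs the cleanup, which corresponds to the leak described above.<br>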
<br>
<br>
[1] <a href="http://review.gluster.org/#/c/13507/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13507/</a><br>
<span class="im HOEnZb"><br>
Thanks and Regards,<br>
Kotresh H R<br>
<br>
----- Original Message -----<br>
</span><div class="HOEnZb"><div class="h5">&gt; From: &quot;Kotresh Hiremath Ravishankar&quot; &lt;<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>&gt;<br>
&gt; To: &quot;Soumya Koduri&quot; &lt;<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>&gt;<br>
&gt; Cc: <a href="mailto:avishwan@redhat.com">avishwan@redhat.com</a>, &quot;Gluster Devel&quot; &lt;<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>&gt;<br>
&gt; Sent: Monday, February 29, 2016 4:15:22 PM<br>
&gt; Subject: Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t<br>
&gt;<br>
&gt; Hi Soumya,<br>
&gt;<br>
&gt; I just tested it: the crash is reproducible only with your patch, in both master and<br>
&gt; the 3.7 branch.<br>
&gt; The geo-rep test cases are marked bad in master, so it is not hit there.<br>
&gt; RPC was introduced in the changelog xlator to communicate with applications<br>
&gt; via libgfchangelog. Venky and I will check why the crash is happening and<br>
&gt; will update you.<br>
&gt;<br>
&gt;<br>
&gt; Thanks and Regards,<br>
&gt; Kotresh H R<br>
&gt;<br>
&gt; ----- Original Message -----<br>
&gt; &gt; From: &quot;Soumya Koduri&quot; &lt;<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>&gt;<br>
&gt; &gt; To: <a href="mailto:avishwan@redhat.com">avishwan@redhat.com</a>, &quot;kotresh&quot; &lt;<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>&gt;<br>
&gt; &gt; Cc: &quot;Gluster Devel&quot; &lt;<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>&gt;<br>
&gt; &gt; Sent: Monday, February 29, 2016 2:10:51 PM<br>
&gt; &gt; Subject: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t<br>
&gt; &gt;<br>
&gt; &gt; Hi Aravinda/Kotresh,<br>
&gt; &gt;<br>
&gt; &gt; With [1], I consistently see cores generated with the test<br>
&gt; &gt; &#39;./tests/geo-rep/georep-basic-dr-tarssh.t&#39; in the release-3.7 branch. From<br>
&gt; &gt; the cores, it looks like we are trying to dereference a freed<br>
&gt; &gt; changelog_rpc_clnt_t (crpc) object in changelog_rpc_notify(). Strangely,<br>
&gt; &gt; this was not reported in the master branch.<br>
&gt; &gt;<br>
&gt; &gt; I tried debugging but couldn&#39;t find any likely suspects. Could you please<br>
&gt; &gt; take a look and let me know whether [1] caused a regression?<br>
&gt; &gt;<br>
&gt; &gt; Thanks,<br>
&gt; &gt; Soumya<br>
&gt; &gt;<br>
&gt; &gt; [1] <a href="http://review.gluster.org/#/c/13507/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13507/</a><br>
&gt; &gt;<br>
&gt;<br>
_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Raghavendra G<br></div>
</div>