<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar <span dir="ltr"><<a href="mailto:khiremat@redhat.com" target="_blank">khiremat@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi,<br>
<br>
Yes, with this patch we need not set conn->trans to NULL in rpc_clnt_disable<br></blockquote><div><br></div><div style="">While [1] fixes the crash, things can be improved in the way how changelog is using rpc. </div><div style=""><br></div><div style="">1. In the current code, there is an rpc_clnt object leak during disconnect event.</div><div style="">2. Also, freed "mydata" of changelog is still associated with rpc_clnt object (corollary of 1), though change log might not get any events with "mydata" (as connection is dead).</div><div style=""><br></div><div style="">I've discussed with Kotresh about changes needed, offline. So, following are the action items.</div><div style="">1. Soumya's patch [2] is valid and is needed for 3.7 branch too.</div><div style="">2. [2] can be accepted. However, someone might want to re-use an rpc object after disabling it, like introducing a new api rpc_clnt_enable_again (though no of such use-cases is very less). But [2] doesn't allow it. The point is as long as rpc-clnt object is alive, transport object is alive (though disconnected) and we can re-use it. So, I would prefer not to accept it. </div><div style="">3. Kotresh will work on new changes to make sure changelog makes correct use of rpc-clnt.</div><div style=""><br></div><div style="">[1] <a href="http://review.gluster.org/#/c/13592">http://review.gluster.org/#/c/13592</a></div><div style="">[2] <a href="http://review.gluster.org/#/c/1359">http://review.gluster.org/#/c/1359</a></div><div style=""><br></div><div style="">regards,</div><div style="">Raghavendra.</div><div style=""><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span class="im"><br>
Thanks and Regards,<br>
Kotresh H R<br>
<br>
----- Original Message -----<br>
> From: "Soumya Koduri" <<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>><br>
</span><span class="im">> To: "Kotresh Hiremath Ravishankar" <<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>>, "Raghavendra G" <<a href="mailto:raghavendra@gluster.com">raghavendra@gluster.com</a>><br>
> Cc: "Gluster Devel" <<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>><br>
</span><div class=""><div class="h5">> Sent: Thursday, March 3, 2016 5:06:00 PM<br>
> Subject: Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t<br>
><br>
><br>
><br>
> On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:<br>
> > [Replying on top of my own reply]<br>
> ><br>
> > Hi,<br>
> ><br>
> > I have submitted the below patch [1] to avoid the issue of<br>
> > 'rpc_clnt_submit'<br>
> > getting reconnected. But it won't take care of memory leak problem you were<br>
> > trying to fix. That we have to carefully go through all cases and fix it.<br>
> > Please have a look at it.<br>
> ><br>
> Looks good. IIUC, with this patch, we need not set conn->trans to NULL<br>
> in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak as<br>
> the transport object shall then get freed as part of<br>
> 'rpc_clnt_trigger_destroy'.<br>
><br>
><br>
> > <a href="http://review.gluster.org/#/c/13592/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13592/</a><br>
> ><br>
> > Thanks and Regards,<br>
> > Kotresh H R<br>
> ><br>
> > ----- Original Message -----<br>
> >> From: "Kotresh Hiremath Ravishankar" <<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>><br>
> >> To: "Soumya Koduri" <<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>><br>
> >> Cc: "Raghavendra G" <<a href="mailto:raghavendra@gluster.com">raghavendra@gluster.com</a>>, "Gluster Devel"<br>
> >> <<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>><br>
> >> Sent: Thursday, March 3, 2016 3:39:11 PM<br>
> >> Subject: Re: [Gluster-devel] Cores generated with<br>
> >> ./tests/geo-rep/georep-basic-dr-tarssh.t<br>
> >><br>
> >> Hi Soumya,<br>
> >><br>
> >> I tested the lastes patch [2] on master where your previous patch [1] in<br>
> >> merged.<br>
> >> I see crashes at different places.<br>
> >><br>
> >> 1. If there are code paths that are holding rpc object without taking ref<br>
> >> on<br>
> >> it, all those<br>
> >> code path will crash on invoking rpc submit on that object as rpc<br>
> >> object<br>
> >> would have freed<br>
> >> by last unref on DISCONNECT event. I see this kind of use-case in<br>
> >> chagnelog rpc code.<br>
> >> Need to check on other users of rpc.<br>
> Agree. We should fix all such code-paths. Since this seem to be an<br>
> intricate fix, shall we take these patches only in master branch and not<br>
> in 3.7 release for now till we fix all such paths as we encounter?<br>
><br>
> >><br>
> >> 2. And also we need to take care of reconnect timers that are being set<br>
> >> and<br>
> >> are re-tried to<br>
> >> connect back on expiration. In those cases also, we might crash as rpc<br>
> >> object would have freed.<br>
> Your patch addresses this..right?<br>
><br>
> Thanks,<br>
> Soumya<br>
><br>
> >><br>
> >><br>
> >> [1] <a href="http://review.gluster.org/#/c/13507/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13507/</a><br>
> >> [2] <a href="http://review.gluster.org/#/c/13587/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13587/</a><br>
> >><br>
> >> Thanks and Regards,<br>
> >> Kotresh H R<br>
> >><br>
> >> ----- Original Message -----<br>
> >>> From: "Soumya Koduri" <<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>><br>
> >>> To: "Raghavendra G" <<a href="mailto:raghavendra@gluster.com">raghavendra@gluster.com</a>>, "Kotresh Hiremath<br>
> >>> Ravishankar" <<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>><br>
> >>> Cc: "Gluster Devel" <<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>><br>
> >>> Sent: Thursday, March 3, 2016 12:24:00 PM<br>
> >>> Subject: Re: [Gluster-devel] Cores generated with<br>
> >>> ./tests/geo-rep/georep-basic-dr-tarssh.t<br>
> >>><br>
> >>> Thanks a lot Kotresh.<br>
> >>><br>
> >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:<br>
> >>>> Hi Soumya,<br>
> >>>><br>
> >>>> Can you send a fix to this regression on upstream master too? This patch<br>
> >>>> is merged there.<br>
> >>>><br>
> >>> I have submitted below patch.<br>
> >>> <a href="http://review.gluster.org/#/c/13587/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13587/</a><br>
> >>><br>
> >>> Kindly review the same.<br>
> >>><br>
> >>> Thanks,<br>
> >>> Soumya<br>
> >>><br>
> >>>> regards,<br>
> >>>> Raghavendra<br>
> >>>><br>
> >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar<br>
> >>>> <<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a> <mailto:<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>>> wrote:<br>
> >>>><br>
> >>>> Hi Soumya,<br>
> >>>><br>
> >>>> I analysed the issue and found out that crash has happened because<br>
> >>>> of the patch [1].<br>
> >>>><br>
> >>>> The patch doesn't set transport object to NULL in<br>
> >>>> 'rpc_clnt_disable'<br>
> >>>> but instead does it on<br>
> >>>> 'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations<br>
> >>>> on the rpc object that<br>
> >>>> is disabled (those instances are possible as happening now in<br>
> >>>> changelog), it will trigger a<br>
> >>>> CONNECT notify again with 'mydata' that is freed causing a crash.<br>
> >>>> This happens because<br>
> >>>> 'rpc_clnt_submit' reconnects if rpc is not connected.<br>
> >>>><br>
> >>>> rpc_clnt_submit (...) {<br>
> >>>> ...<br>
> >>>> if (conn->connected == 0) {<br>
> >>>> ret = rpc_transport_connect (conn->trans,<br>
> >>>><br>
> >>>> conn->config.remote_port);<br>
> >>>> }<br>
> >>>> ...<br>
> >>>> }<br>
> >>>><br>
> >>>> Without your patch, conn->trans was set NULL and hence CONNECT<br>
> >>>> fails<br>
> >>>> not resulting with<br>
> >>>> CONNECT notify call. And also the cleanup happens in failure path.<br>
> >>>><br>
> >>>> So the memory leak can happen, if there is no try for rpc<br>
> >>>> invocation<br>
> >>>> after DISCONNECT.<br>
> >>>> It will be cleaned up otherwise.<br>
> >>>><br>
> >>>><br>
> >>>> [1] <a href="http://review.gluster.org/#/c/13507/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13507/</a><br>
> >>>><br>
> >>>> Thanks and Regards,<br>
> >>>> Kotresh H R<br>
> >>>><br>
> >>>> ----- Original Message -----<br>
> >>>> > From: "Kotresh Hiremath Ravishankar" <<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a><br>
> >>>> <mailto:<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>>><br>
> >>>> > To: "Soumya Koduri" <<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a><br>
> >>>> > <mailto:<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>>><br>
> >>>> > Cc: <a href="mailto:avishwan@redhat.com">avishwan@redhat.com</a> <mailto:<a href="mailto:avishwan@redhat.com">avishwan@redhat.com</a>>, "Gluster<br>
> >>>> Devel" <<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a><br>
> >>>> <mailto:<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>>><br>
> >>>> > Sent: Monday, February 29, 2016 4:15:22 PM<br>
> >>>> > Subject: Re: Cores generated with<br>
> >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t<br>
> >>>> ><br>
> >>>> > Hi Soumya,<br>
> >>>> ><br>
> >>>> > I just tested that it is reproducible only with your patch both<br>
> >>>> in master and<br>
> >>>> > 3.76 branch.<br>
> >>>> > The geo-rep test cases are marked bad in master. So it's not hit<br>
> >>>> in master.<br>
> >>>> > rpc is introduced<br>
> >>>> > in changelog xlator to communicate to applications via<br>
> >>>> libgfchangelog.<br>
> >>>> > Venky/Me will check<br>
> >>>> > why is the crash happening and will update.<br>
> >>>> ><br>
> >>>> ><br>
> >>>> > Thanks and Regards,<br>
> >>>> > Kotresh H R<br>
> >>>> ><br>
> >>>> > ----- Original Message -----<br>
> >>>> > > From: "Soumya Koduri" <<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a><br>
> >>>> <mailto:<a href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>>><br>
> >>>> > > To: <a href="mailto:avishwan@redhat.com">avishwan@redhat.com</a> <mailto:<a href="mailto:avishwan@redhat.com">avishwan@redhat.com</a>>,<br>
> >>>> > > "kotresh"<br>
> >>>> <<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a> <mailto:<a href="mailto:khiremat@redhat.com">khiremat@redhat.com</a>>><br>
> >>>> > > Cc: "Gluster Devel" <<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a><br>
> >>>> <mailto:<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>>><br>
> >>>> > > Sent: Monday, February 29, 2016 2:10:51 PM<br>
> >>>> > > Subject: Cores generated with<br>
> >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t<br>
> >>>> > ><br>
> >>>> > > Hi Aravinda/Kotresh,<br>
> >>>> > ><br>
> >>>> > > With [1], I consistently see cores generated with the test<br>
> >>>> > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7<br>
> >>>> branch. From<br>
> >>>> > > the cores, looks like we are trying to dereference a freed<br>
> >>>> > > changelog_rpc_clnt_t(crpc) object in changelog_rpc_notify().<br>
> >>>> Strangely<br>
> >>>> > > this was not reported in master branch.<br>
> >>>> > ><br>
> >>>> > > I tried debugging but couldn't find any possible suspects. I<br>
> >>>> request you<br>
> >>>> > > to take a look and let me know if [1] caused any regression.<br>
> >>>> > ><br>
> >>>> > > Thanks,<br>
> >>>> > > Soumya<br>
> >>>> > ><br>
> >>>> > > [1] <a href="http://review.gluster.org/#/c/13507/" rel="noreferrer" target="_blank">http://review.gluster.org/#/c/13507/</a><br>
> >>>> > ><br>
> >>>> ><br>
> >>>> _______________________________________________<br>
> >>>> Gluster-devel mailing list<br>
> >>>> <a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a> <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>><br>
> >>>> <a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
> >>>><br>
> >>>><br>
> >>>><br>
> >>>><br>
> >>>> --<br>
> >>>> Raghavendra G<br>
> >>><br>
> >> _______________________________________________<br>
> >> Gluster-devel mailing list<br>
> >> <a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
> >> <a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
> >><br>
><br>
_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Raghavendra G<br></div>
</div></div>