<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 26, 2016 at 12:34 AM, Niels de Vos <span dir="ltr">&lt;<a href="mailto:ndevos@redhat.com" target="_blank">ndevos@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>On Mon, Jul 25, 2016 at 04:34:17PM +0530, Avra Sengupta wrote:<br>
> > The crux of the problem is that, as of today, a brick process on
> > restart tries to reuse the old port it was using (assuming that no
> > other process will be using it, and without consulting
> > pmap_registry_alloc() before binding). With a recent change,
> > pmap_registry_alloc() reassigns older ports that were used but are
> > now free. Hence snapd now gets a port that was previously used by a
> > brick and tries to bind to it, while the older brick process, never
> > consulting the pmap table, blindly binds to that same port again,
> > and hence we see this problem.
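
To make the collision concrete: it is a plain bind() race on a TCP port.
The following minimal, standalone sketch (an arbitrary port, no GlusterFS
code) reproduces the "Address already in use" error that appears in the
snapd.log excerpt further down the thread:

#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Bind a TCP socket on loopback; return the fd, or -1 on failure. */
static int bind_port(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        int saved = errno;      /* keep the bind() error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

int main(void)
{
    int brick = bind_port(49152);  /* restarted brick re-grabs its old port */
    int snapd = bind_port(49152);  /* snapd, handed the same port by pmap   */

    if (snapd < 0)                 /* fails with EADDRINUSE */
        fprintf(stderr, "snapd bind failed: %s\n", strerror(errno));
    if (brick >= 0)
        close(brick);
    return 0;
}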
> >
> > Now coming to the fix, I feel the brick process should not try to get
> > its older port and should just take a new port every time it comes up.
> > We will not run out of ports with this change, because pmap now
> > allocates old ports again, so the port previously used by the brick
> > process will eventually be reused. If anyone sees any concern with
> > this approach, please feel free to raise it now.
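
As a toy model of why the port range is not exhausted: a lowest-free-port
registry recycles freed ports, so a restarting brick can always request a
fresh one. The range values below are illustrative, and this is emphatically
not the real pmap_registry_alloc() implementation:

#include <stdbool.h>
#include <stdio.h>

#define PMAP_BASE  49152   /* illustrative range, not the real pmap values */
#define PMAP_COUNT 1024

static bool in_use[PMAP_COUNT];   /* indexed by (port - PMAP_BASE) */

/* Hand out the lowest free port; freed ports get reused over time. */
static int toy_pmap_alloc(void)
{
    for (int i = 0; i < PMAP_COUNT; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return PMAP_BASE + i;
        }
    }
    return -1;   /* range exhausted */
}

static void toy_pmap_free(int port)
{
    in_use[port - PMAP_BASE] = false;
}

int main(void)
{
    int old = toy_pmap_alloc();    /* brick's original port                  */
    toy_pmap_free(old);            /* brick goes down, port returns to pool  */

    int snapd = toy_pmap_alloc();  /* snapd may legitimately receive it      */
    int brick = toy_pmap_alloc();  /* restarted brick asks for a NEW port,   */
                                   /* instead of silently re-binding 'old'   */
    printf("old=%d snapd=%d brick=%d\n", old, snapd, brick);
    return 0;
}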
>
> I wonder how this is handled with reconnecting clients. If a client
> thinks it was connected to a brick, but the connection was lost, does it
> try to connect to the same port again? I don't know if it really connects
> to the pmap service in GlusterD to find the new/updated port...

The client does query the portmap every time, using client_query_portmap()
in its reconnect logic. So if the port has changed, it goes through
rpc_clnt_reconfig(), which ensures the client talks to the brick
process(es) on the new port.
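
A simplified sketch of that reconnect flow: the names client_query_portmap()
and rpc_clnt_reconfig() come from this thread, but the types, stub bodies,
and the brick name are illustrative stand-ins, not the actual GlusterFS
client code:

#include <stdio.h>

typedef struct {
    int remote_port;               /* port the client last connected to */
} clnt_conf_t;

/* Stub standing in for the real client_query_portmap(): the client asks
 * GlusterD's pmap service for the brick's current port. */
static int client_query_portmap(const char *brickname)
{
    (void)brickname;
    return 49153;                  /* pretend the brick now listens here */
}

/* Stub standing in for rpc_clnt_reconfig(): repoint the connection. */
static void rpc_clnt_reconfig(clnt_conf_t *conf, int new_port)
{
    conf->remote_port = new_port;
}

static void client_reconnect(clnt_conf_t *conf, const char *brickname)
{
    /* Never assume the old port is still valid: always re-query pmap. */
    int port = client_query_portmap(brickname);

    if (port > 0 && port != conf->remote_port)
        rpc_clnt_reconfig(conf, port);   /* brick moved; switch ports */
    printf("reconnecting to port %d\n", conf->remote_port);
}

int main(void)
{
    clnt_conf_t conf = { .remote_port = 49152 };
    client_reconnect(&conf, "patchy-brick0");    /* hypothetical brick name */
    return 0;
}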
> Niels
>
> >
> > While awaiting feedback from you guys, I have sent a patch
> > (http://review.gluster.org/15001) that moves the said test case to the
> > bad tests for now; once we collectively reach a conclusion on the fix,
> > we will remove it from the bad tests.
> >
> > Regards,
> > Avra
> >
> > On 07/25/2016 02:33 PM, Avra Sengupta wrote:
> > > The failure suggests that the port snapd is trying to bind to is
> > > already in use. But snapd has been modified to use a new port every
> > > time. I am looking into this.
> > >
> > > On 07/25/2016 02:23 PM, Nithya Balachandran wrote:
> > > > More failures:
> > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/22452/console
> > > >
> > > > I see these messages in the snapd.log:
> > > >
> > > > [2016-07-22 05:31:52.482282] I [rpcsvc.c:2199:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
> > > > [2016-07-22 05:31:52.482352] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-patchy-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
> > > > [2016-07-22 05:31:52.482436] E [socket.c:771:__socket_server_bind] 0-tcp.patchy-server: binding to  failed: Address already in use
> > > > [2016-07-22 05:31:52.482447] E [socket.c:774:__socket_server_bind] 0-tcp.patchy-server: Port is already in use
> > > > [2016-07-22 05:31:52.482459] W [rpcsvc.c:1630:rpcsvc_create_listener] 0-rpc-service: listening on transport failed
> > > > [2016-07-22 05:31:52.482469] W [MSGID: 115045] [server.c:1061:init] 0-patchy-server: creation of listener failed
> > > > [2016-07-22 05:31:52.482481] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-patchy-server: Initialization of volume 'patchy-server' failed, review your volfile again
> > > > [2016-07-22 05:31:52.482491] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-patchy-server: initializing translator failed
> > > > [2016-07-22 05:31:52.482499] E [MSGID: 101176] [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
> > > >
> > > > On Mon, Jul 25, 2016 at 12:00 PM, Ashish Pandey <aspandey@redhat.com> wrote:
> > > >
> > > >     Hi,
> > > >
> > > >     The following test has failed 3 times in the last two days:
> > > >
> > > >     ./tests/bugs/snapshot/bug-1316437.t
> > > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22445/consoleFull
> > > >     https://build.gluster.org/job/rackspace-regression-2GB-triggered/22470/consoleFull
> > > >
> > > >     Please take a look at it and check whether it is a spurious failure or not.
> > > >
> > > >     Ashish

--
--Atin