<p dir="ltr">-Atin<br>

Sent from one plus one<br>

On Oct 27, 2015 6:40 PM, &quot;Sander Zijlstra&quot; &lt;<a href="mailto:sander.zijlstra@surfsara.nl">sander.zijlstra@surfsara.nl</a>&gt; wrote:<br>

&gt;<br>

&gt; Hi Atin,<br>

&gt;<br>

&gt; You’re right in saying that if it’s activated, all nodes should have it activated.<br>

&gt;<br>

&gt; What I find strange is that when glusterfsd has problems communicating with the other peers, the single node with issues isn’t considered “not connected” and thus somehow expelled from the cluster; in my case it caused a complete hang of the trusted storage pool.<br>

IMO this hang lasts until the RPC times out, which takes 10 minutes. GlusterD decides whether its peers are connected by looking at peerinfo&#39;s connected flag, which is still true here because the RPC disconnect doesn&#39;t fire in this case. It therefore issues an RPC call and waits for the callback, which times out after 10 minutes.<br>
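A minimal sketch of the behaviour described above, as a toy model only and not GlusterD&#39;s code: the `Peer` class, the `replies` field and `call_peer` are hypothetical names chosen for illustration. It shows why a stale connected flag costs the caller the full timeout.<br>

```python
import threading


class Peer:
    """Hypothetical stand-in for a peer entry; not GlusterD's real structure."""

    def __init__(self, name, connected=True, replies=True):
        self.name = name
        self.connected = connected  # the flag the caller trusts
        self.replies = replies      # whether the callback ever arrives


def call_peer(peer, timeout):
    """Issue a call to `peer` and wait up to `timeout` seconds for the callback."""
    if not peer.connected:
        # A peer already marked as disconnected is skipped immediately.
        return "skipped"
    done = threading.Event()
    if peer.replies:
        done.set()  # simulate the callback arriving
    # If the callback never arrives, the caller blocks for the full timeout.
    return "ok" if done.wait(timeout) else "timed out"
```

In this model a peer whose flag is still true but whose replies are lost blocks the caller for the whole `timeout`, whereas a peer already marked as disconnected is skipped at once.<br>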

&gt;<br>

&gt; And to emphasise this: pinging was no problem, as it uses small packets anyway, so jumbo frames were not used at all… Enabling jumbo frames on the interface and switches is only a way to tell the TCP/IP stack that it can send larger packets, but it doesn’t have to.<br>

&gt;<br>

&gt; Or am I mistaken in thinking that the TCP/IP stack controls whether to send the bigger packets and that glusterfsd has no control over that?<br>
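The point about ping above can be checked directly: a default ping sends small packets, so a jumbo-frame path is only exercised when the payload is sized to the MTU and fragmentation is forbidden. Below is a minimal sketch assuming IPv4 and the Linux iputils `ping` (`-M do` sets don&#39;t-fragment, `-s` sets the payload size, `-c` the count). The MTU of 9000 and the host name `node2` are assumptions for illustration; neither is stated in this thread.<br>

```python
IP_HEADER = 20   # bytes, IPv4 header without options
ICMP_HEADER = 8  # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(host, mtu, count=3):
    """Build an iputils ping command that forbids fragmentation (-M do)."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(payload), "-c", str(count), host]


# Hypothetical usage: 9000 and "node2" are assumed values, not from the thread.
print(" ".join(ping_command("node2", 9000)))
```

If a switch port on the path discards jumbo frames, a ping built this way should fail even though a default-sized ping succeeds.<br>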

&gt;<br>

&gt; Met vriendelijke groet / kind regards,<br>

&gt;<br>

&gt; Sander Zijlstra<br>

&gt;<br>

&gt; | Linux Engineer | SURFsara | Science Park 140 | 1098XG Amsterdam | T +31 (0)6 43 99 12 47 |<a href="mailto:sander.zijlstra@surfsara.nl"> sander.zijlstra@surfsara.nl</a> |<a href="http://www.surfsara.nl"> www.surfsara.nl</a> |<br>

&gt;<br>

&gt; Regular day off on Friday<br>

&gt;<br>

&gt; &gt; On 15 Oct 2015, at 08:24, Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>&gt; wrote:<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt; On 10/14/2015 05:09 PM, Sander Zijlstra wrote:<br>

&gt; &gt;&gt; LS,<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; I recently reconfigured one of my gluster nodes and forgot to update the MTU size on the switch while I did configure the host with jumbo frames.<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; The result was that the complete cluster had communication issues.<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; All systems are part of a distributed striped volume with a replica size of 2 but still the cluster was completely unusable until I updated the switch port to accept jumbo frames rather than to discard them.<br>

&gt; &gt; This is expected. When enabling the network components to communicate<br>

&gt; &gt; with TCP jumbo frames in a Gluster Trusted Storage Pool, you&#39;d need to<br>

&gt; &gt; ensure that all the network components, such as switches and nodes, are<br>

&gt; &gt; configured properly. I think with this setting you&#39;d fail to ping the<br>

&gt; &gt; other nodes in the pool. So that could be a step of verification before<br>

&gt; &gt; you set the cluster up.<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; The symptoms were:<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; - Gluster clients had a very hard time reading the volume information and thus couldn’t do any filesystem ops on it.<br>

&gt; &gt;&gt; - The glusterfs servers could see each other (peer status) and a volume info command was ok, but a volume status command would not return or would return a “staging failed” error.<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; I know that mixing MTU sizes and don’t-fragment bits can screw up a lot, but why wasn’t that gluster peer just discarded from the cluster, so that the clients would stop communicating with it and causing all sorts of errors?<br>

&gt; &gt; To answer this question: peer status &amp; volume info are local operations<br>

&gt; &gt; and don&#39;t involve the network, so in this very case you might see peer<br>

&gt; &gt; status showing all the nodes as connected although there is a<br>

&gt; &gt; breakage. OTOH, in the status command the originator node communicates with the other<br>

&gt; &gt; peers, and hence it fails there.<br>

&gt; &gt;<br>

&gt; &gt; HTH,<br>

&gt; &gt; Atin<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; I use glusterFS 3.6.2 at the moment…..<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; Kind regards<br>

&gt; &gt;&gt; Sander<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; _______________________________________________<br>

&gt; &gt;&gt; Gluster-users mailing list<br>

&gt; &gt;&gt;<a href="mailto:Gluster-users@gluster.org"> Gluster-users@gluster.org</a><br>

&gt; &gt;&gt;<a href="http://www.gluster.org/mailman/listinfo/gluster-users"> http://www.gluster.org/mailman/listinfo/gluster-users</a><br>

&gt; &gt;&gt;<br>

&gt;<br>

&gt;<br>


</p>