<div dir="ltr">Since I stopped writing to the clients (so I could cleanly work on the split-brain), I&#39;ve had no more entries in /var/log/gluster.log (this is the client log, right?).<div><br></div><div><br></div><div>While using the diff command to fix the split-brain, I saw several errors like these:<div><br></div><div><div>diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558482: Transport endpoint is not connected</div><div>diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558483: Transport endpoint is not connected</div><div>diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558484: Transport endpoint is not connected</div></div><div><br></div><div>They happen a lot, then stop, then happen again, and so on.</div><div><br></div><div>While the errors are showing, a ping from the system where I&#39;m working on the split-brain to the system that is failing to connect (r2) shows this:</div><div><br></div><div><div>64 bytes from r2-server (r2-ip): icmp_seq=662 ttl=64 time=1.21 ms</div><div>64 bytes from r2-server (r2-ip): icmp_seq=663 ttl=64 time=0.990 ms</div><div>64 bytes from r2-server (r2-ip): icmp_seq=664 ttl=64 time=1.01 ms</div></div><div><br></div><div>I know this is a very basic network check that may not show me what I need to see, and I&#39;m working on a more elaborate one. But I&#39;m completely open to suggestions on how to properly verify whether the network is the issue where Gluster is concerned.</div><div><br></div><div><br></div><div>Thank you so much for the help so far, guys!</div><div><br></div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 26, 2015 at 8:36 PM, Joe Julian <span dir="ltr">&lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Check your client logs. Perhaps the client isn&#39;t actually connecting
    to both servers. <br><div><div class="h5">
    <br>
    <div>On 01/26/2015 02:12 PM, Tiago Santos
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">That&#39;s what I meant. Sorry for the confusion.<br>
        <br>
        I&#39;m writing on Client1 (same server as Brick1). Client2 (mounted
        Brick2, on server2) has nothing writing to it (so far).
        <div><br>
        </div>
        <div>What I&#39;m wondering is how I ended up with a split-brain if
          I&#39;m only writing on one client.<br>
          <div><br>
          </div>
          <div><br>
          </div>
          <div><br>
            <br>
            <div class="gmail_extra"><br>
              <div class="gmail_quote">On Mon, Jan 26, 2015 at 8:04 PM,
                Joe Julian <span dir="ltr">&lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;</span>
                wrote:<br>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div bgcolor="#FFFFFF" text="#000000"> Nothing but
                    GlusterFS should be writing to bricks. Mount a
                    client and write there.
                    <div>
                      <div><br>
                        <br>
                        <div>On 01/26/2015 01:38 PM, Tiago Santos wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr">Right.
                            <div><br>
                            </div>
                            <div>Brick1 is constantly being written to,
                              but nothing writes to Brick2. It
                              just gets &quot;healed&quot; data from Brick1.</div>
                            <div><br>
                            </div>
                            <div>This setup is still not in production,
                              and there are no applications using that
                              data. I have rsyncs constantly updating
                              Brick1 (bringing data from production
                              servers), and then Gluster updates Brick2.</div>
                            <div><br>
                            </div>
                            <div>Which makes me wonder how I could be
                              creating a file on multiple replicas during a
                              split-brain.</div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div>Could it be that, during a
                              split-brain event, I&#39;m updating
                              versions of the same file on Brick1
                              (only), and Gluster sees them as
                              different versions and gets confused?</div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div>Anyways, while we talk I&#39;m gonna run
                              Joe&#39;s precious procedure on split-brain
                              recovery.</div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                          </div>
                          <div class="gmail_extra"><br>
                            <div class="gmail_quote">On Mon, Jan 26,
                              2015 at 7:23 PM, Joe Julian <span dir="ltr">&lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;</span>
                              wrote:<br>
                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Mismatched
                                GFIDs would happen if a file is created
                                on multiple replicas during a
                                split-brain event. The GFID is assigned
                                at file creation.
                                <div>
                                  <div><br>
                                    <br>
                                    On 01/26/2015 01:04 PM, A Ghoshal
                                    wrote:<br>
                                  </div>
                                </div>
                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                  <div>
                                    <div> Yep, so it is indeed a
                                      split-brain caused by a mismatch
                                      of the trusted.gfid attribute.<br>
                                      <br>
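                                      <div>As a rough aid for reading the getfattr output quoted further down in this thread, here is a minimal Python sketch that decodes the hex values. The function names are hypothetical, and it assumes that each trusted.afr.* value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry) and that trusted.gfid is a 16-byte UUID.</div>
<pre>
```python
import uuid


def _strip_0x(hex_value):
    """Drop the leading 0x that `getfattr -e hex` puts on every value."""
    return hex_value[2:] if hex_value.startswith("0x") else hex_value


def decode_afr(hex_value):
    """Split a trusted.afr.* value into its three pending counters.

    Assumption: 12 bytes, three big-endian 32-bit integers in the
    order data, metadata, entry.
    """
    raw = bytes.fromhex(_strip_0x(hex_value))
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = (
        int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8)
    )
    return {"data": data, "metadata": metadata, "entry": entry}


def decode_gfid(hex_value):
    """Render a trusted.gfid value in the usual dashed UUID form."""
    return str(uuid.UUID(hex=_strip_0x(hex_value)))


def gfids_match(gfid_a, gfid_b):
    """True when both bricks report the same GFID for the file."""
    return decode_gfid(gfid_a) == decode_gfid(gfid_b)
```
</pre>
                                      <div>Applied to the values in this thread, decode_afr on web3&#39;s trusted.afr.site-images-client-1 value gives 2 pending data and 9 pending metadata operations, every counter on web4 is zero, and gfids_match returns False for the two trusted.gfid values, which is the mismatch described above.</div>
                                      <br>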
                                      Sadly, I don&#39;t know precisely what
                                      causes it. Communication loss
                                      might be one of the triggers. I am
                                      guessing the files with the
                                      problem are dynamic, correct? In
                                      our setup (also replica 2),
                                      communication is never a problem,
                                      but we do see this when one of the
                                      servers reboots. Maybe some
                                      obscure, hard-to-understand
                                      race between the background
                                      self-heal and the self-heal
                                      daemon...<br>
                                      <br>
                                      In any case, a normal procedure
                                      for split-brain recovery would
                                      work for you if you wish to get
                                      your files working again. It&#39;s
                                      easy to find on Google. I use the
                                      instructions on Joe Julian&#39;s blog
                                      page myself.<br>
                                      <br>
                                      <br>
                                        -----Tiago Santos &lt;<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>&gt;

                                      wrote: -----<br>
                                      <br>
                                        =======================<br>
                                        To: A Ghoshal &lt;<a href="mailto:a.ghoshal@tcs.com" target="_blank">a.ghoshal@tcs.com</a>&gt;<br>
                                        From: Tiago Santos &lt;<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>&gt;<br>
                                        Date: 01/27/2015 02:11AM<br>
                                        Cc: gluster-users &lt;<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>&gt;<br>
                                        Subject: Re: [Gluster-users]
                                      Pretty much any operation related
                                      to Gluster mounted fs hangs for a
                                      while<br>
                                        =======================<br>
                                          Oh, right!<br>
                                      <br>
                                      Here are the outputs:<br>
                                      <br>
                                      <br>
                                      root@web3:/export/images1-1/brick#
                                      time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
                                      # file:
                                      templates/assets/prod/temporary/13/user_1339200.png<br>
trusted.afr.site-images-client-0=0x000000000000000400000000<br>
trusted.afr.site-images-client-1=0x000000020000000900000000<br>
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527<br>
                                      <br>
                                      real 0m0.024s<br>
                                      user 0m0.001s<br>
                                      sys 0m0.001s<br>
                                      <br>
                                      <br>
                                      <br>
                                      root@web4:/export/images2-1/brick#
                                      time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
                                      # file:
                                      templates/assets/prod/temporary/13/user_1339200.png<br>
trusted.afr.site-images-client-0=0x000000000000000000000000<br>
trusted.afr.site-images-client-1=0x000000000000000000000000<br>
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3<br>
                                      <br>
                                      real 0m0.003s<br>
                                      user 0m0.000s<br>
                                      sys 0m0.006s<br>
                                      <br>
                                      <br>
                                      Not sure exactly what that means.
                                      I&#39;m googling, and would appreciate
                                      it if you<br>
                                      guys could shed some light.<br>
                                      <br>
                                      Thanks!<br>
                                      --<br>
                                      Tiago<br>
                                      <br>
                                      <br>
                                      <br>
                                      <br>
                                      On Mon, Jan 26, 2015 at 6:16 PM, A
                                      Ghoshal &lt;<a href="mailto:a.ghoshal@tcs.com" target="_blank">a.ghoshal@tcs.com</a>&gt;

                                      wrote:<br>
                                      <br>
                                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                        Actually you ran getfattr on the
                                        volume - which is why the
                                        requisite<br>
                                        extended attributes never showed
                                        up...<br>
                                        <br>
                                        Your bricks are mounted
                                        elsewhere:<br>
                                          /export/images1-1/brick and
                                        /export/images2-1/brick<br>
                                        <br>
                                        Btw, what version of Linux do
                                        you use? And are the files on
                                        which you observe the<br>
                                        input/output errors
                                        soft links?<br>
                                        <br>
                                          -----Tiago Santos &lt;<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>&gt;

                                        wrote: -----<br>
                                        <br>
                                          =======================<br>
                                          To: A Ghoshal &lt;<a href="mailto:a.ghoshal@tcs.com" target="_blank">a.ghoshal@tcs.com</a>&gt;<br>
                                          From: Tiago Santos &lt;<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>&gt;<br>
                                          Date: 01/27/2015 12:20AM<br>
                                          Cc: gluster-users &lt;<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>&gt;<br>
                                          Subject: Re: [Gluster-users]
                                        Pretty much any operation
                                        related to Gluster<br>
                                        mounted fs hangs for a while<br>
                                          =======================<br>
                                            Thanks for your input,
                                        Anirban.<br>
                                        <br>
                                        I ran the commands on both
                                        servers, with the following
                                        results:<br>
                                        <br>
                                        <br>
                                        root@web3:/var/www/site-images#
                                        time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
                                        <br>
                                        real 0m34.524s<br>
                                        user 0m0.004s<br>
                                        sys 0m0.000s<br>
                                        <br>
                                        <br>
                                        root@web4:/var/www/site-images#
                                        time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
                                        getfattr:
                                        templates/assets/prod/temporary/13/user_1339200.png:
                                        Input/output<br>
                                        error<br>
                                        <br>
                                        real 0m11.315s<br>
                                        user 0m0.001s<br>
                                        sys 0m0.003s<br>
                                        root@web4:/var/www/site-images#
                                        ls<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
                                        ls: cannot access
                                        templates/assets/prod/temporary/13/user_1339200.png:<br>
                                        Input/output error<br>
                                        <br>
                                        <br>
                                      </blockquote>
                                    </div>
                                  </div>
                                  <span>
                                    _______________________________________________<br>
                                    Gluster-users mailing list<br>
                                    <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
                                    <a href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
                                  </span></blockquote>
                              </blockquote>
                            </div>
                            <br>
                            <br clear="all">
                            <div><br>
                            </div>
                            -- <br>
                            <div>
                              <div dir="ltr">
                                <div>
                                  <div dir="ltr"><font color="#444444"><b>Tiago
                                        Santos</b></font>
                                    <div>
                                      <div><font color="#ff0000">MustHaveMenus.com</font></div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <br>
                      </div>
                    </div>
                  </div>
                </blockquote>
              </div>
              <br>
              <br clear="all">
              <div><br>
              </div>
              -- <br>
              <div>
                <div dir="ltr">
                  <div>
                    <div dir="ltr"><font color="#444444"><b>Tiago Santos</b></font>
                      <div>
                        <div><font color="#ff0000">MustHaveMenus.com</font></div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><font color="#444444"><b>Tiago Santos</b></font><div><div><font color="#ff0000">MustHaveMenus.com</font></div></div></div></div></div></div>
</div>
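<div>On the network check discussed at the top of this message: ping only exercises ICMP, while a &quot;Transport endpoint is not connected&quot; error concerns the TCP connections the client holds to the bricks. A minimal Python sketch of a TCP-level probe follows. The function names are hypothetical, r2-server is the placeholder host name from the ping output, and 24007 is assumed to be glusterd&#39;s management port. Brick ports vary per volume, so take them from gluster volume status rather than guessing.</div>
<pre>
```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe(host, ports, timeout=3.0):
    """Map each port to whether a TCP connection could be opened."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}


# Hypothetical usage, with the placeholder host from the ping output:
#   probe("r2-server", [24007])
```
</pre>
<div>Running the probe in a loop with timestamps while the diff errors appear would show whether the TCP connections drop at the same moments, which a steady ping cannot reveal.</div>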