<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 3, 2016 at 4:10 PM, Ravishankar N <span dir="ltr">&lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div>
    <div>Hi,<span class=""><br>
      <br>
      On 03/03/2016 11:14 AM, ABHISHEK PALIWAL wrote:<br>
    </span></div>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>Hi Ravi,<br>
                    <br>
                  </div><span class="">
                  As discussed earlier, I investigated this issue and
                  found that healing is not triggered because the
                  &quot;gluster volume heal c_glusterfs info split-brain&quot;
                  command does not show any entries in its output, even
                  though the file is in a split-brain state.<br>
                </span></div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    A couple of observations from the &#39;commands_output&#39; file.<br>
    <br>
    getfattr -d -m . -e hex
    opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml<br>
    The afr xattrs do not indicate that the file is in split brain:<br>
    # file:
    opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml<br>
    trusted.afr.c_glusterfs-client-1=0x000000000000000000000000<br>
    trusted.afr.dirty=0x000000000000000000000000<br>
    trusted.bit-rot.version=0x000000000000000b56d6dd1d000ec7a9<br>
    trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae<br>
    <br>
    <br>
    <br>
    getfattr -d -m . -e hex
    opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml<br>
    trusted.afr.c_glusterfs-client-0=0x000000080000000000000000<br>
    trusted.afr.c_glusterfs-client-2=0x000000020000000000000000<br>
    trusted.afr.c_glusterfs-client-4=0x000000020000000000000000<br>
    trusted.afr.c_glusterfs-client-6=0x000000020000000000000000<br>
    trusted.afr.dirty=0x000000000000000000000000<br>
    trusted.bit-rot.version=0x000000000000000b56d6dcb7000c87e7<br>
    trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae<br>
    <br>
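    For reference, a minimal Python sketch of how the trusted.afr values in the getfattr dumps above can be read. It assumes the usual AFR layout of three big-endian 32-bit counters (pending data, metadata and entry operations); the names decode_afr and blamed are made up for this illustration and are not part of gluster.<br>

```python
def decode_afr(value):
    """Split a 12-byte trusted.afr.* hex value into its three pending counters."""
    hex_digits = value[2:] if value.startswith("0x") else value
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return {
        "data": int.from_bytes(raw[0:4], "big"),
        "metadata": int.from_bytes(raw[4:8], "big"),
        "entry": int.from_bytes(raw[8:12], "big"),
    }


def blamed(xattrs):
    """Return the client names for which any pending counter is non-zero."""
    return sorted(name for name, value in xattrs.items()
                  if any(decode_afr(value).values()))


# Values copied from the two getfattr dumps in this mail.
first_dump = {"client-1": "0x000000000000000000000000"}
second_dump = {
    "client-0": "0x000000080000000000000000",
    "client-2": "0x000000020000000000000000",
    "client-4": "0x000000020000000000000000",
    "client-6": "0x000000020000000000000000",
}

print(decode_afr(second_dump["client-0"]))
print(blamed(first_dump))
print(blamed(second_dump))
```

    A brick only accuses a peer when one of these counters is non-zero. A split-brain needs both bricks accusing each other, and the first dump accuses nobody.<br>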
    1. There doesn&#39;t seem to be a split-brain going by the trusted.afr*
    xattrs.<br></div></blockquote><div><br></div><div>If it is not a split-brain problem, then how can I resolve this?<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
    2. You seem to have re-used the bricks from another volume/setup.
    For replica 2, only trusted.afr.c_glusterfs-client-0 and
    trusted.afr.c_glusterfs-client-1 must be present but I see 4 xattrs
    - client-0,2,4 and 6 <br></div></blockquote><div><br></div><div>Could you please suggest why these entries are there? I am not able to work out the scenario that creates them. To reproduce the issue I reboot one board multiple times, and after every reboot I do a remove-brick and add-brick on the same volume for the second board.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">
    3. On the rebooted node, do you have ssl enabled by any chance?
    There is a bug for &quot;Not able to fetch volfile&quot; when ssl is enabled:
    <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1258931" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1258931</a><br>
    <br>
    Btw, for data and metadata split-brains you can use the gluster
    CLI 
    <a href="https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md" target="_blank">https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md</a>
    instead of modifying the file from the back end.<br></div></blockquote><div><br></div><div>But you are saying it is not a split-brain problem, and the split-brain command is not showing any file either, so how can I find the bigger file? Also, in my case the file size is fixed at 2MB and the file is overwritten every time.<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">
    <br>
    -Ravi<div><div class="h5"><br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div><br>
                </div>
                So I manually deleted the gfid entry of that file from
                the .glusterfs directory and followed the instructions
                in the following link to trigger the heal:<br>
                <br>
                <a href="https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md" target="_blank">https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md</a><br>
                <br>
              </div>
              and this works fine for me.<br>
              <br>
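              As a rough illustration of where that gfid entry lives on a brick, here is a small Python sketch. It relies on the usual .glusterfs layout (two directory levels taken from the first four hex digits of the gfid, then the gfid in UUID form). The brick root /opt/lvmdir/c2/brick and the helper name gfid_backend_path are assumptions made for this example only.<br>

```python
import posixpath
import uuid


def gfid_backend_path(brick_root, gfid_hex):
    """Build the .glusterfs path for a file from its trusted.gfid hex value."""
    hex_digits = gfid_hex[2:] if gfid_hex.startswith("0x") else gfid_hex
    # uuid.UUID formats the 32 hex digits as 8-4-4-4-12 groups.
    gfid = str(uuid.UUID(hex=hex_digits))
    return posixpath.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


# trusted.gfid value as reported by getfattr for the log file in this thread;
# the brick root is assumed, not taken from the command output.
print(gfid_backend_path("/opt/lvmdir/c2/brick",
                        "0x9f5e354ecfda40149ddce7d5ffe760ae"))
```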
            </div>
            But my question is why the split-brain command is not
            showing any file in its output.<br>
            <br>
          </div>
          <div>Here I am attaching all the logs I collected from the
            node, along with the output of the commands from both
            boards.<br>
            <br>
          </div>
          <div>The tar file contains two directories:<br>
            <br>
          </div>
          <div>000300 - log for the board which is running continuously<br>
          </div>
          <div>002500 - log for the board which is rebooted <br>
            <br>
          </div>
          <div>I am waiting for your reply. Please help me out with this
            issue.<br>
            <br>
          </div>
          <div>Thanks in advance.<br>
          </div>
          <div><br>
          </div>
          Regards,<br>
        </div>
        Abhishek<br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Fri, Feb 26, 2016 at 1:21 PM,
          ABHISHEK PALIWAL <span dir="ltr">&lt;<a href="mailto:abhishpaliwal@gmail.com" target="_blank">abhishpaliwal@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div class="gmail_extra">
                <div class="gmail_quote"><span>On Fri, Feb 26,
                    2016 at 10:28 AM, Ravishankar N <span dir="ltr">&lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span>
                    wrote:<br>
                  </span>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div text="#000000" bgcolor="#FFFFFF"><span>
                        <div>On 02/26/2016 10:10 AM, ABHISHEK PALIWAL
                          wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <p dir="ltr">Yes correct</p>
                        </blockquote>
                        <br>
                        Okay, so when you say the files are not in sync
                        until some time, are you getting stale data when
                        accessing from the mount?<br>
                        I&#39;m not able to figure out why heal info shows
                        zero when the files are not in sync, despite all
                        IO happening from the mounts. Could you provide
                        the output of getfattr -d -m . -e hex
                        /brick/file-name from both bricks when you hit
                        this issue?</span>
                      <div>
                        <div><br>
                        </div>
                        <div>I&#39;ll provide the logs once I have them. Here,
                          delay means that we power on the second
                          board after 10 minutes.<br>
                        </div>
                        <div>
                          <div>
                            <div> <br>
                              <br>
                              <blockquote type="cite">
                                <div class="gmail_quote">On Feb 26, 2016
                                  9:57 AM, &quot;Ravishankar N&quot; &lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;

                                  wrote:<br type="attribution">
                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                    <div text="#000000" bgcolor="#FFFFFF">
                                      <div>Hello,<br>
                                        <br>
                                        On 02/26/2016 08:29 AM, ABHISHEK
                                        PALIWAL wrote:<br>
                                      </div>
                                      <blockquote type="cite">
                                        <div dir="ltr">
                                          <div>
                                            <div>
                                              <div>
                                                <div>
                                                  <div>
                                                    <div>
                                                      <div>Hi Ravi,<br>
                                                        <br>
                                                      </div>
                                                      Thanks for the
                                                      response.<br>
                                                      <br>
                                                    </div>
                                                    We are using
                                                    Glusterfs-3.7.8<br>
                                                    <br>
                                                    Here is the use
                                                    case:<br>
                                                    <br>
                                                    <span style="color:rgb(0,0,0)">We
                                                      have a logging
                                                      file which saves
                                                      logs of the events
                                                      for every board of
                                                      a node, and these
                                                      files are kept in sync
                                                      using glusterfs.
                                                      The system is in replica
                                                      2 mode, which means: <span>When
                                                        one brick in a
                                                        replicated
                                                        volume goes
                                                        offline, the
                                                        glusterd daemons
                                                        on the other
                                                        nodes keep track
                                                        of all the files
                                                        that are not
                                                        replicated to
                                                        the offline
                                                        brick. When the
                                                        offline brick
                                                        becomes
                                                        available again,
                                                        the cluster
                                                        initiates a
                                                        healing process,
                                                        replicating the
                                                        updated files to
                                                        that brick. </span>But
                                                      in our case, we
                                                      see that the log file
                                                      of one board is
                                                      not in sync
                                                      and its format is
                                                      corrupted, which means
                                                      the files are not in
                                                      sync.</span><br>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </blockquote>
                                      <br>
                                      Just to understand you correctly,
                                      you have mounted the 2 node
                                      replica-2 volume on both these
                                      nodes and are writing to a logging
                                      file from the mounts, right? <br>
                                      <br>
                                      <blockquote type="cite">
                                        <div dir="ltr">
                                          <div>
                                            <div>
                                              <div>
                                                <div>
                                                  <div><br>
                                                  </div>
                                                  Even the output of <span><span>#gluster
                                                      volume heal
                                                      c_glusterfs info
                                                      shows that there
                                                      are no pending
                                                      heals.<br>
                                                      <br>
                                                    </span></span><span><span>Also,
                                                      the logging file
                                                      that is updated
                                                      has a fixed size,
                                                      and new
                                                      entries wrap
                                                      around,
                                                      overwriting the
                                                      old entries.<br>
                                                      <br>
                                                      This way we have
                                                      seen that after a
                                                      few restarts, the
                                                      contents of the
                                                      same file on the two
                                                      bricks are
                                                      different, but
                                                      volume heal
                                                      info shows zero
                                                      entries.<br>
                                                      <br>
                                                    </span></span></div>
                                                <span><span>Solution:<br>
                                                    <br>
                                                  </span></span></div>
                                              <span><span>But when we
                                                  added a delay of </span></span><span><span><span><span>
                                                      &gt; 5 min</span></span>
                                                  before the healing,
                                                  everything worked
                                                  fine.<br>
                                                  <br>
                                                </span></span></div>
                                            <span><span>Regards,<br>
                                              </span></span></div>
                                          <span><span>Abhishek<br>
                                            </span></span> </div>
                                        <div class="gmail_extra"><br>
                                          <div class="gmail_quote">On
                                            Fri, Feb 26, 2016 at 6:35
                                            AM, Ravishankar N <span dir="ltr">&lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span>
                                            wrote:<br>
                                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                              <div text="#000000" bgcolor="#FFFFFF"><span>
                                                  <div>On 02/25/2016
                                                    06:01 PM, ABHISHEK
                                                    PALIWAL wrote:<br>
                                                  </div>
                                                  <blockquote type="cite">
                                                    <div dir="ltr">
                                                      <div>
                                                        <div>
                                                          <div>
                                                          <div>Hi,<br>
                                                          <br>
                                                          </div>
                                                          Here, I have
                                                          one query
                                                          regarding the
                                                          time taken by
                                                          the healing
                                                          process.<br>
                                                          </div>
                                                          In the current
                                                          two-node setup,
                                                          when we
                                                          reboot one
                                                          node the
                                                          self-healing
                                                          process starts
                                                          within 5 minutes
                                                          on
                                                          the board,
                                                          which
                                                          results in
                                                          corruption of
                                                          some files&#39;
                                                          data.<br>
                                                        </div>
                                                      </div>
                                                    </div>
                                                  </blockquote>
                                                  <br>
                                                </span> Heal should
                                                start immediately after
                                                the brick process comes
                                                up. What version of
                                                gluster are you using?
                                                What do you mean by
                                                corruption of data?
                                                Also, how did you
                                                observe that the heal
                                                started after 5 minutes?<br>
                                                -Ravi<br>
                                                <blockquote type="cite"><span>
                                                    <div dir="ltr">
                                                      <div>
                                                        <div><br>
                                                        </div>
                                                        And to resolve
                                                        it I searched
                                                        on Google and
                                                        found the
                                                        following link:<br>
                                                        <a href="https://support.rackspace.com/how-to/glusterfs-troubleshooting/" target="_blank">https://support.rackspace.com/how-to/glusterfs-troubleshooting/</a><br>
                                                        <br>
                                                      </div>
                                                      <div>It mentions
                                                        that the healing
                                                        process can
                                                        take up to 10 min
                                                        to start.<br>
                                                        <br>
                                                      </div>
                                                      <div>Here is the
                                                        statement from
                                                        the link:<br>
                                                        <br>
                                                        &quot;Healing
                                                        replicated
                                                        volumes <br>
                                                        <br>
                                                        When any brick
                                                        in a replicated
                                                        volume goes
                                                        offline, the
                                                        glusterd daemons
                                                        on the remaining
                                                        nodes keep track
                                                        of all the files
                                                        that are not
                                                        replicated to
                                                        the offline
                                                        brick. When the
                                                        offline brick
                                                        becomes
                                                        available again,
                                                        the cluster
                                                        initiates a
                                                        healing process,
                                                        replicating the
                                                        updated files to
                                                        that brick. <b>The

                                                          start of this
                                                          process can
                                                          take up to 10
                                                          minutes, based
                                                          on
                                                          observation.</b>&quot;
                                                        <br>
                                                      </div>
                                                      <div><br>
                                                      </div>
                                                      <div>After allowing
                                                        more
                                                        than 5 min, the file
                                                        corruption
                                                        problem was
                                                        resolved.<br>
                                                      </div>
                                                      <div><br>
                                                      </div>
                                                      <div>So my
                                                        question is: is
                                                        there any way
                                                        to
                                                        reduce the
                                                        time the
                                                        healing
                                                        process takes to
                                                        start?<br>
                                                        <br>
                                                      </div>
                                                      <br>
                                                      Regards,<br>
                                                      Abhishek Paliwal<br clear="all">
                                                      <br>
                                                      <br>
                                                    </div>
                                                    <br>
                                                    <br>
                                                  </span>
                                                  <pre>_______________________________________________
Gluster-devel mailing list
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
                                                </blockquote>
                                                <br>
                                                <br>
                                              </div>
                                            </blockquote>
                                          </div>
                                          <br>
                                          <br clear="all">
                                          <br>
                                        </div>
                                      </blockquote>
                                      <br>
                                      <br>
                                    </div>
                                  </blockquote>
                                </div>
                              </blockquote>
                              <br>
                              <br>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                    <span><font color="#888888">
                      </font></span></blockquote>
                </div>
                <span><font color="#888888"><br>
                    <br clear="all">
                    <br>
                    -- <br>
                    <div>
                      <div dir="ltr"><br>
                        <br>
                        <br>
                        <br>
                        Regards<br>
                        Abhishek Paliwal<br>
                      </div>
                    </div>
                  </font></span></div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <br>
      </div>
    </blockquote>
    <br>
    <br>
  </div></div></div>

</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><br><br><br><br>Regards<br>
Abhishek Paliwal<br>
</div></div>
</div></div>