<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <br>
    <div class="moz-cite-prefix">On 01/28/2016 07:48 PM, David Robinson
      wrote:<br>
    </div>
    <blockquote
      cite="mid:em4861e585-7077-4a1b-ba30-04c5586f373d@dfrobins-vaio"
      type="cite">
      <style id="eMClientCss">BLOCKQUOTE.cite2 {
        MARGIN-TOP: 3px; PADDING-TOP: 0px; PADDING-LEFT: 10px; MARGIN-LEFT: 5px; BORDER-LEFT: #cccccc 1px solid; PADDING-RIGHT: 0px; MARGIN-RIGHT: 0px
}
.plain PRE {
        FONT-SIZE: 100%; FONT-FAMILY: monospace; FONT-WEIGHT: normal; FONT-STYLE: normal
}
.plain TT {
        FONT-SIZE: 100%; FONT-FAMILY: monospace; FONT-WEIGHT: normal; FONT-STYLE: normal
}
A IMG {
        BORDER-TOP: 0px; BORDER-RIGHT: 0px; BORDER-BOTTOM: 0px; BORDER-LEFT: 0px
}
#x6cc52c6e21704104b7b68438036271f9 {
        FONT-SIZE: 12pt; FONT-FAMILY: Times New Roman
}
#xdf504d15dd134271ad3b3d25408950b3 {
        FONT-SIZE: 12pt; FONT-FAMILY: Times New Roman
}
.plain PRE {
        FONT-SIZE: 12pt; FONT-FAMILY: Times New Roman
}
.plain TT {
        FONT-SIZE: 12pt; FONT-FAMILY: Times New Roman
}
BODY {
        FONT-SIZE: 12pt; FONT-FAMILY: Times New Roman
}
#x6cc52c6e21704104b7b68438036271f9 BLOCKQUOTE.cite {
        PADDING-LEFT: 10px; MARGIN-LEFT: 5px; BORDER-LEFT: #cccccc 1px solid; PADDING-RIGHT: 0px; MARGIN-RIGHT: 0px
}
</style>
      <style>#xdf504d15dd134271ad3b3d25408950b3 BLOCKQUOTE.cite2
{MARGIN-TOP: 3px; PADDING-TOP: 0px; PADDING-LEFT: 10px; MARGIN-LEFT: 5px; BORDER-LEFT: #cccccc 1px solid; PADDING-RIGHT: 0px; MARGIN-RIGHT: 0px}
#xdf504d15dd134271ad3b3d25408950b3 .plain PRE, #xdf504d15dd134271ad3b3d25408950b3 .plain TT
{FONT-SIZE: 100%; FONT-FAMILY: monospace; FONT-WEIGHT: normal; FONT-STYLE: normal}
#xdf504d15dd134271ad3b3d25408950b3 A IMG
{BORDER-TOP: 0px; BORDER-RIGHT: 0px; BORDER-BOTTOM: 0px; BORDER-LEFT: 0px}
#xdf504d15dd134271ad3b3d25408950b3 .plain PRE, #xdf504d15dd134271ad3b3d25408950b3 .plain TT, #xdf504d15dd134271ad3b3d25408950b3
{FONT-SIZE: 12pt; FONT-FAMILY: Times New Roman}
</style>
      <div><span id="x6cc52c6e21704104b7b68438036271f9"
          style="BACKGROUND-COLOR: #ffffff">&gt; Something really bad
          related to locks is happening. Did you guys patch the recent
          memory corruption bug which only affects workloads with more
          than 128 clients? <a moz-do-not-send="true"
            class="moz-txt-link-freetext"
            href="http://review.gluster.org/13241">&gt;
            http://review.gluster.org/13241</a></span></div>
      <div> </div>
      <div>We have not applied that patch.  Will this be included in the
        3.6.7 release?  If so, do you know when that version will be
        released?</div>
    </blockquote>
    + Raghavendra Bhat<br>
    <br>
    Could you please let David know about the next release date?<br>
    <br>
    <blockquote
      cite="mid:em4861e585-7077-4a1b-ba30-04c5586f373d@dfrobins-vaio"
      type="cite">
      <div> </div>
      <div>David</div>
      <div> </div>
      <div> </div>
      <div>------ Original Message ------</div>
      <div>From: "Pranith Kumar Karampuri" &lt;<a moz-do-not-send="true"
          href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>&gt;</div>
      <div>To: "David Robinson" &lt;<a moz-do-not-send="true"
          href="mailto:drobinson@corvidtec.com">drobinson@corvidtec.com</a>&gt;;
        "Glomski, Patrick" &lt;<a moz-do-not-send="true"
          href="mailto:patrick.glomski@corvidtec.com">patrick.glomski@corvidtec.com</a>&gt;</div>
      <div>Cc: <a class="moz-txt-link-rfc2396E" href="mailto:gluster-users@gluster.org">"gluster-users@gluster.org"</a> &lt;<a moz-do-not-send="true"
          href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>&gt;;
        "Gluster Devel" &lt;<a moz-do-not-send="true"
          href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>&gt;</div>
      <div>Sent: 1/28/2016 5:10:07 AM</div>
      <div>Subject: Re: [Gluster-users] [Gluster-devel] heal hanging</div>
      <div> </div>
      <div id="xdf504d15dd134271ad3b3d25408950b3" style="COLOR: #000000">
        <blockquote class="cite2" cite="56A9E8FF.9060202@redhat.com"
          type="cite"><br>
          <br>
          <div class="moz-cite-prefix">On 01/25/2016 11:10 PM, David
            Robinson wrote:<br>
          </div>
          <blockquote class="cite"
            cite="mid:em5dec06c0-891f-4608-8ea4-49b95b22d6c6@dfrobins-vaio"
            type="cite">
            <style></style>
            <div>It is doing it again... statedump from gfs02a is
              attached...</div>
          </blockquote>
          <br>
          David,<br>
                 I see a lot of traffic from [f]inodelks:<br>
          15:09:00 :) ⚡ grep wind_from
          data-brick02a-homegfs.4066.dump.1453742225 | sort | uniq -c<br>
               11 unwind_from=default_finodelk_cbk<br>
               11 unwind_from=io_stats_finodelk_cbk<br>
               11 unwind_from=pl_common_inodelk<br>
             1133 wind_from=default_finodelk_resume<br>
                1 wind_from=default_inodelk_resume<br>
               75 wind_from=index_getxattr<br>
                6 wind_from=io_stats_entrylk<br>
            12776 wind_from=io_stats_finodelk<br>
               15 wind_from=io_stats_flush<br>
               75 wind_from=io_stats_getxattr<br>
                4 wind_from=io_stats_inodelk<br>
                4 wind_from=io_stats_lk<br>
                4 wind_from=io_stats_setattr<br>
               75 wind_from=marker_getxattr<br>
                4 wind_from=marker_setattr<br>
               75 wind_from=quota_getxattr<br>
                6 wind_from=server_entrylk_resume<br>
            12776 wind_from=server_finodelk_resume
          &lt;&lt;--------------<br>
               15 wind_from=server_flush_resume<br>
               75 wind_from=server_getxattr_resume<br>
                4 wind_from=server_inodelk_resume<br>
                4 wind_from=server_lk_resume<br>
                4 wind_from=server_setattr_resume<br>
          <br>
          But only a small number of active locks:<br>
          pk1@localhost - ~/Downloads <br>
          15:09:07 :) ⚡ grep ACTIVE
          data-brick02a-homegfs.4066.dump.1453742225<br>
          inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
          start=9223372036854775806, len=0, pid = 11678,
          owner=b42fff03ce7f0000, client=0x13d2cd0,
          connection-id=corvidpost3.corvidtec.com-52656-2016/01/22-16:40:31:459920-homegfs-client-6-0-1,
          granted at 2016-01-25 17:16:06<br>
          inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
          len=0, pid = 15759, owner=b8ca8c0100000000, client=0x189e470,
          connection-id=corvidpost4.corvidtec.com-17718-2016/01/22-16:40:31:221380-homegfs-client-6-0-1,
          granted at 2016-01-25 17:12:52<br>
          inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
          start=9223372036854775806, len=0, pid = 7103,
          owner=0cf31a98f87f0000, client=0x2201d60,
          connection-id=zlv-bangell-4812-2016/01/25-13:45:52:170157-homegfs-client-6-0-0,
          granted at 2016-01-25 17:09:56<br>
          inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
          start=9223372036854775806, len=0, pid = 55764,
          owner=882dbea1417f0000, client=0x17fc940,
          connection-id=corvidpost.corvidtec.com-35961-2016/01/22-16:40:31:88946-homegfs-client-6-0-1,
          granted at 2016-01-25 17:06:12<br>
          inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
          start=9223372036854775806, len=0, pid = 21129,
          owner=3cc068a1e07f0000, client=0x1495040,
          connection-id=corvidpost2.corvidtec.com-43400-2016/01/22-16:40:31:248771-homegfs-client-6-0-1,
          granted at 2016-01-25 17:15:53<br>
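          The contrast above (thousands of wound finodelk frames vs. a handful
          of active locks) can be tallied mechanically. A minimal sketch: the
          statedump excerpt below is a synthetic stand-in so the commands run
          anywhere; in practice point <code>DUMP</code> at the real file
          (e.g. data-brick02a-homegfs.4066.dump.1453742225).<br>

```shell
# Tally lock states in a brick statedump. The heredoc is a synthetic
# excerpt so this sketch is self-contained; point DUMP at the real
# statedump file in practice.
DUMP=$(mktemp)
cat > "$DUMP" <<'EOF'
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0
inodelk.inodelk[2](ACTIVE)=type=WRITE, whence=0, start=0, len=0
EOF
# Count how many locks are in each state.
grep -oE '\((ACTIVE|BLOCKED)\)' "$DUMP" | sort | uniq -c
```

          A large BLOCKED count alongside few ACTIVE locks would point at
          contention rather than raw lock volume.<br>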
          <br>
          One more odd thing I found is the following:<br>
          <br>
          [2016-01-15 14:03:06.910687] C
          [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired]
          0-homegfs-client-2: server 10.200.70.1:49153 has not responded
          in the last 10 seconds, disconnecting.<br>
          [2016-01-15 14:03:06.910886] E
          [rpc-clnt.c:362:saved_frames_unwind] (--&gt;
          /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x2b74c289a580]
          (--&gt;
          /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x2b74c2b27787]
          (--&gt;
          /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x2b74c2b2789e]
          (--&gt;
          /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x2b74c2b27951]
          (--&gt;
          /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x2b74c2b27f1f]
          ))))) 0-homegfs-client-2: forced unwinding frame
          type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-01-15
          10:30:09.487422 (xid=0x11ed3f)<br>
          <br>
          FINODELK was called at 2016-01-15 10:30:09.487422, but the
          response still had not come by 14:03:06. That is almost 3.5
          hours!<br>
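          For reference, the gap follows directly from the two timestamps in
          the log line above (a quick sketch; GNU date is assumed):<br>

```shell
# Time between winding the FINODELK frame and its forced unwind,
# using the timestamps from the log message above.
called="2016-01-15 10:30:09"
unwound="2016-01-15 14:03:06"
delta=$(( $(date -ud "$unwound" +%s) - $(date -ud "$called" +%s) ))
printf '%dh %dm %ds\n' $((delta/3600)) $((delta%3600/60)) $((delta%60))
# -> 3h 32m 57s
```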
          <br>
          Something really bad related to locks is happening. Did you
          guys patch the recent memory corruption bug which only affects
          workloads with more than 128 clients? <a
            moz-do-not-send="true" class="moz-txt-link-freetext"
            href="http://review.gluster.org/13241">http://review.gluster.org/13241</a><br>
          <br>
          Pranith
          <blockquote class="cite"
            cite="mid:em5dec06c0-891f-4608-8ea4-49b95b22d6c6@dfrobins-vaio"
            type="cite">
            <div> </div>
            <div> </div>
            <div> </div>
            <div>------ Original Message ------</div>
            <div>From: "Pranith Kumar Karampuri" &lt;<a
                href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>&gt;</div>
            <div>To: "Glomski, Patrick" &lt;<a
                href="mailto:patrick.glomski@corvidtec.com"
                moz-do-not-send="true">patrick.glomski@corvidtec.com</a>&gt;</div>
            <div>Cc: "David Robinson" &lt;<a
                href="mailto:drobinson@corvidtec.com"
                moz-do-not-send="true">drobinson@corvidtec.com</a>&gt;;
              <a moz-do-not-send="true" class="moz-txt-link-rfc2396E"
          href="mailto:gluster-users@gluster.org">"gluster-users@gluster.org"</a>
              &lt;<a href="mailto:gluster-users@gluster.org"
                moz-do-not-send="true">gluster-users@gluster.org</a>&gt;;
              "Gluster Devel" &lt;<a
                href="mailto:gluster-devel@gluster.org"
                moz-do-not-send="true">gluster-devel@gluster.org</a>&gt;</div>
            <div>Sent: 1/24/2016 9:27:02 PM</div>
            <div>Subject: Re: [Gluster-users] [Gluster-devel] heal
              hanging</div>
            <div> </div>
            <div id="xbb82614cb18e449189357d4ae81dda55" style="COLOR:
              #000000">
              <blockquote class="cite2"
                cite="56A587F6.8050805@redhat.com" type="cite">It seems
                like there is a lot of finodelk/inodelk traffic. I
                wonder why that is. I think the next step is to collect a
                statedump of the brick that is taking a lot of CPU, using
                "gluster volume statedump &lt;volname&gt;"<br>
                <br>
                Pranith<br>
                <div class="moz-cite-prefix">On 01/22/2016 08:36 AM,
                  Glomski, Patrick wrote:<br>
                </div>
                <blockquote class="cite"
cite="mid:CALkMjdCZRYOvhNGOrCFS9v6Y-vOhX2do0HA-N=CpMf1OBo4+dg@mail.gmail.com"
                  type="cite">
                  <div dir="ltr">
                    <div>Pranith, attached are stack traces collected
                      every second for 20 seconds from the high-%cpu
                      glusterfsd process.<br>
                      <br>
                    </div>
                    Patrick<br>
                  </div>
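                  The collection described above (one stack snapshot per
                  second for 20 seconds) can be scripted. A sketch, with
                  gstack stubbed so it runs without a live glusterfsd; in
                  real use, replace the stub with <code>gstack</code> (from
                  gdb), loop 20 times, and <code>sleep 1</code> between
                  snapshots:<br>

```shell
# Append repeated stack snapshots of one process into a single file.
PID=$$                                   # in practice: pidof glusterfsd
OUT=$(mktemp)
snapshot() { echo "=== snapshot of pid $1 ==="; }   # stub for: gstack "$1"
for i in 1 2 3; do                       # in practice: for i in $(seq 20); with sleep 1
    snapshot "$PID" >> "$OUT"
done
wc -l < "$OUT"
```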
                  <div class="gmail_extra"><br>
                    <div class="gmail_quote">On Thu, Jan 21, 2016 at
                      9:46 PM, Glomski, Patrick <span dir="ltr">&lt;<a
                          href="mailto:patrick.glomski@corvidtec.com"
                          moz-do-not-send="true">patrick.glomski@corvidtec.com</a>&gt;</span>
                      wrote:<br>
                      <blockquote class="gmail_quote"
                        style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px
                        solid; MARGIN: 0px 0px 0px 0.8ex">
                        <div dir="ltr">
                          <div>Last entry for get_real_filename on any
                            of the bricks was when we turned off the
                            samba gfapi vfs plugin earlier today:<br>
                            <br>
                            /var/log/glusterfs/bricks/data-brick01a-homegfs.log:[2016-01-21
                            15:13:00.008239] E
                            [server-rpc-fops.c:768:server_getxattr_cbk]
                            0-homegfs-server: 105: GETXATTR /wks_backup
                            (40e582d6-b0c7-4099-ba88-9168a3c32ca6)
                            (glusterfs.get_real_filename:desktop.ini)
                            ==&gt; (Permission denied)<br>
                            <br>
                          </div>
                          We'll get back to you with those traces when
                          %cpu spikes again. As with most sporadic
                          problems, as soon as you want something out of
                          it, the issue becomes harder to reproduce.<br>
                          <div>
                            <div><br>
                            </div>
                          </div>
                        </div>
                        <div class="HOEnZb">
                          <div class="h5">
                            <div class="gmail_extra"><br>
                              <div class="gmail_quote">On Thu, Jan 21,
                                2016 at 9:21 PM, Pranith Kumar Karampuri
                                <span dir="ltr">&lt;<a
                                    href="mailto:pkarampu@redhat.com"
                                    moz-do-not-send="true">pkarampu@redhat.com</a>&gt;</span>
                                wrote:<br>
                                <blockquote class="gmail_quote"
                                  style="PADDING-LEFT: 1ex; BORDER-LEFT:
                                  #ccc 1px solid; MARGIN: 0px 0px 0px
                                  0.8ex">
                                  <div bgcolor="#FFFFFF" text="#000000"><span><br>
                                      <br>
                                      <div>On 01/22/2016 07:25 AM,
                                        Glomski, Patrick wrote:<br>
                                      </div>
                                    </span><span>
                                      <blockquote class="cite"
                                        type="cite">
                                        <div dir="ltr">Unfortunately,
                                          all samba mounts to the
                                          gluster volume through the
                                          gfapi vfs plugin have been
                                          disabled for the last 6 hours
                                          or so, and the frequency of
                                          %cpu spikes has increased. We
                                          had switched to sharing a fuse
                                          mount through samba, but I
                                          just disabled that as well.
                                          There are no samba shares of
                                          this volume now. The spikes
                                          now happen every thirty
                                          minutes or so. We've resorted
                                          to just rebooting the machine
                                          with high load for the
                                          present.<br>
                                        </div>
                                      </blockquote>
                                      <br>
                                    </span>Could you check whether logs
                                    of the following type have stopped
                                    appearing entirely?<br>
                                    [2016-01-21 15:13:00.005736] E
                                    [server-rpc-fops.c:768:server_getxattr_cbk]
                                    0-homegfs-server: 110: GETXATTR
                                    /wks_backup
                                    (40e582d6-b0c7-4099-ba88-9168a3c32ca6)
                                    (glusterfs.get_real_filename:desktop.ini)
                                    ==&gt; (Permission denied)<br>
                                    <br>
                                    These are operations that failed.
                                    Operations that succeed are the ones
                                    that will scan the directory. But I
                                    don't have a way to find them other
                                    than using tcpdumps.<br>
                                    <br>
                                    At the moment I have 2 theories:<br>
                                    1) these get_real_filename calls<br>
                                    2) [2016-01-21 16:10:38.017828] E
                                    [server-helpers.c:46:gid_resolve]
                                    0-gid-cache: getpwuid_r(494) failed<br>
                                    "<br>
                                    <p class="MsoNormal"><span
                                        style="FONT-SIZE: 11pt;
                                        FONT-FAMILY:
                                        &quot;Calibri&quot;,&quot;sans-serif&quot;;
                                        COLOR: #1f497d">Yessir they
                                        are.  Normally, sssd would look
                                        to the local cache file in
                                        /var/lib/sss/db/ first, to get
                                        any group or userid information,
                                        then go out to the domain
                                        controller.  I put the options
                                        that we are using on our GFS
                                        volumes below…  Thanks for your
                                        help.</span></p>
                                    <p class="MsoNormal"><span
                                        style="FONT-SIZE: 11pt;
                                        FONT-FAMILY:
                                        &quot;Calibri&quot;,&quot;sans-serif&quot;;
                                        COLOR: #1f497d"> </span></p>
                                    <p class="MsoNormal"><span
                                        style="FONT-SIZE: 11pt;
                                        FONT-FAMILY:
                                        &quot;Calibri&quot;,&quot;sans-serif&quot;;
                                        COLOR: #1f497d">We had been
                                        running sssd with sssd_nss and
                                        sssd_be sub-processes on these
                                        systems for a long time, under
                                        the GFS 3.5.2 code, and not run
                                        into the problem that David
                                        described with the high cpu
                                        usage on sssd_nss.</span></p>
                                    <b><span>"<br>
                                      </span></b>That was Tom Young's
                                     email from 1.5 years back, when we
                                     debugged this before. But the
                                     process that was consuming a lot of
                                     CPU then was sssd_nss, so I am not
                                     sure it is the same issue. Let us
                                     first verify that '1)' is not
                                     happening. The gstack traces I
                                     asked for should also help.
                                    <div>
                                      <div><br>
                                        <br>
                                        Pranith<br>
                                        <blockquote class="cite"
                                          type="cite">
                                          <div class="gmail_extra"><br>
                                            <div class="gmail_quote">On
                                              Thu, Jan 21, 2016 at 8:49
                                              PM, Pranith Kumar
                                              Karampuri <span dir="ltr">&lt;<a
href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>&gt;</span>
                                              wrote:<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="PADDING-LEFT:
                                                1ex; BORDER-LEFT: #ccc
                                                1px solid; MARGIN: 0px
                                                0px 0px 0.8ex">
                                                <div bgcolor="#FFFFFF"
                                                  text="#000000"><span><br>
                                                    <br>
                                                    <div>On 01/22/2016
                                                      07:13 AM, Glomski,
                                                      Patrick wrote:<br>
                                                    </div>
                                                    <blockquote
                                                      class="cite"
                                                      type="cite">
                                                      <div dir="ltr">We
                                                        use the samba
                                                        glusterfs
                                                        virtual
                                                        filesystem (the
                                                        current version
                                                        provided on <a
href="http://download.gluster.org/" moz-do-not-send="true">download.gluster.org</a>),
                                                        but no windows
                                                        clients
                                                        connecting
                                                        directly.<br>
                                                      </div>
                                                    </blockquote>
                                                    <br>
                                                  </span>Hmm.. Is there
                                                   a way to disable
                                                   this and check whether
                                                   the CPU% still increases?
                                                   What getxattr of
                                                   "glusterfs.get_real_filename
                                                   &lt;filename&gt;" does
                                                   is scan the entire
                                                   directory looking for
                                                   strcasecmp(&lt;filename&gt;,
                                                   &lt;scanned-filename&gt;).
                                                   If anything matches,
                                                   it returns the
                                                   &lt;scanned-filename&gt;.
                                                   But the problem is that
                                                   the scan is costly, so I
                                                   wonder if this is the
                                                   reason for the CPU
                                                   spikes.<span><font
                                                      color="#888888"><br>
                                                      <br>
                                                      Pranith</font></span>
                                                  <div>
                                                    <div><br>
                                                      <blockquote
                                                        class="cite"
                                                        type="cite">
                                                        <div
                                                          class="gmail_extra"><br>
                                                          <div
                                                          class="gmail_quote">On
                                                          Thu, Jan 21,
                                                          2016 at 8:37
                                                          PM, Pranith
                                                          Kumar
                                                          Karampuri <span
                                                          dir="ltr">&lt;<a
href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>&gt;</span>
                                                          wrote:<br>
                                                          <blockquote
                                                          class="gmail_quote"
                                                          style="PADDING-LEFT:
                                                          1ex;
                                                          BORDER-LEFT:
                                                          #ccc 1px
                                                          solid; MARGIN:
                                                          0px 0px 0px
                                                          0.8ex">
                                                          <div
                                                          bgcolor="#FFFFFF"
                                                          text="#000000">Do
                                                          you have any
                                                          windows
                                                          clients? I see
                                                          a lot of
                                                          getxattr calls
                                                          for
                                                          "glusterfs.get_real_filename"
                                                          which lead to
                                                          full readdirs
                                                          of the
                                                          directories on
                                                          the brick.<span><font
color="#888888"><br>
                                                          <br>
                                                          Pranith</font></span><span><br>
                                                          <br>
                                                          <div>On
                                                          01/22/2016
                                                          12:51 AM,
                                                          Glomski,
                                                          Patrick wrote:<br>
                                                          </div>
                                                          </span>
                                                          <div>
                                                          <div>
                                                          <blockquote
                                                          class="cite"
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div>Pranith,
                                                          could this
                                                          kind of
                                                          behavior be
                                                          self-inflicted
                                                          by us deleting
                                                          files directly
                                                          from the
                                                          bricks? We
                                                          have done that
                                                          in the past to
                                                          clean up issues
                                                          where
                                                          gluster
                                                          wouldn't allow
                                                          us to delete
                                                          from the
                                                          mount.<br>
                                                          <br>
If so, is it feasible to clean them up by running a search on the .glusterfs directories directly and removing files with a reference count of 1 that are non-zero size (or directly checking the xattrs to be sure that it's not a DHT link)?<br>
                                                          <br>
find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec rm -f "{}" \;<br>
                                                          <br>
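For the xattr-checking variant, something along these lines might work. This is only a sketch, not verified against a live brick: the `BRICK` argument is hypothetical, it requires `getfattr` from the attr package, and it prints candidates rather than removing them so it can be dry-run before swapping in `rm -f`:

```shell
# Sketch only: list non-empty files under .glusterfs whose hard-link
# count is 1 (the link from the real path is gone) and which do NOT
# carry the DHT linkto xattr. BRICK is an assumed argument.
BRICK=${1:-/data/brick01a/homegfs}

is_dht_link() {
    # DHT link files carry trusted.glusterfs.dht.linkto on the brick;
    # getfattr exits non-zero when the named xattr is absent
    getfattr -n trusted.glusterfs.dht.linkto --absolute-names "$1" >/dev/null 2>&1
}

find "$BRICK/.glusterfs" -type f -not -empty -links -2 2>/dev/null |
while IFS= read -r f; do
    is_dht_link "$f" || printf 'orphan candidate: %s\n' "$f"
done
```

Replacing the `printf` with `rm -f -- "$f"` only after reviewing the candidate list keeps the check reversible.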
                                                          </div>
Is there anything I'm inherently missing with that approach that will further corrupt the system?<br>
                                                          <div><br>
                                                          </div>
                                                          </div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <span dir="ltr">&lt;<a href="mailto:patrick.glomski@corvidtec.com" moz-do-not-send="true">patrick.glomski@corvidtec.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">
                                                          <div dir="ltr">
                                                          <div>
<div>Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been running on one of the bricks on gfs02b for 25 min or so and users cannot access the volume.<br>
                                                          <br>
I re-listed the xattrop directories as well as a 'top' entry and heal statistics. Then I restarted the gluster services on gfs02a.<br>
                                                          <br>
=================== top ===================<br>
  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM     TIME+  COMMAND<br>
 8969 root      20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93 glusterfsd<br>
                                                          <br>
=================== xattrop ===================<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-41f19453-91e4-437c-afa9-3b25614de210  xattrop-9b815879-2f4d-402b-867c-a6d65087788c<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-70131855-3cfb-49af-abce-9d23f57fb393  xattrop-dfb77848-a39d-4417-a725-9beca75d78c6<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
e6e47ed9-309b-42a7-8c44-28c29b9a20f8          xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125<br>
xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934  xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc  xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413<br>
<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
8034bc06-92cd-4fa5-8aaf-09039e79d2c8          c9ce22ed-6d8b-471b-a111-b39e57f0b512<br>
94fa1d60-45ad-4341-b69c-315936b51e8d          xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d<br>
                                                          <br>
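If I understand the index layout correctly, names in indices/xattrop that do not match the xattrop-* base file are GFIDs the self-heal daemon still has queued, so the per-brick backlog above can be counted with a quick loop. A sketch only (paths as in the listing, helper name `count_pending` is mine):

```shell
# count index entries pending heal in one xattrop directory: every name
# that is not the xattrop-<gfid> base file is a queued GFID hardlink
count_pending() {
    ls "$1" | grep -c -v '^xattrop-'
}

# assumed brick layout, as shown in the listings above
for d in /data/brick*/homegfs/.glusterfs/indices/xattrop; do
    [ -d "$d" ] && printf '%s: %s pending\n' "$d" "$(count_pending "$d")"
done
```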
                                                          <br>
=================== heal stats ===================<br>
                                                           <br>
homegfs [b0-gfsib01a] : Starting time of crawl       : Thu Jan 21 12:36:45 2016<br>
homegfs [b0-gfsib01a] : Ending time of crawl         : Thu Jan 21 12:36:45 2016<br>
homegfs [b0-gfsib01a] : Type of crawl: INDEX<br>
homegfs [b0-gfsib01a] : No. of entries healed        : 0<br>
homegfs [b0-gfsib01a] : No. of entries in split-brain: 0<br>
homegfs [b0-gfsib01a] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b1-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:36:19 2016<br>
homegfs [b1-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:36:19 2016<br>
homegfs [b1-gfsib01b] : Type of crawl: INDEX<br>
homegfs [b1-gfsib01b] : No. of entries healed        : 0<br>
homegfs [b1-gfsib01b] : No. of entries in split-brain: 0<br>
homegfs [b1-gfsib01b] : No. of heal failed entries   : 1<br>
 <br>
homegfs [b2-gfsib01a] : Starting time of crawl       : Thu Jan 21 12:36:48 2016<br>
homegfs [b2-gfsib01a] : Ending time of crawl         : Thu Jan 21 12:36:48 2016<br>
homegfs [b2-gfsib01a] : Type of crawl: INDEX<br>
homegfs [b2-gfsib01a] : No. of entries healed        : 0<br>
homegfs [b2-gfsib01a] : No. of entries in split-brain: 0<br>
homegfs [b2-gfsib01a] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b3-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:36:47 2016<br>
homegfs [b3-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:36:47 2016<br>
homegfs [b3-gfsib01b] : Type of crawl: INDEX<br>
homegfs [b3-gfsib01b] : No. of entries healed        : 0<br>
homegfs [b3-gfsib01b] : No. of entries in split-brain: 0<br>
homegfs [b3-gfsib01b] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b4-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:36:06 2016<br>
homegfs [b4-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:36:06 2016<br>
homegfs [b4-gfsib02a] : Type of crawl: INDEX<br>
homegfs [b4-gfsib02a] : No. of entries healed        : 0<br>
homegfs [b4-gfsib02a] : No. of entries in split-brain: 0<br>
homegfs [b4-gfsib02a] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b5-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:13:40 2016<br>
homegfs [b5-gfsib02b] :                                *** Crawl is in progress ***<br>
homegfs [b5-gfsib02b] : Type of crawl: INDEX<br>
homegfs [b5-gfsib02b] : No. of entries healed        : 0<br>
homegfs [b5-gfsib02b] : No. of entries in split-brain: 0<br>
homegfs [b5-gfsib02b] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b6-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:36:58 2016<br>
homegfs [b6-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:36:58 2016<br>
homegfs [b6-gfsib02a] : Type of crawl: INDEX<br>
homegfs [b6-gfsib02a] : No. of entries healed        : 0<br>
homegfs [b6-gfsib02a] : No. of entries in split-brain: 0<br>
homegfs [b6-gfsib02a] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b7-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:36:50 2016<br>
homegfs [b7-gfsib02b] : Ending time of crawl         : Thu Jan 21 12:36:50 2016<br>
homegfs [b7-gfsib02b] : Type of crawl: INDEX<br>
homegfs [b7-gfsib02b] : No. of entries healed        : 0<br>
homegfs [b7-gfsib02b] : No. of entries in split-brain: 0<br>
homegfs [b7-gfsib02b] : No. of heal failed entries   : 0<br>
                                                          <br>
                                                          <br>
========================================================================================<br>
                                                          </div>
I waited a few minutes for the heals to finish and ran the heal statistics and info again. One file is in split-brain. Aside from the split-brain, the load on all systems is down now and they are behaving normally. glustershd.log is attached. What is going on???<br>
                                                          <br>
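For reference, output in the form above matches the standard heal commands (volume name homegfs as in the listing); a hedged sketch, since the exact invocation wasn't quoted:

```shell
# Assumed commands behind the listings above; standard gluster CLI,
# run on any server in the trusted pool. VOL taken from the output.
VOL=homegfs
stats_cmd="gluster volume heal $VOL statistics"
sb_cmd="gluster volume heal $VOL info split-brain"

# guarded so this is a no-op on hosts without the gluster CLI
if command -v gluster >/dev/null 2>&1; then
    $stats_cmd   # per-brick crawl start/end times and heal counts
    $sb_cmd      # entries currently in split-brain
fi
```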
Thu Jan 21 12:53:50 EST 2016<br>
                                                           <br>
=================== homegfs ===================<br>
                                                           <br>
                                                          homegfs
                                                          [b0-gfsib01a]
                                                          : Starting
                                                          time of
                                                          crawl       :
                                                          Thu Jan 21
                                                          12:53:02 2016<br>
                                                          homegfs
                                                          [b0-gfsib01a]
                                                          : Ending time
                                                          of
                                                          crawl        
                                                          : Thu Jan 21
                                                          12:53:02 2016<br>
                                                          homegfs
                                                          [b0-gfsib01a]
                                                          : Type of
                                                          crawl: INDEX<br>
                                                          homegfs
                                                          [b0-gfsib01a]
                                                          : No. of
                                                          entries
                                                          healed       
                                                          : 0<br>
                                                          homegfs
                                                          [b0-gfsib01a]
                                                          : No. of
                                                          entries in
                                                          split-brain: 0<br>
                                                          homegfs
                                                          [b0-gfsib01a]
                                                          : No. of heal
                                                          failed
                                                          entries   : 0<br>
 <br>
homegfs [b1-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] : Type of crawl: INDEX<br>
homegfs [b1-gfsib01b] : No. of entries healed        : 0<br>
homegfs [b1-gfsib01b] : No. of entries in split-brain: 0<br>
homegfs [b1-gfsib01b] : No. of heal failed entries   : 1<br>
 <br>
homegfs [b2-gfsib01a] : Starting time of crawl       : Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] : Ending time of crawl         : Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] : Type of crawl: INDEX<br>
homegfs [b2-gfsib01a] : No. of entries healed        : 0<br>
homegfs [b2-gfsib01a] : No. of entries in split-brain: 0<br>
homegfs [b2-gfsib01a] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b3-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] : Type of crawl: INDEX<br>
homegfs [b3-gfsib01b] : No. of entries healed        : 0<br>
homegfs [b3-gfsib01b] : No. of entries in split-brain: 0<br>
homegfs [b3-gfsib01b] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b4-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] : Type of crawl: INDEX<br>
homegfs [b4-gfsib02a] : No. of entries healed        : 0<br>
homegfs [b4-gfsib02a] : No. of entries in split-brain: 0<br>
homegfs [b4-gfsib02a] : No. of heal failed entries   : 1<br>
 <br>
homegfs [b5-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:53:14 2016<br>
homegfs [b5-gfsib02b] : Ending time of crawl         : Thu Jan 21 12:53:15 2016<br>
homegfs [b5-gfsib02b] : Type of crawl: INDEX<br>
homegfs [b5-gfsib02b] : No. of entries healed        : 0<br>
homegfs [b5-gfsib02b] : No. of entries in split-brain: 0<br>
homegfs [b5-gfsib02b] : No. of heal failed entries   : 3<br>
 <br>
homegfs [b6-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] : Type of crawl: INDEX<br>
homegfs [b6-gfsib02a] : No. of entries healed        : 0<br>
homegfs [b6-gfsib02a] : No. of entries in split-brain: 0<br>
homegfs [b6-gfsib02a] : No. of heal failed entries   : 0<br>
 <br>
homegfs [b7-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] : Ending time of crawl         : Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] : Type of crawl: INDEX<br>
homegfs [b7-gfsib02b] : No. of entries healed        : 0<br>
homegfs [b7-gfsib02b] : No. of entries in split-brain: 0<br>
homegfs [b7-gfsib02b] : No. of heal failed entries   : 0<br>
 <br>
*** gluster bug in 'gluster volume heal homegfs statistics' ***<br>
*** Use 'gluster volume heal homegfs info' until bug is fixed ***<span><br>
 <br>
Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
</span>/users/bangell/.gconfd - Is in split-brain<br>
<br>
Number of entries: 1<br>
<br>
Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
/users/bangell/.gconfd - Is in split-brain<br>
<br>
/users/bangell/.gconfd/saved_state<br>
Number of entries: 2<span><br>
<br>
Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
</span></div>
                                                          <div><br>
                                                          <br>
                                                          </div>
                                                          </div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 11:10 AM, Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">
<div bgcolor="#FFFFFF" text="#000000"><span><br>
                                                          <br>
<div>On 01/21/2016 09:26 PM, Glomski, Patrick wrote:<br>
</div>
<blockquote class="cite" type="cite">
<div dir="ltr">
<div>I should mention that the problem is not currently occurring and there are no heals (output appended). By restarting the gluster services, we can stop the crawl, which lowers the load for a while. Subsequent crawls seem to finish properly. For what it's worth, files/folders that show up in the 'volume info' output during a hung crawl don't seem to be anything out of the ordinary.<br>
<br>
Over the past four days, the typical time before the problem recurs after suppressing it in this manner is an hour. Last night when we reached out to you was the last time it happened, and the load has been low since (a relief). David believes that recursively listing the files (ls -alR or similar) from a client mount can force the issue to happen, but obviously I'd rather not unless we have some precise thing we're looking for. Let me know if you'd like me to attempt to drive the system unstable like that and what I should look for. As it's a production system, I'd rather not leave it in this state for long.<br>
</div>
</div>
                                                          </blockquote>
                                                          <br>
</span>Would it be possible to send the glustershd and mount logs of the past 4 days? I would like to see if this is because of directory self-heal going wild (Ravi is working on a throttling feature for 3.8, which will allow putting the brakes on self-heal traffic)<span><font
color="#888888"><br>
                                                          <br>
                                                          Pranith</font></span>
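A rough sketch of how those logs could be bundled for the list. The demo directory below stands in for the servers' log directory (by default /var/log/glusterfs holds glustershd.log and the per-mount logs); the file names created here are stand-ins, not taken from the actual systems.

```shell
# Hedged sketch: collect glusterfs logs modified in the last 4 days.
# LOGDIR is a throwaway demo directory here; on the servers it would be
# /var/log/glusterfs, the default log location.
LOGDIR=$(mktemp -d)
OUT="$LOGDIR/gluster-logs.tar.gz"
# stand-ins for glustershd.log and a fuse-mount log
touch "$LOGDIR/glustershd.log" "$LOGDIR/homegfs-mount.log"
# keep only logs touched within the last 4 days and bundle them
( cd "$LOGDIR" && find . -maxdepth 1 -name '*.log' -mtime -4 | tar czf "$OUT" -T - )
tar tzf "$OUT"
```

Run per server and attach the resulting tarballs to the thread.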
                                                          <div>
                                                          <div><br>
                                                          <blockquote
                                                          class="cite"
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div><br>
                                                          </div>
<div>[root@gfs01a xattrop]# gluster volume heal homegfs info<br>
Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
                                                          <br>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          <div
                                                          class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 10:40 AM, Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">
<div bgcolor="#FFFFFF" text="#000000"><span><br>
<br>
<div>On 01/21/2016 08:25 PM, Glomski, Patrick wrote:<br>
                                                          </div>
                                                          <blockquote
                                                          class="cite"
                                                          type="cite">
                                                          <div dir="ltr">
<div>Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd process jumps to the number of processor cores available (800% or 1200%, depending on the pair of nodes involved) and the load average on the machine goes very high (~20). The volume's heal statistics output shows that it is crawling one of the bricks and trying to heal, but this crawl hangs and never seems to finish.<br>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          <blockquote
                                                          class="cite"
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div><br>
                                                          </div>
The number of files in the xattrop directory varies over time, so I ran a wc -l as you requested periodically for some time, and then started including a datestamped list of the files that were in the xattrop directory on each brick to see which were persistent. All bricks had files in the xattrop folder, so all results are attached.<br>
                                                          </div>
                                                          </blockquote>
</span>Thanks, this info is helpful. I don't see a lot of files. Could you give the output of "gluster volume heal &lt;volname&gt; info"? Is there any directory in there which is LARGE?<span><font
color="#888888"><br>
                                                          <br>
                                                          Pranith</font></span>
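For the "any directory which is LARGE" question, one rough way to rank directories on a brick by entry count is sketched below, since a single huge directory can make a self-heal crawl appear to hang. The demo tree under mktemp is illustrative; in production DIR would be a brick root such as /data/brick01a/homegfs.

```shell
# Rank directories by number of entries, largest first.
# DIR is a demo tree here; point it at a brick root in production.
DIR=$(mktemp -d)
mkdir -p "$DIR/small" "$DIR/huge"
touch "$DIR/small/f1"
for i in $(seq 1 50); do touch "$DIR/huge/f$i"; done
# print "<entry count> <directory>" for every directory, biggest first
find "$DIR" -type d | while read -r d; do
    printf '%6d %s\n' "$(ls "$d" | wc -l)" "$d"
done | sort -rn | head -5
```

On a real brick this can take a while; running it off-peak would be kinder to a loaded production system.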
                                                          <div>
                                                          <div><br>
                                                          <blockquote
                                                          class="cite"
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div><br>
                                                          </div>
<div>Please let me know if there is anything else I can provide.<br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          <div>Patrick<br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          </div>
                                                          <div
                                                          class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex">
<div bgcolor="#FFFFFF" text="#000000">hey,<br>
       Which process is consuming so much cpu? I went through the logs you gave me. I see that the following files are in gfid mismatch state:<br>
                                                          <br>
&lt;066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup&gt;,<br>
&lt;1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak&gt;,<br>
&lt;ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg&gt;,<br>
                                                          <br>
Could you give me the output of "ls &lt;brick-path&gt;/indices/xattrop | wc -l" on all the bricks which are acting this way? This will tell us the number of pending self-heals on the system.<br>
                                                          <br>
                                                          Pranith
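The count above can be scripted across bricks; on typical layouts the index directory sits at &lt;brick&gt;/.glusterfs/indices/xattrop, where each gfid-named file is one pending heal. The sketch below fakes a brick layout under mktemp with plain touch; substitute the real brick paths in production.

```shell
# Each file in a brick's indices/xattrop directory represents one
# pending self-heal entry. BRICK is a faked demo layout here; on a real
# server use the brick paths from 'gluster volume info'.
BRICK=$(mktemp -d)
IDX="$BRICK/.glusterfs/indices/xattrop"
mkdir -p "$IDX"
touch "$IDX/gfid-a" "$IDX/gfid-b" "$IDX/gfid-c"   # three fake pending entries
COUNT=$(ls "$IDX" | wc -l)
echo "$BRICK: $COUNT pending self-heals"
```

A count that stays high (or keeps growing) while the crawl runs would point at heals not completing rather than merely queuing.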
                                                          <div>
                                                          <div><br>
                                                          <br>
<div>On 01/20/2016 09:26 PM, David Robinson wrote:<br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          <blockquote
                                                          class="cite"
                                                          type="cite">
                                                          <div>
                                                          <div>
<div>resending with parsed logs...</div>
                                                          <div> </div>
                                                          <div>
                                                          <blockquote
                                                          class="cite"
                                                          cite="http://em5ee26b0e-002a-4230-bdec-3020b98cff3c@dfrobins-vaio"
                                                          type="cite">
                                                          <div> </div>
                                                          <div> </div>
                                                          <div>
                                                          <blockquote
                                                          class="cite"
                                                          cite="http://eme3b2cb80-8be2-4fa5-9d08-4710955e237c@dfrobins-vaio"
                                                          type="cite">
<div>I am having issues with 3.6.6 where the load will spike up to 800% for one of the glusterfsd processes and the users can no longer access the system. If I reboot the node, the heal will finish normally after a few minutes and the system will be responsive, but a few hours later the issue will start again. It looks like it is hanging in a heal and spinning up the load on one of the bricks. The heal gets stuck, says it is crawling, and never returns. After a few minutes of the heal saying it is crawling, the load spikes up and the mounts become unresponsive.</div>
                                                          <div> </div>
<div>Any suggestions on how to fix this? It has us stopped cold, as the users can no longer access the systems when the load spikes... Logs attached.</div>
                                                          <div> </div>
<div>System setup info is:</div>
                                                          <div> </div>
<div>[root@gfs01a ~]# gluster volume info homegfs<br>
 <br>
Volume Name: homegfs<br>
Type: Distributed-Replicate<br>
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071<br>
Status: Started<br>
Number of Bricks: 4 x 2 = 8<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs<br>
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs<br>
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs<br>
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs<br>
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs<br>
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs<br>
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs<br>
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs<br>
Options Reconfigured:<br>
performance.io-thread-count: 32<br>
performance.cache-size: 128MB<br>
performance.write-behind-window-size: 128MB<br>
server.allow-insecure: on<br>
network.ping-timeout: 42<br>
storage.owner-gid: 100<br>
geo-replication.indexing: off<br>
geo-replication.ignore-pid-check: on<br>
changelog.changelog: off<br>
changelog.fsync-interval: 3<br>
changelog.rollover-time: 15<br>
server.manage-gids: on<br>
diagnostics.client-log-level: WARNING</div>
                                                          <div> </div>
<div>[root@gfs01a ~]# rpm -qa | grep gluster<br>
gluster-nagios-common-0.1.1-0.el6.noarch<br>
glusterfs-fuse-3.6.6-1.el6.x86_64<br>
glusterfs-debuginfo-3.6.6-1.el6.x86_64<br>
glusterfs-libs-3.6.6-1.el6.x86_64<br>
glusterfs-geo-replication-3.6.6-1.el6.x86_64<br>
glusterfs-api-3.6.6-1.el6.x86_64<br>
glusterfs-devel-3.6.6-1.el6.x86_64<br>
glusterfs-api-devel-3.6.6-1.el6.x86_64<br>
glusterfs-3.6.6-1.el6.x86_64<br>
glusterfs-cli-3.6.6-1.el6.x86_64<br>
glusterfs-rdma-3.6.6-1.el6.x86_64<br>
samba-vfs-glusterfs-4.1.11-2.el6.x86_64<br>
glusterfs-server-3.6.6-1.el6.x86_64<br>
glusterfs-extra-xlators-3.6.6-1.el6.x86_64<br>
                                                          </div>
                                                          <div> </div>
                                                          </blockquote>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          <fieldset></fieldset>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          <pre>_______________________________________________
Gluster-devel mailing list
<a href="mailto:Gluster-devel@gluster.org" moz-do-not-send="true">Gluster-devel@gluster.org</a>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" moz-do-not-send="true">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                          <br>
_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" moz-do-not-send="true">Gluster-users@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" moz-do-not-send="true">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                        </div>
                                                      </blockquote>
                                                      <br>
                                                    </div>
                                                  </div>
                                                </div>
                                              </blockquote>
                                            </div>
                                            <br>
                                          </div>
                                        </blockquote>
                                        <br>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
                              <br>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                    <br>
                  </div>
                </blockquote>
                <br>
              </blockquote>
            </div>
          </blockquote>
          <br>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </body>
</html>