<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 01/25/2016 09:11 AM, David Robinson
wrote:<br>
</div>
<blockquote
cite="mid:em809bc756-d377-440b-8d2a-62cbd5ef7a55@dfrobins-vaio"
type="cite">
<style id="eMClientCss">blockquote.cite { margin-left: 5px; margin-right: 0px; padding-left: 10px; padding-right:0px; border-left: 1px solid #cccccc }
blockquote.cite2 {margin-left: 5px; margin-right: 0px; padding-left: 10px; padding-right:0px; border-left: 1px solid #cccccc; margin-top: 3px; padding-top: 0px; }
.plain pre, .plain tt { font-family: monospace; font-size: 100%; font-weight: normal; font-style: normal;}
a img { border: 0px; }body {font-family: Times New Roman;font-size: 12pt;}
.plain pre, .plain tt {font-family: Times New Roman;font-size: 12pt;}
</style>
<div>A lot more than 128 clients. Well over 1000. And I believe
we might have found the problem; it looks like you were headed
in the right direction, as it appears to be a problem with
one of the clients' FUSE mounts. </div>
<div> </div>
<div>When we couldn't resolve the issue, I started moving all of
my users off of the gluster storage system as it was no longer
responsive. After moving all of them off, I tried to kill all
of the clients that had homegfs mounted by doing a 'killall
glusterfs' on all of the machines connected to gluster. There
was one machine where even after killing all of the glusterfs
processes and checking to make sure no glusterfs was running,
'mount' still showed the FUSE mount. After I did a 'umount -lf
/homegfs' it finally went away. </div>
<div> </div>
<div>After I killed the client mounts and restarted all of them,
we haven't had any more issues with out-of-control loads on the
storage systems. We had seen this before with a runaway FUSE
mount, but that time we found the problem by looking at the load
on all of the clients: the one problem node had an extremely
high load that was far out of the norm, and resetting its FUSE
mount cleared the problem. In this case, there was no indication
of which client was causing the issue, and the only way to
figure it out was to take the storage system out of production
use. </div>
<div> </div>
<div>My understanding is that the FUSE client writes to both
bricks of the replica pair at the same time. Does it make sense
that it stopped writing to one of the bricks, and that
therefore everything written by that FUSE mount had to be
healed? In a normal scenario there shouldn't be any (or very
few) heals, right? </div>
<div> </div>
<div>Is there any better way to trace this issue in the
future? Is there a way to figure out which mount is not
connected properly, or which mount is causing all of the heals?
Or, alternatively, is there a way to force all of the clients to
remount without going to each client and killing its
glusterfs process? This obviously becomes difficult when you
have thousands of clients connected.</div>
</blockquote>
<br>
You are the only responsive user I know of with this kind of setup,
where a lot of mounts are connected to the volume. Most of the
corner-case bugs in the client-table expand logic (which is hit
when there are more than 128 clients) have been found by you since
Oct-2014, when I started assisting you :-). Your inputs are valuable
here. Please provide the log file of the bad mount so we can see
what it was doing. I will think a bit more about the enhancements
we need to make debugging easier in your case.<br>
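To David's question about spotting a stale mount without visiting every client by hand, one low-tech sketch (assuming Linux clients; the helper name is made up, not a gluster tool):

```shell
# Sketch: list gluster FUSE mount points from a mount table so that
# stale entries (mounts with no backing glusterfs process) stand out.
# The table argument exists only for testing; on a real client it
# defaults to /proc/mounts.
list_gluster_mounts() {
    awk '$3 == "fuse.glusterfs" { print $2 }' "${1:-/proc/mounts}"
}

# On each client: any mount listed here while 'pidof glusterfs' shows
# no process is a stale mount and can be cleared with 'umount -lf'.
if [ -r /proc/mounts ]; then
    list_gluster_mounts
fi
```

Run across all clients (e.g. via pdsh or a for loop over ssh), this would flag the one machine that still shows a FUSE mount after the processes are gone, as happened above.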
<br>
Pranith<br>
<blockquote
cite="mid:em809bc756-d377-440b-8d2a-62cbd5ef7a55@dfrobins-vaio"
type="cite">
<div> </div>
<div>David</div>
<div> </div>
<div> </div>
<div> </div>
<div> </div>
<div>------ Original Message ------</div>
<div>From: "Pranith Kumar Karampuri" <<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>></div>
<div>To: "Glomski, Patrick" <<a moz-do-not-send="true"
href="mailto:patrick.glomski@corvidtec.com">patrick.glomski@corvidtec.com</a>></div>
<div>Cc: "David Robinson" <<a moz-do-not-send="true"
href="mailto:drobinson@corvidtec.com">drobinson@corvidtec.com</a>>;
<a class="moz-txt-link-rfc2396E" href="mailto:gluster-users@gluster.org">"gluster-users@gluster.org"</a> <<a moz-do-not-send="true"
href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>>;
"Gluster Devel" <<a moz-do-not-send="true"
href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>></div>
<div>Sent: 1/24/2016 10:22:04 PM</div>
<div>Subject: Re: [Gluster-users] [Gluster-devel] heal hanging</div>
<div> </div>
<div id="xb6f9a08511b04930b21397a929bbbabf" style="COLOR: #000000">
<blockquote class="cite2" cite="56A594DC.6030804@redhat.com"
type="cite">You guys use more than 128 clients, don't you? We
recently found a memory corruption in the client table, which is
used in locking; I wonder if it has some role to play here.<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://review.gluster.org/13241">http://review.gluster.org/13241</a>
is the fix. Could you check whether you still see this issue
after applying it?<br>
<br>
Pranith<br>
<div class="moz-cite-prefix">On 01/22/2016 08:36 AM, Glomski,
Patrick wrote:<br>
</div>
<blockquote class="cite"
cite="mid:CALkMjdCZRYOvhNGOrCFS9v6Y-vOhX2do0HA-N=CpMf1OBo4+dg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>Pranith, attached are stack traces collected every
second for 20 seconds from the high-%cpu glusterfsd
process.<br>
<br>
</div>
Patrick<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 9:46 PM,
Glomski, Patrick <span dir="ltr"><<a
href="mailto:patrick.glomski@corvidtec.com"
moz-do-not-send="true">patrick.glomski@corvidtec.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT:
1ex; BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px
0.8ex">
<div dir="ltr">
<div>Last entry for get_real_filename on any of the
bricks was when we turned off the samba gfapi vfs
plugin earlier today:<br>
<br>
/var/log/glusterfs/bricks/data-brick01a-homegfs.log:[2016-01-21
15:13:00.008239] E
[server-rpc-fops.c:768:server_getxattr_cbk]
0-homegfs-server: 105: GETXATTR /wks_backup
(40e582d6-b0c7-4099-ba88-9168a3c32ca6)
(glusterfs.get_real_filename:desktop.ini) ==>
(Permission denied)<br>
<br>
</div>
We'll get back to you with those traces when %cpu
spikes again. As with most sporadic problems, as
soon as you want something out of it, the issue
becomes harder to reproduce.<br>
<div>
<div><br>
</div>
</div>
</div>
<div class="HOEnZb">
<div class="h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at
9:21 PM, Pranith Kumar Karampuri <span
dir="ltr"><<a
href="mailto:pkarampu@redhat.com"
moz-do-not-send="true">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="PADDING-LEFT: 1ex; BORDER-LEFT: #ccc
1px solid; MARGIN: 0px 0px 0px 0.8ex">
<div text="#000000" bgcolor="#FFFFFF"><span><br>
<br>
<div>On 01/22/2016 07:25 AM, Glomski,
Patrick wrote:<br>
</div>
</span><span>
<blockquote class="cite" type="cite">
<div dir="ltr">Unfortunately, all
samba mounts to the gluster volume
through the gfapi vfs plugin have
been disabled for the last 6 hours
or so and frequency of %cpu spikes
is increased. We had switched to
sharing a fuse mount through samba,
but I just disabled that as well.
There are no samba shares of this
volume now. The spikes now happen
every thirty minutes or so. We've
resorted to just rebooting the
machine with high load for the
present.<br>
</div>
</blockquote>
<br>
</span>Could you check whether logs of the
following type have stopped appearing entirely?<br>
[2016-01-21 15:13:00.005736] E
[server-rpc-fops.c:768:server_getxattr_cbk]
0-homegfs-server: 110: GETXATTR
/wks_backup
(40e582d6-b0c7-4099-ba88-9168a3c<br>
32ca6)
(glusterfs.get_real_filename:desktop.ini)
==> (Permission denied)<br>
<br>
These are operations that failed.
Operations that succeed are the ones that
will scan the directory. But I don't have
a way to find them other than using
tcpdumps.<br>
<br>
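A quick way to check this, sketched against the brick-log path Patrick quoted earlier (the helper name is made up):

```shell
# Count how many get_real_filename entries a brick log contains; if
# the count stops growing, those operations have stopped arriving.
count_grf_errors() {
    grep -c 'glusterfs\.get_real_filename' "$1"
}

# Brick-log path as shown earlier in this thread.
log=/var/log/glusterfs/bricks/data-brick01a-homegfs.log
if [ -f "$log" ]; then
    count_grf_errors "$log"
fi
```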
At the moment I have 2 theories:<br>
1) these get_real_filename calls<br>
2) [2016-01-21 16:10:38.017828] E
[server-helpers.c:46:gid_resolve]
0-gid-cache: getpwuid_r(494) failed<br>
"<br>
<p class="MsoNormal"><span
style="FONT-SIZE: 11pt; FONT-FAMILY:
"Calibri","sans-serif";
COLOR: #1f497d">Yessir they are.
Normally, sssd would look to the local
cache file in /var/lib/sss/db/ first,
to get any group or userid
information, then go out to the domain
controller. I put the options that we
are using on our GFS volumes below…
Thanks for your help.</span></p>
<p class="MsoNormal"><span
style="FONT-SIZE: 11pt; FONT-FAMILY:
"Calibri","sans-serif";
COLOR: #1f497d"> </span></p>
<p class="MsoNormal"><span
style="FONT-SIZE: 11pt; FONT-FAMILY:
"Calibri","sans-serif";
COLOR: #1f497d">We had been running
sssd with sssd_nss and sssd_be
sub-processes on these systems for a
long time, under the GFS 3.5.2 code,
and not run into the problem that
David described with the high cpu
usage on sssd_nss.</span></p>
<b><span>"<br>
</span></b>That was Tom Young's email
from 1.5 years back, when we debugged it. But
the process that was consuming a lot of CPU
then was sssd_nss, so I am not sure it is the
same issue. Let us debug to confirm that '1)'
doesn't happen. The gstack traces I asked
for should also help.
<div>
<div><br>
<br>
Pranith<br>
<blockquote class="cite" type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu,
Jan 21, 2016 at 8:49 PM, Pranith
Kumar Karampuri <span dir="ltr"><<a
href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="PADDING-LEFT: 1ex;
BORDER-LEFT: #ccc 1px solid;
MARGIN: 0px 0px 0px 0.8ex">
<div text="#000000"
bgcolor="#FFFFFF"><span><br>
<br>
<div>On 01/22/2016 07:13
AM, Glomski, Patrick
wrote:<br>
</div>
<blockquote class="cite"
type="cite">
<div dir="ltr">We use
the samba glusterfs
virtual filesystem
(the current version
provided on <a
href="http://download.gluster.org/"
moz-do-not-send="true">download.gluster.org</a>), but no windows clients
connecting directly.<br>
</div>
</blockquote>
<br>
</span>Hmm.. Is there a way
to disable this and check
whether the CPU% still
increases? What a getxattr of
"glusterfs.get_real_filename
<filename>" does is scan the
entire directory, doing
strcasecmp(<filename>,
<scanned-filename>) on each
entry. If anything matches, it
returns that
<scanned-filename>.
The problem is that the scan
is costly, so I wonder if
this is the reason for the
CPU spikes.<span><font
color="#888888"><br>
<br>
Pranith</font></span>
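The cost Pranith describes can be pictured with a shell sketch of the same scan (illustrative only; the real work happens inside the brick process in C):

```shell
# Illustrative model of the glusterfs.get_real_filename lookup:
# compare the requested name case-insensitively against every entry
# in the directory, i.e. O(number of entries) work per getxattr.
get_real_filename_scan() {
    dir=$1 want=$2
    for entry in "$dir"/*; do
        name=${entry##*/}
        # strcasecmp-style comparison: lowercase both sides.
        if [ "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')" = \
             "$(printf '%s' "$want" | tr '[:upper:]' '[:lower:]')" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}
```

A large directory pays that full scan on every such lookup, which would be consistent with the CPU spikes tracking the samba gfapi traffic.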
<div>
<div><br>
<blockquote class="cite"
type="cite">
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21, 2016
at 8:37 PM,
Pranith Kumar
Karampuri <span
dir="ltr"><<a
href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="PADDING-LEFT:
1ex;
BORDER-LEFT:
#ccc 1px solid;
MARGIN: 0px 0px
0px 0.8ex">
<div
text="#000000"
bgcolor="#FFFFFF">Do you have any windows clients? I see a lot of
getxattr calls
for
"glusterfs.get_real_filename"
which lead to
full readdirs
of the
directories on
the brick.<span><font
color="#888888"><br>
<br>
Pranith</font></span><span><br>
<br>
<div>On
01/22/2016
12:51 AM,
Glomski,
Patrick wrote:<br>
</div>
</span>
<div>
<div>
<blockquote
class="cite"
type="cite">
<div dir="ltr">
<div>Pranith,
could this
kind of
behavior be
self-inflicted
by us deleting
files directly
from the
bricks? We
have done that
in the past to
clean up
issues where
gluster
wouldn't allow
us to delete
from the
mount.<br>
<br>
If so, is it
feasible to
clean them up
by running a
search on the
.glusterfs
directories
directly and
removing files
with a
reference
count of 1
that are
non-zero size
(or directly
checking the
xattrs to be
sure that it's
not a DHT
link). <br>
<br>
find
/data/brick01a/homegfs/.glusterfs
-type f -not
-empty -links
-2 -exec rm -f
"{}" \;<br>
<br>
</div>
Is there
anything I'm
inherently
missing with
that approach
that will
further
corrupt the
system?<br>
<div><br>
</div>
</div>
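Patrick's find could be made a little safer by printing candidates for review before deleting anything; a sketch, where the trusted.glusterfs.dht.linkto xattr name is an assumption about this gluster version and should be verified with getfattr on a known linkfile first:

```shell
# Review-first version of the cleanup idea above: list .glusterfs
# files whose only remaining hard link is the .glusterfs one (link
# count 1) and that are non-empty, skipping anything carrying the DHT
# link xattr. The xattr name is an assumption; verify it before use.
orphan_candidates() {
    find "$1" -type f ! -empty -links 1 -print0 |
    while IFS= read -r -d '' f; do
        if ! getfattr -n trusted.glusterfs.dht.linkto \
                --absolute-names "$f" >/dev/null 2>&1; then
            printf 'orphan candidate: %s\n' "$f"
        fi
    done
}

# Inspect the list before removing anything, e.g.:
# orphan_candidates /data/brick01a/homegfs/.glusterfs
```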
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 1:02
PM, Glomski,
Patrick <span
dir="ltr"><<a
href="mailto:patrick.glomski@corvidtec.com" moz-do-not-send="true">patrick.glomski@corvidtec.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="PADDING-LEFT:
1ex;
BORDER-LEFT:
#ccc 1px
solid; MARGIN:
0px 0px 0px
0.8ex">
<div dir="ltr">
<div>
<div>Load
spiked again:
~1200%cpu on
gfs02a for
glusterfsd.
Crawl has been
running on one
of the bricks
on gfs02b for
25 min or so
and users
cannot access
the volume.<br>
<br>
I re-listed
the xattrop
directories as
well as a
'top' entry
and heal
statistics.
Then I
restarted the
gluster
services on
gfs02a. <br>
<br>
===================
top
===================<br>
PID USER
PR NI VIRT
RES SHR S
%CPU %MEM
TIME+
COMMAND
<br>
8969
root 20
0 2815m 204m
3588 S 1181.0
0.6 591:06.93
glusterfsd
<br>
<br>
===================
xattrop
===================<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-41f19453-91e4-437c-afa9-3b25614de210
xattrop-9b815879-2f4d-402b-867c-a6d65087788c<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-70131855-3cfb-49af-abce-9d23f57fb393
xattrop-dfb77848-a39d-4417-a725-9beca75d78c6<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
e6e47ed9-309b-42a7-8c44-28c29b9a20f8
xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125<br>
xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413<br>
<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
8034bc06-92cd-4fa5-8aaf-09039e79d2c8
c9ce22ed-6d8b-471b-a111-b39e57f0b512<br>
94fa1d60-45ad-4341-b69c-315936b51e8d
xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d<br>
<br>
<br>
===================
heal stats
===================<br>
<br>
homegfs
[b0-gfsib01a]
: Starting
time of
crawl :
Thu Jan 21
12:36:45 2016<br>
homegfs
[b0-gfsib01a]
: Ending time
of
crawl
: Thu Jan 21
12:36:45 2016<br>
homegfs
[b0-gfsib01a]
: Type of
crawl: INDEX<br>
homegfs
[b0-gfsib01a]
: No. of
entries
healed
: 0<br>
homegfs
[b0-gfsib01a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b0-gfsib01a]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b1-gfsib01b]
: Starting
time of
crawl :
Thu Jan 21
12:36:19 2016<br>
homegfs
[b1-gfsib01b]
: Ending time
of
crawl
: Thu Jan 21
12:36:19 2016<br>
homegfs
[b1-gfsib01b]
: Type of
crawl: INDEX<br>
homegfs
[b1-gfsib01b]
: No. of
entries
healed
: 0<br>
homegfs
[b1-gfsib01b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b1-gfsib01b]
: No. of heal
failed
entries : 1<br>
<br>
homegfs
[b2-gfsib01a]
: Starting
time of
crawl :
Thu Jan 21
12:36:48 2016<br>
homegfs
[b2-gfsib01a]
: Ending time
of
crawl
: Thu Jan 21
12:36:48 2016<br>
homegfs
[b2-gfsib01a]
: Type of
crawl: INDEX<br>
homegfs
[b2-gfsib01a]
: No. of
entries
healed
: 0<br>
homegfs
[b2-gfsib01a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b2-gfsib01a]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b3-gfsib01b]
: Starting
time of
crawl :
Thu Jan 21
12:36:47 2016<br>
homegfs
[b3-gfsib01b]
: Ending time
of
crawl
: Thu Jan 21
12:36:47 2016<br>
homegfs
[b3-gfsib01b]
: Type of
crawl: INDEX<br>
homegfs
[b3-gfsib01b]
: No. of
entries
healed
: 0<br>
homegfs
[b3-gfsib01b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b3-gfsib01b]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b4-gfsib02a]
: Starting
time of
crawl :
Thu Jan 21
12:36:06 2016<br>
homegfs
[b4-gfsib02a]
: Ending time
of
crawl
: Thu Jan 21
12:36:06 2016<br>
homegfs
[b4-gfsib02a]
: Type of
crawl: INDEX<br>
homegfs
[b4-gfsib02a]
: No. of
entries
healed
: 0<br>
homegfs
[b4-gfsib02a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b4-gfsib02a]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b5-gfsib02b]
: Starting
time of
crawl :
Thu Jan 21
12:13:40 2016<br>
homegfs
[b5-gfsib02b]
:
*** Crawl is
in progress
***<br>
homegfs
[b5-gfsib02b]
: Type of
crawl: INDEX<br>
homegfs
[b5-gfsib02b]
: No. of
entries
healed
: 0<br>
homegfs
[b5-gfsib02b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b5-gfsib02b]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b6-gfsib02a]
: Starting
time of
crawl :
Thu Jan 21
12:36:58 2016<br>
homegfs
[b6-gfsib02a]
: Ending time
of
crawl
: Thu Jan 21
12:36:58 2016<br>
homegfs
[b6-gfsib02a]
: Type of
crawl: INDEX<br>
homegfs
[b6-gfsib02a]
: No. of
entries
healed
: 0<br>
homegfs
[b6-gfsib02a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b6-gfsib02a]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b7-gfsib02b]
: Starting
time of
crawl :
Thu Jan 21
12:36:50 2016<br>
homegfs
[b7-gfsib02b]
: Ending time
of
crawl
: Thu Jan 21
12:36:50 2016<br>
homegfs
[b7-gfsib02b]
: Type of
crawl: INDEX<br>
homegfs
[b7-gfsib02b]
: No. of
entries
healed
: 0<br>
homegfs
[b7-gfsib02b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b7-gfsib02b]
: No. of heal
failed
entries : 0<br>
<br>
<br>
========================================================================================<br>
</div>
I waited a few
minutes for
the heals to
finish and ran
the heal
statistics and
info again.
one file is in
split-brain.
Aside from the
split-brain,
the load on
all systems is
down now and
they are
behaving
normally.
glustershd.log
is attached.
What is going
on??? <br>
<br>
Thu Jan 21
12:53:50 EST
2016<br>
<br>
===================
homegfs
===================<br>
<br>
homegfs
[b0-gfsib01a]
: Starting
time of
crawl :
Thu Jan 21
12:53:02 2016<br>
homegfs
[b0-gfsib01a]
: Ending time
of
crawl
: Thu Jan 21
12:53:02 2016<br>
homegfs
[b0-gfsib01a]
: Type of
crawl: INDEX<br>
homegfs
[b0-gfsib01a]
: No. of
entries
healed
: 0<br>
homegfs
[b0-gfsib01a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b0-gfsib01a]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b1-gfsib01b]
: Starting
time of
crawl :
Thu Jan 21
12:53:38 2016<br>
homegfs
[b1-gfsib01b]
: Ending time
of
crawl
: Thu Jan 21
12:53:38 2016<br>
homegfs
[b1-gfsib01b]
: Type of
crawl: INDEX<br>
homegfs
[b1-gfsib01b]
: No. of
entries
healed
: 0<br>
homegfs
[b1-gfsib01b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b1-gfsib01b]
: No. of heal
failed
entries : 1<br>
<br>
homegfs
[b2-gfsib01a]
: Starting
time of
crawl :
Thu Jan 21
12:53:04 2016<br>
homegfs
[b2-gfsib01a]
: Ending time
of
crawl
: Thu Jan 21
12:53:04 2016<br>
homegfs
[b2-gfsib01a]
: Type of
crawl: INDEX<br>
homegfs
[b2-gfsib01a]
: No. of
entries
healed
: 0<br>
homegfs
[b2-gfsib01a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b2-gfsib01a]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b3-gfsib01b]
: Starting
time of
crawl :
Thu Jan 21
12:53:04 2016<br>
homegfs
[b3-gfsib01b]
: Ending time
of
crawl
: Thu Jan 21
12:53:04 2016<br>
homegfs
[b3-gfsib01b]
: Type of
crawl: INDEX<br>
homegfs
[b3-gfsib01b]
: No. of
entries
healed
: 0<br>
homegfs
[b3-gfsib01b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b3-gfsib01b]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b4-gfsib02a]
: Starting
time of
crawl :
Thu Jan 21
12:53:33 2016<br>
homegfs
[b4-gfsib02a]
: Ending time
of
crawl
: Thu Jan 21
12:53:33 2016<br>
homegfs
[b4-gfsib02a]
: Type of
crawl: INDEX<br>
homegfs
[b4-gfsib02a]
: No. of
entries
healed
: 0<br>
homegfs
[b4-gfsib02a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b4-gfsib02a]
: No. of heal
failed
entries : 1<br>
<br>
homegfs
[b5-gfsib02b]
: Starting
time of
crawl :
Thu Jan 21
12:53:14 2016<br>
homegfs
[b5-gfsib02b]
: Ending time
of
crawl
: Thu Jan 21
12:53:15 2016<br>
homegfs
[b5-gfsib02b]
: Type of
crawl: INDEX<br>
homegfs
[b5-gfsib02b]
: No. of
entries
healed
: 0<br>
homegfs
[b5-gfsib02b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b5-gfsib02b]
: No. of heal
failed
entries : 3<br>
<br>
homegfs
[b6-gfsib02a]
: Starting
time of
crawl :
Thu Jan 21
12:53:04 2016<br>
homegfs
[b6-gfsib02a]
: Ending time
of
crawl
: Thu Jan 21
12:53:04 2016<br>
homegfs
[b6-gfsib02a]
: Type of
crawl: INDEX<br>
homegfs
[b6-gfsib02a]
: No. of
entries
healed
: 0<br>
homegfs
[b6-gfsib02a]
: No. of
entries in
split-brain: 0<br>
homegfs
[b6-gfsib02a]
: No. of heal
failed
entries : 0<br>
<br>
homegfs
[b7-gfsib02b]
: Starting
time of
crawl :
Thu Jan 21
12:53:09 2016<br>
homegfs
[b7-gfsib02b]
: Ending time
of
crawl
: Thu Jan 21
12:53:09 2016<br>
homegfs
[b7-gfsib02b]
: Type of
crawl: INDEX<br>
homegfs
[b7-gfsib02b]
: No. of
entries
healed
: 0<br>
homegfs
[b7-gfsib02b]
: No. of
entries in
split-brain: 0<br>
homegfs
[b7-gfsib02b]
: No. of heal
failed
entries : 0<br>
<br>
*** gluster
bug in
'gluster
volume heal
homegfs
statistics'
***<br>
*** Use
'gluster
volume heal
homegfs info'
until bug is
fixed ***<span><br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
</span>/users/bangell/.gconfd
- Is in
split-brain<br>
<br>
Number of
entries: 1<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
/users/bangell/.gconfd
- Is in
split-brain<br>
<br>
/users/bangell/.gconfd/saved_state
<br>
Number of
entries: 2<span><br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
</span></div>
<div><br>
<br>
</div>
</div>
<div>
<div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 11:10
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="PADDING-LEFT:
1ex;
BORDER-LEFT:
#ccc 1px
solid; MARGIN:
0px 0px 0px
0.8ex">
<div
text="#000000"
bgcolor="#FFFFFF"><span><br>
<br>
<div>On
01/21/2016
09:26 PM,
Glomski,
Patrick wrote:<br>
</div>
<blockquote
class="cite"
type="cite">
<div dir="ltr">
<div>I should
mention that
the problem is
not currently
occurring and
there are no
heals (output
appended). By
restarting the
gluster
services, we
can stop the
crawl, which
lowers the
load for a
while.
Subsequent
crawls seem to
finish
properly. For
what it's
worth,
files/folders
that show up
in the 'volume
info' output
during a hung
crawl don't
seem to be
anything out
of the
ordinary. <br>
<br>
Over the past
four days, the
typical time
before the
problem recurs
after
suppressing it
in this manner
is an hour.
Last night
when we
reached out to
you was the
last time it
happened and
the load has
been low since
(a relief).
David believes
that
recursively
listing the
files (ls -alR
or similar)
from a client
mount can
force the
issue to
happen, but
obviously I'd
rather not
unless we have
some precise
thing we're
looking for.
Let me know if
you'd like me
to attempt to
drive the
system
unstable like
that and what
I should look
for. As it's a
production
system, I'd
rather not
leave it in
this state for
long.<br>
</div>
</div>
</blockquote>
<br>
</span>Will it
be possible to
send the
glustershd and
mount logs of
the past 4
days? I would
like to see if
this is
because of
directory
self-heal
going wild
(Ravi is
working on a
throttling
feature for
3.8, which
will allow
putting brakes
on self-heal
traffic)<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
class="cite"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>[root@gfs01a
xattrop]#
gluster volume
heal homegfs
info<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
<br>
<br>
</div>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 10:40
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="PADDING-LEFT:
1ex;
BORDER-LEFT:
#ccc 1px
solid; MARGIN:
0px 0px 0px
0.8ex">
<div
text="#000000"
bgcolor="#FFFFFF"><span><br>
<br>
<div>On
01/21/2016
08:25 PM,
Glomski,
Patrick wrote:<br>
</div>
<blockquote
class="cite"
type="cite">
<div dir="ltr">
<div>Hello,
Pranith. The
typical
behavior is
that the %cpu
on a
glusterfsd
process jumps
to number of
processor
cores
available
(800% or
1200%,
depending on
the pair of
nodes
involved) and
the load
average on the
machine goes
very high
(~20). The
volume's heal
statistics
output shows
that it is
crawling one
of the bricks
and trying to
heal, but this
crawl hangs
and never
seems to
finish.<br>
</div>
</div>
</blockquote>
<blockquote
class="cite"
type="cite">
<div dir="ltr">
<div><br>
</div>
The number of
files in the
xattrop
directory
varies over
time, so I ran
a wc -l as you
requested
periodically
for some time
and then
started
including a
datestamped
list of the
files that
were in the
xattrops
directory on
each brick to
see which were
persistent.
All bricks had
files in the
xattrop
folder, so all
results are
attached.<br>
</div>
</blockquote>
</span>Thanks,
this info is
helpful. I
don't see a
lot of files.
Could you give
the output of
"gluster
volume heal
<volname>
info"? Is
there any
directory in
there which is
LARGE?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
class="cite"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Please
let me know if
there is
anything else
I can provide.<br>
</div>
<div><br>
</div>
<div>Patrick<br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 12:01
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
href="mailto:pkarampu@redhat.com" moz-do-not-send="true">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="PADDING-LEFT:
1ex;
BORDER-LEFT:
#ccc 1px
solid; MARGIN:
0px 0px 0px
0.8ex">
<div
text="#000000"
bgcolor="#FFFFFF">hey,<br>
Which
process is
consuming so
much cpu? I
went through
the logs you
gave me. I see
that the
following
files are in
gfid mismatch
state:<br>
<br>
<066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,<br>
<1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,<br>
<ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>,<br>
<br>
Could you give
me the output
of "ls
<brick-path>/indices/xattrop
| wc -l"
output on all
the bricks
which are
acting this
way? This will
tell us the
number of
pending
self-heals on
the system.<br>
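The per-brick counts Pranith asks for can be gathered in one pass (a sketch; brick paths are the ones from 'gluster volume info homegfs' shown below, and each server hosts only its own subset):

```shell
# Count pending self-heal index entries under each brick's xattrop
# directory. The base path is a parameter so the same helper can be
# pointed at whichever server's brick root is being checked.
count_xattrop() {
    for brick in "$1"/brick*/homegfs; do
        idx=$brick/.glusterfs/indices/xattrop
        [ -d "$idx" ] || continue
        printf '%s %s\n' "$brick" "$(ls "$idx" | wc -l)"
    done
}

count_xattrop /data
```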
<br>
Pranith
<div>
<div><br>
<br>
<div>On
01/20/2016
09:26 PM,
David Robinson
wrote:<br>
</div>
</div>
</div>
<blockquote
class="cite"
type="cite">
<div>
<div>
<div>resending
with parsed
logs... </div>
<div> </div>
<div>
<blockquote
class="cite"
cite="http://em5ee26b0e-002a-4230-bdec-3020b98cff3c@dfrobins-vaio"
type="cite">
<div> </div>
<div> </div>
<div>
<blockquote
class="cite"
cite="http://eme3b2cb80-8be2-4fa5-9d08-4710955e237c@dfrobins-vaio"
type="cite">
<div>I am
having issues
with 3.6.6
where the load
will spike up
to 800% for
one of the
glusterfsd
processes and
the users can
no longer
access the
system. If I
reboot the
node, the heal
will finish
normally after
a few minutes
and the system
will be
responsive,
but a few
hours later
the issue will
start again.
It looks like it is hanging in a heal and spinning up the load on one of the bricks. The heal gets stuck, says it is crawling, and never returns.
After a few
minutes of the
heal saying it
is crawling,
the load
spikes up and
the mounts
become
unresponsive.</div>
<div> </div>
<div>Any suggestions on how to fix this? It has us stopped cold, as the users can no longer access the systems when the load spikes... Logs attached.</div>
<div> </div>
<div>System
setup info is:
</div>
<div> </div>
<div>[root@gfs01a
~]# gluster
volume info
homegfs<br>
<br>
Volume Name:
homegfs<br>
Type:
Distributed-Replicate<br>
Volume ID:
1e32672a-f1b7-4b58-ba94-58c085e59071<br>
Status:
Started<br>
Number of
Bricks: 4 x 2
= 8<br>
Transport-type:
tcp<br>
Bricks:<br>
Brick1:
gfsib01a.corvidtec.com:/data/brick01a/homegfs<br>
Brick2:
gfsib01b.corvidtec.com:/data/brick01b/homegfs<br>
Brick3:
gfsib01a.corvidtec.com:/data/brick02a/homegfs<br>
Brick4:
gfsib01b.corvidtec.com:/data/brick02b/homegfs<br>
Brick5:
gfsib02a.corvidtec.com:/data/brick01a/homegfs<br>
Brick6:
gfsib02b.corvidtec.com:/data/brick01b/homegfs<br>
Brick7:
gfsib02a.corvidtec.com:/data/brick02a/homegfs<br>
Brick8:
gfsib02b.corvidtec.com:/data/brick02b/homegfs<br>
Options
Reconfigured:<br>
performance.io-thread-count:
32<br>
performance.cache-size:
128MB<br>
performance.write-behind-window-size:
128MB<br>
server.allow-insecure:
on<br>
network.ping-timeout:
42<br>
storage.owner-gid:
100<br>
geo-replication.indexing:
off<br>
geo-replication.ignore-pid-check:
on<br>
changelog.changelog:
off<br>
changelog.fsync-interval:
3<br>
changelog.rollover-time:
15<br>
server.manage-gids:
on<br>
diagnostics.client-log-level:
WARNING</div>
<div> </div>
<div>[root@gfs01a
~]# rpm -qa |
grep gluster<br>
gluster-nagios-common-0.1.1-0.el6.noarch<br>
glusterfs-fuse-3.6.6-1.el6.x86_64<br>
glusterfs-debuginfo-3.6.6-1.el6.x86_64<br>
glusterfs-libs-3.6.6-1.el6.x86_64<br>
glusterfs-geo-replication-3.6.6-1.el6.x86_64<br>
glusterfs-api-3.6.6-1.el6.x86_64<br>
glusterfs-devel-3.6.6-1.el6.x86_64<br>
glusterfs-api-devel-3.6.6-1.el6.x86_64<br>
glusterfs-3.6.6-1.el6.x86_64<br>
glusterfs-cli-3.6.6-1.el6.x86_64<br>
glusterfs-rdma-3.6.6-1.el6.x86_64<br>
samba-vfs-glusterfs-4.1.11-2.el6.x86_64<br>
glusterfs-server-3.6.6-1.el6.x86_64<br>
glusterfs-extra-xlators-3.6.6-1.el6.x86_64<br>
</div>
<div> </div>
<div>
<div
style="FONT-SIZE:
12pt;
FONT-FAMILY:
Times
New
Roman"><span><span>
<div> </div>
</span></span></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
Gluster-devel mailing list
<a href="mailto:Gluster-devel@gluster.org" moz-do-not-send="true">Gluster-devel@gluster.org</a>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" moz-do-not-send="true">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
</blockquote>
<br>
</div>
<br>
_______________________________________________<br>
Gluster-users
mailing list<br>
<a
href="mailto:Gluster-users@gluster.org"
moz-do-not-send="true">Gluster-users@gluster.org</a><br>
<a
href="http://www.gluster.org/mailman/listinfo/gluster-users"
rel="noreferrer" moz-do-not-send="true">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>