<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <br>
    <div class="moz-cite-prefix">On 01/22/2016 07:25 AM, Glomski,
      Patrick wrote:<br>
    </div>
    <blockquote
cite="mid:CALkMjdDRCsoOgmdefzBkdDez1Uqt9Z4_8qiEfCXLW-oasNz5gQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">Unfortunately, all samba mounts to the gluster
        volume through the gfapi vfs plugin have been disabled for the
        last 6 hours or so, and the frequency of %cpu spikes has increased. We
        had switched to sharing a fuse mount through samba, but I just
        disabled that as well. There are no samba shares of this volume
        now. The spikes now happen every thirty minutes or so. We've
        resorted to just rebooting the machine with high load for the
        present.<br>
      </div>
    </blockquote>
    <br>
    Next time this CPU spike happens, could you collect samples of
    gstack &lt;pid-of-brick&gt; every second for 10-20 seconds? That
    will help identify the most frequently hit function calls.<br>
    <br>
    Something like "for i in {1..20}; do gstack &lt;pid-of-brick&gt;
    &gt; sample-$i.txt; sleep 1; done"<br>
    <br>
    Pranith<br>
    <blockquote
cite="mid:CALkMjdDRCsoOgmdefzBkdDez1Uqt9Z4_8qiEfCXLW-oasNz5gQ@mail.gmail.com"
      type="cite">
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Thu, Jan 21, 2016 at 8:49 PM,
          Pranith Kumar Karampuri <span dir="ltr">&lt;<a
              moz-do-not-send="true" href="mailto:pkarampu@redhat.com"
              target="_blank">pkarampu@redhat.com</a>&gt;</span> wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"><span class=""> <br>
                <br>
                <div>On 01/22/2016 07:13 AM, Glomski, Patrick wrote:<br>
                </div>
                <blockquote type="cite">
                  <div dir="ltr">We use the samba glusterfs virtual
                    filesystem (the current version provided on <a
                      moz-do-not-send="true"
                      href="http://download.gluster.org" target="_blank">download.gluster.org</a>),

                    but no Windows clients connect directly.<br>
                  </div>
                </blockquote>
                <br>
              </span> Hmm.. Is there a way to disable this and check
              whether the CPU% still increases? A getxattr of
              "glusterfs.get_real_filename &lt;filename&gt;" scans the
              entire directory, running strcasecmp(&lt;filename&gt;,
              &lt;scanned-filename&gt;) against every entry. If any
              entry matches, it returns that &lt;scanned-filename&gt;.
              The problem is that this scan is costly, so I wonder if
              it is the reason for the CPU
              spikes.<span class="HOEnZb"><font color="#888888"><br>
                  <br>
                  Pranith</font></span>
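              To get a feel for why that lookup is expensive: it
              behaves roughly like a case-insensitive search over
              every entry in the directory. An illustrative shell
              sketch (the real scan happens in C inside the brick
              process, not in shell):<br>

```shell
# Roughly what the brick does for "glusterfs.get_real_filename FILENAME":
# walk every directory entry and compare names case-insensitively,
# printing the first on-disk name that matches (illustrative only).
real_filename() {
    dir=$1 wanted=$2
    for entry in "$dir"/*; do
        name=${entry##*/}
        # strcasecmp equivalent: lowercase both sides before comparing
        if [ "$(printf %s "$name" | tr 'A-Z' 'a-z')" = \
             "$(printf %s "$wanted" | tr 'A-Z' 'a-z')" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}
```

              The cost grows linearly with the number of entries in
              the directory, which is why large directories make this
              call expensive.<br>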
              <div>
                <div class="h5"><br>
                  <blockquote type="cite">
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Thu, Jan 21, 2016 at
                        8:37 PM, Pranith Kumar Karampuri <span
                          dir="ltr">&lt;<a moz-do-not-send="true"
                            href="mailto:pkarampu@redhat.com"
                            target="_blank">pkarampu@redhat.com</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div bgcolor="#FFFFFF" text="#000000"> Do you
                          <div bgcolor="#FFFFFF" text="#000000"> Do you
                            have any Windows clients? I see a lot of
                            getxattr calls for
                            "glusterfs.get_real_filename" which lead to
                            full readdirs of the directories on the
                            brick.<span><font color="#888888"><br>
                                <br>
                                Pranith</font></span><span><br>
                              <br>
                              <div>On 01/22/2016 12:51 AM, Glomski,
                                Patrick wrote:<br>
                              </div>
                            </span>
                            <div>
                              <div>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div>Pranith, could this kind of
                                      behavior be self-inflicted by us
                                      deleting files directly from the
                                      bricks? We have done that in the
                                      past to clean up issues where
                                      gluster wouldn't allow us to
                                      delete from the mount.<br>
                                      <br>
                                      If so, is it feasible to clean
                                      them up by running a search on the
                                      .glusterfs directories directly
                                      and removing files with a
                                      reference count of 1 that are
                                      non-zero size (or directly
                                      checking the xattrs to be sure
                                      that it's not a DHT link)?<br>
                                      <br>
                                      find
                                      /data/brick01a/homegfs/.glusterfs
                                      -type f -not -empty -links -2
                                      -exec rm -f "{}" \;<br>
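                                      <br>
                                      To be safer about the DHT-link
                                      concern, each candidate could be
                                      checked for the linkto xattr
                                      before removal. A sketch, not a
                                      tested procedure; it assumes
                                      getfattr from the attr package
                                      and should be run as root, once
                                      per brick:<br>

```shell
# Remove link-count-1, non-empty files under a brick's .glusterfs,
# skipping anything that carries the DHT linkto xattr (a DHT link file).
# Sketch only -- e.g.: cleanup_orphans /data/brick01a/homegfs
cleanup_orphans() {
    find "$1/.glusterfs" -type f -not -empty -links -2 -print0 |
    while IFS= read -r -d '' f; do
        # getfattr exits non-zero when the xattr is absent
        if ! getfattr -n trusted.glusterfs.dht.linkto --only-values "$f" \
            >/dev/null 2>&1; then
            rm -f "$f"
        fi
    done
}
```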
                                      <br>
                                    </div>
                                    Is there anything I'm missing with
                                    that approach that might further
                                    corrupt the system?<br>
                                    <div><br>
                                    </div>
                                  </div>
                                  <div class="gmail_extra"><br>
                                    <div class="gmail_quote">On Thu, Jan
                                      21, 2016 at 1:02 PM, Glomski,
                                      Patrick <span dir="ltr">&lt;<a
                                          moz-do-not-send="true"
                                          href="mailto:patrick.glomski@corvidtec.com"
                                          target="_blank">patrick.glomski@corvidtec.com</a>&gt;</span>
                                      wrote:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">
                                        <div dir="ltr">
                                          <div>
                                            <div>Load spiked again:
                                              ~1200%cpu on gfs02a for
                                              glusterfsd. Crawl has been
                                              running on one of the
                                              bricks on gfs02b for 25
                                              min or so and users cannot
                                              access the volume.<br>
                                              <br>
                                              I re-listed the xattrop
                                              directories as well as a
                                              'top' entry and heal
                                              statistics. Then I
                                              restarted the gluster
                                              services on gfs02a. <br>
                                              <br>
                                              =================== top
                                              ===================<br>
                                              PID USER      PR  NI 
                                              VIRT  RES  SHR S %CPU
                                              %MEM    TIME+ 
                                              COMMAND                                                
                                              <br>
                                               8969 root      20   0
                                              2815m 204m 3588 S 1181.0 
                                              0.6 591:06.93
                                              glusterfsd         <br>
                                              <br>
                                              ===================
                                              xattrop
                                              ===================<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-41f19453-91e4-437c-afa9-3b25614de210 
xattrop-9b815879-2f4d-402b-867c-a6d65087788c<br>
                                              <br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-70131855-3cfb-49af-abce-9d23f57fb393 
xattrop-dfb77848-a39d-4417-a725-9beca75d78c6<br>
                                              <br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
e6e47ed9-309b-42a7-8c44-28c29b9a20f8         
xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125<br>
xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934 
xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0<br>
                                              <br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc 
xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413<br>
                                              <br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531<br>
                                              <br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70<br>
                                              <br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
                                              8034bc06-92cd-4fa5-8aaf-09039e79d2c8 

c9ce22ed-6d8b-471b-a111-b39e57f0b512<br>
                                              94fa1d60-45ad-4341-b69c-315936b51e8d 

xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7<br>
                                              <br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d<br>
                                              <br>
                                              <br>
                                              =================== heal
                                              stats ===================<br>
                                               <br>
                                              homegfs [b0-gfsib01a] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:36:45 2016<br>
                                              homegfs [b0-gfsib01a] :
                                              Ending time of
                                              crawl         : Thu Jan 21
                                              12:36:45 2016<br>
                                              homegfs [b0-gfsib01a] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b0-gfsib01a] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b0-gfsib01a] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b0-gfsib01a] :
                                              No. of heal failed
                                              entries   : 0<br>
                                               <br>
                                              homegfs [b1-gfsib01b] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:36:19 2016<br>
                                              homegfs [b1-gfsib01b] :
                                              Ending time of
                                              crawl         : Thu Jan 21
                                              12:36:19 2016<br>
                                              homegfs [b1-gfsib01b] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b1-gfsib01b] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b1-gfsib01b] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b1-gfsib01b] :
                                              No. of heal failed
                                              entries   : 1<br>
                                               <br>
                                              homegfs [b2-gfsib01a] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:36:48 2016<br>
                                              homegfs [b2-gfsib01a] :
                                              Ending time of
                                              crawl         : Thu Jan 21
                                              12:36:48 2016<br>
                                              homegfs [b2-gfsib01a] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b2-gfsib01a] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b2-gfsib01a] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b2-gfsib01a] :
                                              No. of heal failed
                                              entries   : 0<br>
                                               <br>
                                              homegfs [b3-gfsib01b] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:36:47 2016<br>
                                              homegfs [b3-gfsib01b] :
                                              Ending time of
                                              crawl         : Thu Jan 21
                                              12:36:47 2016<br>
                                              homegfs [b3-gfsib01b] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b3-gfsib01b] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b3-gfsib01b] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b3-gfsib01b] :
                                              No. of heal failed
                                              entries   : 0<br>
                                               <br>
                                              homegfs [b4-gfsib02a] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:36:06 2016<br>
                                              homegfs [b4-gfsib02a] :
                                              Ending time of
                                              crawl         : Thu Jan 21
                                              12:36:06 2016<br>
                                              homegfs [b4-gfsib02a] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b4-gfsib02a] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b4-gfsib02a] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b4-gfsib02a] :
                                              No. of heal failed
                                              entries   : 0<br>
                                               <br>
                                              homegfs [b5-gfsib02b] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:13:40 2016<br>
                                              homegfs [b5-gfsib02b]
                                              :                               
                                              *** Crawl is in progress
                                              ***<br>
                                              homegfs [b5-gfsib02b] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b5-gfsib02b] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b5-gfsib02b] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b5-gfsib02b] :
                                              No. of heal failed
                                              entries   : 0<br>
                                               <br>
                                              homegfs [b6-gfsib02a] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:36:58 2016<br>
                                              homegfs [b6-gfsib02a] :
                                              Ending time of
                                              crawl         : Thu Jan 21
                                              12:36:58 2016<br>
                                              homegfs [b6-gfsib02a] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b6-gfsib02a] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b6-gfsib02a] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b6-gfsib02a] :
                                              No. of heal failed
                                              entries   : 0<br>
                                               <br>
                                              homegfs [b7-gfsib02b] :
                                              Starting time of
                                              crawl       : Thu Jan 21
                                              12:36:50 2016<br>
                                              homegfs [b7-gfsib02b] :
                                              Ending time of
                                              crawl         : Thu Jan 21
                                              12:36:50 2016<br>
                                              homegfs [b7-gfsib02b] :
                                              Type of crawl: INDEX<br>
                                              homegfs [b7-gfsib02b] :
                                              No. of entries
                                              healed        : 0<br>
                                              homegfs [b7-gfsib02b] :
                                              No. of entries in
                                              split-brain: 0<br>
                                              homegfs [b7-gfsib02b] :
                                              No. of heal failed
                                              entries   : 0<br>
                                              <br>
                                              <br>
========================================================================================<br>
                                            </div>
                                            I waited a few minutes for
                                            the heals to finish and ran
                                            the heal statistics and info
                                            again. One file is in
                                            split-brain. Aside from the
                                            split-brain, the load on all
                                            systems is down now and they
                                            are behaving normally.
                                            glustershd.log is attached.
                                            What is going on??? <br>
                                            <br>
                                            Thu Jan 21 12:53:50 EST 2016<br>
                                             <br>
                                            =================== homegfs
                                            ===================<br>
                                             <br>
                                            homegfs [b0-gfsib01a] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:02 2016<br>
                                            homegfs [b0-gfsib01a] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:02 2016<br>
                                            homegfs [b0-gfsib01a] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b0-gfsib01a] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b0-gfsib01a] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b0-gfsib01a] : No.
                                            of heal failed entries   : 0<br>
                                             <br>
                                            homegfs [b1-gfsib01b] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:38 2016<br>
                                            homegfs [b1-gfsib01b] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:38 2016<br>
                                            homegfs [b1-gfsib01b] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b1-gfsib01b] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b1-gfsib01b] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b1-gfsib01b] : No.
                                            of heal failed entries   : 1<br>
                                             <br>
                                            homegfs [b2-gfsib01a] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:04 2016<br>
                                            homegfs [b2-gfsib01a] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:04 2016<br>
                                            homegfs [b2-gfsib01a] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b2-gfsib01a] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b2-gfsib01a] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b2-gfsib01a] : No.
                                            of heal failed entries   : 0<br>
                                             <br>
                                            homegfs [b3-gfsib01b] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:04 2016<br>
                                            homegfs [b3-gfsib01b] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:04 2016<br>
                                            homegfs [b3-gfsib01b] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b3-gfsib01b] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b3-gfsib01b] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b3-gfsib01b] : No.
                                            of heal failed entries   : 0<br>
                                             <br>
                                            homegfs [b4-gfsib02a] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:33 2016<br>
                                            homegfs [b4-gfsib02a] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:33 2016<br>
                                            homegfs [b4-gfsib02a] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b4-gfsib02a] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b4-gfsib02a] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b4-gfsib02a] : No.
                                            of heal failed entries   : 1<br>
                                             <br>
                                            homegfs [b5-gfsib02b] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:14 2016<br>
                                            homegfs [b5-gfsib02b] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:15 2016<br>
                                            homegfs [b5-gfsib02b] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b5-gfsib02b] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b5-gfsib02b] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b5-gfsib02b] : No.
                                            of heal failed entries   : 3<br>
                                             <br>
                                            homegfs [b6-gfsib02a] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:04 2016<br>
                                            homegfs [b6-gfsib02a] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:04 2016<br>
                                            homegfs [b6-gfsib02a] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b6-gfsib02a] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b6-gfsib02a] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b6-gfsib02a] : No.
                                            of heal failed entries   : 0<br>
                                             <br>
                                            homegfs [b7-gfsib02b] :
                                            Starting time of crawl      
                                            : Thu Jan 21 12:53:09 2016<br>
                                            homegfs [b7-gfsib02b] :
                                            Ending time of crawl        
                                            : Thu Jan 21 12:53:09 2016<br>
                                            homegfs [b7-gfsib02b] : Type
                                            of crawl: INDEX<br>
                                            homegfs [b7-gfsib02b] : No.
                                            of entries healed        : 0<br>
                                            homegfs [b7-gfsib02b] : No.
                                            of entries in split-brain: 0<br>
                                            homegfs [b7-gfsib02b] : No.
                                            of heal failed entries   : 0<br>
                                             <br>
                                            *** gluster bug in 'gluster
                                            volume heal homegfs
                                            statistics'   ***<br>
                                            *** Use 'gluster volume heal
                                            homegfs info' until bug is
                                            fixed ***<span><br>
                                               <br>
                                              Brick
                                              gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
                                              Number of entries: 0<br>
                                              <br>
                                              Brick
                                              gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
                                              Number of entries: 0<br>
                                              <br>
                                              Brick
                                              gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
                                              Number of entries: 0<br>
                                              <br>
                                              Brick
                                              gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
                                              Number of entries: 0<br>
                                              <br>
                                              Brick
                                              gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
                                            </span>/users/bangell/.gconfd
                                            - Is in split-brain<br>
                                            <br>
                                            Number of entries: 1<br>
                                            <br>
                                            Brick
                                            gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
                                            /users/bangell/.gconfd - Is
                                            in split-brain<br>
                                            <br>
                                            /users/bangell/.gconfd/saved_state
                                            <br>
                                            Number of entries: 2<span><br>
                                              <br>
                                              Brick
                                              gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
                                              Number of entries: 0<br>
                                              <br>
                                              Brick
                                              gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
                                              Number of entries: 0<br>
                                              <br>
                                            </span></div>
                                          <div><br>
                                            <br>
                                          </div>
                                        </div>
                                        <div>
                                          <div>
                                            <div class="gmail_extra"><br>
                                              <div class="gmail_quote">On
                                                Thu, Jan 21, 2016 at
                                                11:10 AM, Pranith Kumar
                                                Karampuri <span
                                                  dir="ltr">&lt;<a
                                                    moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                                wrote:<br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  <div bgcolor="#FFFFFF"
                                                    text="#000000"><span>
                                                      <br>
                                                      <br>
                                                      <div>On 01/21/2016
                                                        09:26 PM,
                                                        Glomski, Patrick
                                                        wrote:<br>
                                                      </div>
                                                      <blockquote
                                                        type="cite">
                                                        <div dir="ltr">
                                                          <div>I should
                                                          mention that
                                                          the problem is
                                                          not currently
                                                          occurring and
                                                          there are no
                                                          heals (output
                                                          appended). By
                                                          restarting the
                                                          gluster
                                                          services, we
                                                          can stop the
                                                          crawl, which
                                                          lowers the
                                                          load for a
                                                          while.
                                                          Subsequent
                                                          crawls seem to
                                                          finish
properly. For
                                                          what it's
                                                          worth,
                                                          files/folders
                                                          that show up
                                                          in the 'volume
                                                          heal info'
                                                          output
                                                          during a hung
                                                          crawl don't
                                                          seem to be
                                                          anything out
                                                          of the
                                                          ordinary. <br>
                                                          <br>
                                                          Over the past
                                                          four days, the
                                                          typical time
                                                          before the
                                                          problem recurs
                                                          after
                                                          suppressing it
                                                          in this manner
                                                          is an hour.
                                                          Last night
                                                          when we
                                                          reached out to
                                                          you was the
                                                          last time it
                                                          happened and
                                                          the load has
                                                          been low since
                                                          (a relief). 
                                                          David believes
                                                          that
                                                          recursively
                                                          listing the
                                                          files (ls -alR
                                                          or similar)
                                                          from a client
                                                          mount can
                                                          force the
                                                          issue to
                                                          happen, but
                                                          obviously I'd
                                                          rather not
                                                          unless we have
                                                          some precise
                                                          thing we're
                                                          looking for.
                                                          Let me know if
                                                          you'd like me
                                                          to attempt to
                                                          drive the
                                                          system
                                                          unstable like
                                                          that and what
                                                          I should look
                                                          for. As it's a
                                                          production
                                                          system, I'd
                                                          rather not
                                                          leave it in
                                                          this state for
                                                          long.<br>
                                                          </div>
                                                        </div>
                                                      </blockquote>
                                                      <br>
</span> Will it be
                                                    possible to send the
                                                    glustershd and mount
                                                    logs from the past 4
                                                    days? I would like
                                                    to see whether this is
                                                    because of directory
                                                    self-heal going wild
                                                    (Ravi is working on a
                                                    throttling feature
                                                    for 3.8, which will
                                                    make it possible to put
                                                    the brakes on self-heal
                                                    traffic)<span><font
                                                        color="#888888"><br>
                                                        <br>
                                                        Pranith</font></span>
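For reference, the glustershd and fuse-mount logs usually live under /var/log/glusterfs. A minimal collection sketch (the default log directory and the archive name are assumptions; adjust to your layout):

```shell
# Minimal sketch for bundling the requested logs. ASSUMPTIONS: logs
# live under /var/log/glusterfs (the default); glustershd.log is the
# self-heal daemon log, and fuse mount logs are named after the mount
# point, so archiving the whole directory catches both.
collect_gluster_logs() {
    local logdir="${1:-/var/log/glusterfs}"
    local out="${2:-gluster-logs-$(date +%F).tar.gz}"
    # -C enters the log directory so the archive holds relative paths
    tar czf "$out" -C "$logdir" . && echo "wrote $out"
}
```

Usage: `collect_gluster_logs` with no arguments takes the defaults, or pass an explicit log directory and output name.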
                                                    <div>
                                                      <div><br>
                                                        <blockquote
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div><br>
                                                          </div>
                                                          <div>[root@gfs01a

                                                          xattrop]#
                                                          gluster volume
                                                          heal homegfs
                                                          info<br>
                                                          Brick
                                                          gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          Brick
                                                          gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          Brick
                                                          gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          Brick
                                                          gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          Brick
                                                          gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          Brick
                                                          gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          Brick
                                                          gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          Brick
                                                          gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
                                                          Number of
                                                          entries: 0<br>
                                                          <br>
                                                          <br>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          <div
                                                          class="gmail_extra"><br>
                                                          <div
                                                          class="gmail_quote">On

                                                          Thu, Jan 21,
                                                          2016 at 10:40
                                                          AM, Pranith
                                                          Kumar
                                                          Karampuri <span
                                                          dir="ltr">&lt;<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                                          wrote:<br>
                                                          <blockquote
                                                          class="gmail_quote"
                                                          style="margin:0
                                                          0 0
                                                          .8ex;border-left:1px
                                                          #ccc
                                                          solid;padding-left:1ex">
                                                          <div
                                                          bgcolor="#FFFFFF"
                                                          text="#000000"><span>
                                                          <br>
                                                          <br>
                                                          <div>On
                                                          01/21/2016
                                                          08:25 PM,
                                                          Glomski,
                                                          Patrick wrote:<br>
                                                          </div>
                                                          <blockquote
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div>Hello,
                                                          Pranith. The
                                                          typical
                                                          behavior is
                                                          that the %cpu
                                                          on a
                                                          glusterfsd
                                                          process jumps
to the number of
                                                          processor
                                                          cores
                                                          available
                                                          (800% or
                                                          1200%,
                                                          depending on
                                                          the pair of
                                                          nodes
                                                          involved) and
                                                          the load
                                                          average on the
                                                          machine goes
                                                          very high
                                                          (~20). The
                                                          volume's heal
                                                          statistics
                                                          output shows
                                                          that it is
                                                          crawling one
                                                          of the bricks
                                                          and trying to
                                                          heal, but this
                                                          crawl hangs
                                                          and never
                                                          seems to
                                                          finish.<br>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          <blockquote
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div><br>
                                                          </div>
The number of
                                                          files in the
                                                          xattrop
                                                          directory
                                                          varies over
                                                          time, so I
                                                          periodically ran
                                                          the wc -l you
                                                          requested for a
                                                          while, and then
                                                          also started
                                                          capturing a
                                                          datestamped
                                                          list of the
                                                          files in the
                                                          xattrop
                                                          directory on
                                                          each brick to
                                                          see which were
                                                          persistent.
                                                          All bricks had
                                                          files in the
                                                          xattrop
                                                          folder, so all
                                                          results are
                                                          attached.<br>
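The sampling loop described above might look roughly like this (purely a sketch; the brick paths are whatever `gluster volume info` reports for your volume, and the `.glusterfs/indices/xattrop` layout inside the brick is an assumption):

```shell
# Rough sketch of periodic, datestamped xattrop sampling. ASSUMPTION:
# each brick keeps its self-heal index under .glusterfs/indices/xattrop
# (the usual on-disk layout).
sample_xattrop() {
    local stamp brick dir
    stamp=$(date '+%F %T')
    for brick in "$@"; do
        dir="$brick/.glusterfs/indices/xattrop"
        # summary line: timestamp, brick, entry count
        printf '%s %s %s\n' "$stamp" "$brick" "$(ls "$dir" 2>/dev/null | wc -l)"
        # full listing, datestamped and appended for later comparison
        ls "$dir" 2>/dev/null | sed "s|^|$stamp $brick |" >> xattrop-files.log
    done
}
```

Run it from cron, or wrap it in `while true; do sample_xattrop <brick-paths>; sleep 60; done`, and diff the saved listings to see which entries persist.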
                                                          </div>
                                                          </blockquote>
</span> Thanks,
                                                          this info is
                                                          helpful. I
                                                          don't see a
                                                          lot of files.
                                                          Could you give
                                                          the output of
                                                          "gluster
                                                          volume heal
                                                          &lt;volname&gt;
                                                          info"? Is
                                                          there any
                                                          directory in
                                                          there which is
                                                          LARGE?<span><font
color="#888888"><br>
                                                          <br>
                                                          Pranith</font></span>
                                                          <div>
                                                          <div><br>
                                                          <blockquote
                                                          type="cite">
                                                          <div dir="ltr">
                                                          <div><br>
                                                          </div>
                                                          <div>Please
                                                          let me know if
                                                          there is
                                                          anything else
                                                          I can provide.<br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          <div>Patrick<br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          </div>
                                                          <div
                                                          class="gmail_extra"><br>
                                                          <div
                                                          class="gmail_quote">On


                                                          Thu, Jan 21,
                                                          2016 at 12:01
                                                          AM, Pranith
                                                          Kumar
                                                          Karampuri <span
                                                          dir="ltr">&lt;<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                                          wrote:<br>
                                                          <blockquote
                                                          class="gmail_quote"
                                                          style="margin:0
                                                          0 0
                                                          .8ex;border-left:1px
                                                          #ccc
                                                          solid;padding-left:1ex">
                                                          <div
                                                          bgcolor="#FFFFFF"
                                                          text="#000000">
                                                          hey,<br>
                                                                 Which
                                                          process is
                                                          consuming so
                                                          much cpu? I
                                                          went through
                                                          the logs you
                                                          gave me. I see
                                                          that the
                                                          following
                                                          files are in
                                                          gfid mismatch
                                                          state:<br>
                                                          <br>
&lt;066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup&gt;,<br>
&lt;1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak&gt;,<br>
&lt;ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg&gt;,<br>
                                                          <br>
                                                          Could you give
                                                          me the output
                                                          of "ls
                                                          &lt;brick-path&gt;/indices/xattrop
                                                          | wc -l"
                                                          on all
                                                          the bricks
                                                          which are
                                                          acting this
                                                          way? This will
                                                          tell us the
                                                          number of
                                                          pending
                                                          self-heals on
                                                          the system.<br>
                                                          <br>
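A loop form of that check, in case it helps (brick paths are placeholders; note the index usually sits under `.glusterfs/indices/xattrop` inside the brick directory):

```shell
# Sketch: print the pending self-heal index count for each brick path
# passed in. ASSUMPTION: the xattrop index lives at
# .glusterfs/indices/xattrop inside each brick directory.
count_pending_heals() {
    local brick
    for brick in "$@"; do
        echo "$brick: $(ls "$brick/.glusterfs/indices/xattrop" 2>/dev/null | wc -l)"
    done
}
```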
                                                          Pranith
                                                          <div>
                                                          <div><br>
                                                          <br>
                                                          <div>On
                                                          01/20/2016
                                                          09:26 PM,
                                                          David Robinson
                                                          wrote:<br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          <blockquote
                                                          type="cite">
                                                          <div>
                                                          <div>
                                                          <div>resending
                                                          with parsed
                                                          logs... </div>
                                                          <div> </div>
                                                          <div>
                                                          <blockquote
                                                          cite="http://em5ee26b0e-002a-4230-bdec-3020b98cff3c@dfrobins-vaio"
                                                          type="cite">
                                                          <div> </div>
                                                          <div> </div>
                                                          <div>
                                                          <blockquote
                                                          cite="http://eme3b2cb80-8be2-4fa5-9d08-4710955e237c@dfrobins-vaio"
                                                          type="cite">
                                                          <div>I am having issues with 3.6.6 where the load will spike up to 800%
                                                          for one of the glusterfsd processes and the users can no longer access
                                                          the system. If I reboot the node, the heal will finish normally after a
                                                          few minutes and the system will be responsive, but a few hours later
                                                          the issue will start again. It looks like it is hanging in a heal and
                                                          spinning up the load on one of the bricks. The heal gets stuck, says
                                                          it is crawling, and never returns. After a few minutes of the heal
                                                          saying it is crawling, the load spikes up and the mounts become
                                                          unresponsive.</div>
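To narrow down which brick process to sample when the spike hits, one approach is to pick the glusterfsd with the highest %CPU from ps and then collect stack samples of it. This is only a sketch: the ps_out function below is a stub with assumed output, standing in for the real ps invocation shown in the comment.

```shell
# Stub for: ps -eo pid,pcpu,comm --sort=-pcpu | grep -w glusterfsd
# (assumed sample output; run the real command on the affected server).
ps_out() {
cat <<'EOF'
 1234 795.0 glusterfsd
 2345   3.2 glusterfsd
 3456   1.0 smbd
EOF
}

# The first glusterfsd in CPU-sorted output is the hot brick process.
hot_pid=$(ps_out | awk '$3 == "glusterfsd" {print $1; exit}')
echo "hot brick pid: $hot_pid"

# Then sample its stacks roughly once per second, e.g.:
#   for i in {1..20}; do gstack "$hot_pid" > sample-$i.txt; sleep 1; done
```

Comparing the 20 samples for repeatedly appearing frames usually shows where the process is spinning.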
                                                          <div> </div>
                                                          <div>Any suggestions on how to fix this? It has us stopped cold, as
                                                          users can no longer access the systems when the load spikes... Logs
                                                          attached.</div>
                                                          <div> </div>
                                                          <div>System setup info is: </div>
                                                          <div> </div>
                                                          <div>[root@gfs01a ~]# gluster volume info homegfs<br>
 <br>
Volume Name: homegfs<br>
Type: Distributed-Replicate<br>
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071<br>
Status: Started<br>
Number of Bricks: 4 x 2 = 8<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs<br>
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs<br>
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs<br>
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs<br>
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs<br>
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs<br>
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs<br>
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs<br>
Options Reconfigured:<br>
performance.io-thread-count: 32<br>
performance.cache-size: 128MB<br>
performance.write-behind-window-size: 128MB<br>
server.allow-insecure: on<br>
network.ping-timeout: 42<br>
storage.owner-gid: 100<br>
geo-replication.indexing: off<br>
geo-replication.ignore-pid-check: on<br>
changelog.changelog: off<br>
changelog.fsync-interval: 3<br>
changelog.rollover-time: 15<br>
server.manage-gids: on<br>
diagnostics.client-log-level: WARNING</div>
                                                          <div> </div>
                                                          <div>[root@gfs01a ~]# rpm -qa | grep gluster<br>
gluster-nagios-common-0.1.1-0.el6.noarch<br>
glusterfs-fuse-3.6.6-1.el6.x86_64<br>
glusterfs-debuginfo-3.6.6-1.el6.x86_64<br>
glusterfs-libs-3.6.6-1.el6.x86_64<br>
glusterfs-geo-replication-3.6.6-1.el6.x86_64<br>
glusterfs-api-3.6.6-1.el6.x86_64<br>
glusterfs-devel-3.6.6-1.el6.x86_64<br>
glusterfs-api-devel-3.6.6-1.el6.x86_64<br>
glusterfs-3.6.6-1.el6.x86_64<br>
glusterfs-cli-3.6.6-1.el6.x86_64<br>
glusterfs-rdma-3.6.6-1.el6.x86_64<br>
samba-vfs-glusterfs-4.1.11-2.el6.x86_64<br>
glusterfs-server-3.6.6-1.el6.x86_64<br>
glusterfs-extra-xlators-3.6.6-1.el6.x86_64<br>
                                                          </div>
                                                          <div> </div>
                                                          </blockquote>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          <fieldset></fieldset>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          <pre>_______________________________________________
Gluster-devel mailing list
<a moz-do-not-send="true" href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>
<a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                          <br>
_______________________________________________<br>
                                                          Gluster-users mailing list<br>
                                                          <a moz-do-not-send="true" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
                                                          <a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                        </blockquote>
                                                        <br>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                              </div>
                                              <br>
                                            </div>
                                          </div>
                                        </div>
                                      </blockquote>
                                    </div>
                                    <br>
                                  </div>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>