<div dir="ltr">We use the samba glusterfs virtual filesystem (the current version provided on <a href="http://download.gluster.org">download.gluster.org</a>), but no windows clients connecting directly.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Do you have any windows clients? I see a lot of getxattr calls for
    &quot;glusterfs.get_real_filename&quot; which lead to full readdirs of the
    directories on the brick.<span class="HOEnZb"><font color="#888888"><br>
    <br>
    Pranith</font></span><span class=""><br>
    <br>
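As a rough illustration of what such a lookup costs (the mount point, directory, and file name below are hypothetical, and this assumes the virtual key is also reachable through a FUSE mount rather than only through samba/gfapi):

    # time a single case-insensitive name lookup; each one scans the whole parent directory on the brick
    time getfattr -n glusterfs.get_real_filename:somefile.txt /mnt/homegfs/some/large/directory
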
On 01/22/2016 12:51 AM, Glomski, Patrick wrote:

Pranith, could this kind of behavior be self-inflicted by us deleting files directly from the bricks? We have done that in the past to clean up issues where gluster wouldn't allow us to delete from the mount.

If so, is it feasible to clean them up by running a search on the .glusterfs directories directly and removing files with a reference count of 1 that are non-zero size (or directly checking the xattrs to be sure that it's not a DHT link)?

find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec rm -f "{}" \;

Is there anything I'm inherently missing with that approach that will further corrupt the system?

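A sketch of the xattr check mentioned above, to be run against a brick (not the mount) before deleting anything: a DHT link file normally shows up as a zero-byte file carrying the trusted.glusterfs.dht.linkto xattr, so dumping the xattrs of each candidate first makes the removal easier to audit.

    # inspect candidates first; anything with trusted.glusterfs.dht.linkto is a DHT link file, not an orphan
    find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 \
        -exec getfattr -d -m . -e hex {} \; 2>/dev/null
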
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Thu, Jan 21, 2016 at 1:02 PM,
          Glomski, Patrick <span dir="ltr">&lt;<a href="mailto:patrick.glomski@corvidtec.com" target="_blank">patrick.glomski@corvidtec.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div>
Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been running on one of the bricks on gfs02b for 25 min or so and users cannot access the volume.

I re-listed the xattrop directories as well as a 'top' entry and heal statistics. Then I restarted the gluster services on gfs02a.

=================== top ===================
  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM     TIME+  COMMAND
 8969 root      20   0 2815m 204m 3588 S 1181.0  0.6  591:06.93 glusterfsd

=================== xattrop ===================
/data/brick01a/homegfs/.glusterfs/indices/xattrop:
xattrop-41f19453-91e4-437c-afa9-3b25614de210  xattrop-9b815879-2f4d-402b-867c-a6d65087788c

/data/brick02a/homegfs/.glusterfs/indices/xattrop:
xattrop-70131855-3cfb-49af-abce-9d23f57fb393  xattrop-dfb77848-a39d-4417-a725-9beca75d78c6

/data/brick01b/homegfs/.glusterfs/indices/xattrop:
e6e47ed9-309b-42a7-8c44-28c29b9a20f8          xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934  xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0

/data/brick02b/homegfs/.glusterfs/indices/xattrop:
xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc  xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413

/data/brick01a/homegfs/.glusterfs/indices/xattrop:
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531

/data/brick02a/homegfs/.glusterfs/indices/xattrop:
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70

/data/brick01b/homegfs/.glusterfs/indices/xattrop:
8034bc06-92cd-4fa5-8aaf-09039e79d2c8          c9ce22ed-6d8b-471b-a111-b39e57f0b512
94fa1d60-45ad-4341-b69c-315936b51e8d          xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7

/data/brick02b/homegfs/.glusterfs/indices/xattrop:
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d

=================== heal stats ===================

homegfs [b0-gfsib01a] : Starting time of crawl       : Thu Jan 21 12:36:45 2016
homegfs [b0-gfsib01a] : Ending time of crawl         : Thu Jan 21 12:36:45 2016
homegfs [b0-gfsib01a] : Type of crawl: INDEX
homegfs [b0-gfsib01a] : No. of entries healed        : 0
homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
homegfs [b0-gfsib01a] : No. of heal failed entries   : 0

homegfs [b1-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:36:19 2016
homegfs [b1-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:36:19 2016
homegfs [b1-gfsib01b] : Type of crawl: INDEX
homegfs [b1-gfsib01b] : No. of entries healed        : 0
homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
homegfs [b1-gfsib01b] : No. of heal failed entries   : 1

homegfs [b2-gfsib01a] : Starting time of crawl       : Thu Jan 21 12:36:48 2016
homegfs [b2-gfsib01a] : Ending time of crawl         : Thu Jan 21 12:36:48 2016
homegfs [b2-gfsib01a] : Type of crawl: INDEX
homegfs [b2-gfsib01a] : No. of entries healed        : 0
homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
homegfs [b2-gfsib01a] : No. of heal failed entries   : 0

homegfs [b3-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:36:47 2016
homegfs [b3-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:36:47 2016
homegfs [b3-gfsib01b] : Type of crawl: INDEX
homegfs [b3-gfsib01b] : No. of entries healed        : 0
homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
homegfs [b3-gfsib01b] : No. of heal failed entries   : 0

homegfs [b4-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:36:06 2016
homegfs [b4-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:36:06 2016
homegfs [b4-gfsib02a] : Type of crawl: INDEX
homegfs [b4-gfsib02a] : No. of entries healed        : 0
homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
homegfs [b4-gfsib02a] : No. of heal failed entries   : 0

homegfs [b5-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:13:40 2016
homegfs [b5-gfsib02b] :                                *** Crawl is in progress ***
homegfs [b5-gfsib02b] : Type of crawl: INDEX
homegfs [b5-gfsib02b] : No. of entries healed        : 0
homegfs [b5-gfsib02b] : No. of entries in split-brain: 0
homegfs [b5-gfsib02b] : No. of heal failed entries   : 0

homegfs [b6-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:36:58 2016
homegfs [b6-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:36:58 2016
homegfs [b6-gfsib02a] : Type of crawl: INDEX
homegfs [b6-gfsib02a] : No. of entries healed        : 0
homegfs [b6-gfsib02a] : No. of entries in split-brain: 0
homegfs [b6-gfsib02a] : No. of heal failed entries   : 0

homegfs [b7-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:36:50 2016
homegfs [b7-gfsib02b] : Ending time of crawl         : Thu Jan 21 12:36:50 2016
homegfs [b7-gfsib02b] : Type of crawl: INDEX
homegfs [b7-gfsib02b] : No. of entries healed        : 0
homegfs [b7-gfsib02b] : No. of entries in split-brain: 0
homegfs [b7-gfsib02b] : No. of heal failed entries   : 0

========================================================================================

I waited a few minutes for the heals to finish and ran the heal statistics and info again. One file is in split-brain. Aside from the split-brain, the load on all systems is down now and they are behaving normally. glustershd.log is attached. What is going on???

Thu Jan 21 12:53:50 EST 2016

=================== homegfs ===================

homegfs [b0-gfsib01a] : Starting time of crawl       : Thu Jan 21 12:53:02 2016
homegfs [b0-gfsib01a] : Ending time of crawl         : Thu Jan 21 12:53:02 2016
homegfs [b0-gfsib01a] : Type of crawl: INDEX
homegfs [b0-gfsib01a] : No. of entries healed        : 0
homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
homegfs [b0-gfsib01a] : No. of heal failed entries   : 0

homegfs [b1-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:53:38 2016
homegfs [b1-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:53:38 2016
homegfs [b1-gfsib01b] : Type of crawl: INDEX
homegfs [b1-gfsib01b] : No. of entries healed        : 0
homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
homegfs [b1-gfsib01b] : No. of heal failed entries   : 1

homegfs [b2-gfsib01a] : Starting time of crawl       : Thu Jan 21 12:53:04 2016
homegfs [b2-gfsib01a] : Ending time of crawl         : Thu Jan 21 12:53:04 2016
homegfs [b2-gfsib01a] : Type of crawl: INDEX
homegfs [b2-gfsib01a] : No. of entries healed        : 0
homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
homegfs [b2-gfsib01a] : No. of heal failed entries   : 0

homegfs [b3-gfsib01b] : Starting time of crawl       : Thu Jan 21 12:53:04 2016
homegfs [b3-gfsib01b] : Ending time of crawl         : Thu Jan 21 12:53:04 2016
homegfs [b3-gfsib01b] : Type of crawl: INDEX
homegfs [b3-gfsib01b] : No. of entries healed        : 0
homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
homegfs [b3-gfsib01b] : No. of heal failed entries   : 0

homegfs [b4-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:53:33 2016
homegfs [b4-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:53:33 2016
homegfs [b4-gfsib02a] : Type of crawl: INDEX
homegfs [b4-gfsib02a] : No. of entries healed        : 0
homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
homegfs [b4-gfsib02a] : No. of heal failed entries   : 1

homegfs [b5-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:53:14 2016
homegfs [b5-gfsib02b] : Ending time of crawl         : Thu Jan 21 12:53:15 2016
homegfs [b5-gfsib02b] : Type of crawl: INDEX
homegfs [b5-gfsib02b] : No. of entries healed        : 0
homegfs [b5-gfsib02b] : No. of entries in split-brain: 0
homegfs [b5-gfsib02b] : No. of heal failed entries   : 3

homegfs [b6-gfsib02a] : Starting time of crawl       : Thu Jan 21 12:53:04 2016
homegfs [b6-gfsib02a] : Ending time of crawl         : Thu Jan 21 12:53:04 2016
homegfs [b6-gfsib02a] : Type of crawl: INDEX
homegfs [b6-gfsib02a] : No. of entries healed        : 0
homegfs [b6-gfsib02a] : No. of entries in split-brain: 0
homegfs [b6-gfsib02a] : No. of heal failed entries   : 0

homegfs [b7-gfsib02b] : Starting time of crawl       : Thu Jan 21 12:53:09 2016
homegfs [b7-gfsib02b] : Ending time of crawl         : Thu Jan 21 12:53:09 2016
homegfs [b7-gfsib02b] : Type of crawl: INDEX
homegfs [b7-gfsib02b] : No. of entries healed        : 0
homegfs [b7-gfsib02b] : No. of entries in split-brain: 0
homegfs [b7-gfsib02b] : No. of heal failed entries   : 0

*** gluster bug in 'gluster volume heal homegfs statistics' ***
*** Use 'gluster volume heal homegfs info' until bug is fixed ***

Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
Number of entries: 0

Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0

Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
/users/bangell/.gconfd - Is in split-brain

Number of entries: 1

Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
/users/bangell/.gconfd - Is in split-brain

/users/bangell/.gconfd/saved_state
Number of entries: 2

Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0

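For what it's worth, the usual way to look at a split-brain entry like this is to dump the AFR changelog xattrs of the directory on the two replica bricks and compare the trusted.afr.homegfs-client-* values; a sketch, run as root on gfs02a and gfs02b respectively:

    # on gfs02a
    getfattr -d -m . -e hex /data/brick01a/homegfs/users/bangell/.gconfd
    # on gfs02b
    getfattr -d -m . -e hex /data/brick01b/homegfs/users/bangell/.gconfd
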
On Thu, Jan 21, 2016 at 11:10 AM, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:

                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div bgcolor="#FFFFFF" text="#000000"><span> <br>
                          <br>
                          <div>On 01/21/2016 09:26 PM, Glomski, Patrick
                            wrote:<br>
                          </div>
                          <blockquote type="cite">
                            <div dir="ltr">
I should mention that the problem is not currently occurring and there are no heals (output appended). By restarting the gluster services, we can stop the crawl, which lowers the load for a while. Subsequent crawls seem to finish properly. For what it's worth, files/folders that show up in the 'volume info' output during a hung crawl don't seem to be anything out of the ordinary.

Over the past four days, the typical time before the problem recurs after suppressing it in this manner is an hour. Last night when we reached out to you was the last time it happened and the load has been low since (a relief). David believes that recursively listing the files (ls -alR or similar) from a client mount can force the issue to happen, but obviously I'd rather not unless we have some precise thing we're looking for. Let me know if you'd like me to attempt to drive the system unstable like that and what I should look for. As it's a production system, I'd rather not leave it in this state for long.

Will it be possible to send glustershd and mount logs of the past 4 days? I would like to see if this is because of directory self-heal going wild (Ravi is working on a throttling feature for 3.8, which will allow putting brakes on self-heal traffic).

Pranith

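Assuming the default log directory (/var/log/glusterfs), collecting those would look roughly like the following on each server; the client mount log is named after the mount path with '/' turned into '-', e.g. mnt-homegfs.log for a hypothetical mount at /mnt/homegfs:

    # sketch: bundle the self-heal daemon log (and rotated copies) from this node
    tar czf /tmp/gluster-logs-$(hostname -s).tar.gz /var/log/glusterfs/glustershd.log*
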
[root@gfs01a xattrop]# gluster volume heal homegfs info
Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
Number of entries: 0

Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0

Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
Number of entries: 0

Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
Number of entries: 0

Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0

                              <div class="gmail_extra"><br>
                                <div class="gmail_quote">On Thu, Jan 21,
                                  2016 at 10:40 AM, Pranith Kumar
                                  Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                  wrote:<br>
                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                    <div bgcolor="#FFFFFF" text="#000000"><span> <br>
                                        <br>
                                        <div>On 01/21/2016 08:25 PM,
                                          Glomski, Patrick wrote:<br>
                                        </div>
                                        <blockquote type="cite">
                                          <div dir="ltr">
Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd process jumps to the number of processor cores available (800% or 1200%, depending on the pair of nodes involved) and the load average on the machine goes very high (~20). The volume's heal statistics output shows that it is crawling one of the bricks and trying to heal, but this crawl hangs and never seems to finish.

                                        <blockquote type="cite">
                                          <div dir="ltr">
                                            <div><br>
                                            </div>
                                            The number of files in the
                                            xattrop directory varies
                                            over time, so I ran a wc -l
                                            as you requested
                                            periodically for some time
                                            and then started including a
                                            datestamped list of the
                                            files that were in the
                                            xattrops directory on each
                                            brick to see which were
                                            persistent. All bricks had
                                            files in the xattrop folder,
                                            so all results are attached.<br>
                                          </div>
                                        </blockquote>
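The periodic listing described above can be as simple as a loop like this (a sketch only; the interval and output file are arbitrary):

    # every minute, record a timestamp, a per-brick count, and the file names in each xattrop index
    while true; do
        date
        for d in /data/brick*/homegfs/.glusterfs/indices/xattrop; do
            echo "$d : $(ls "$d" | wc -l)"
            ls "$d"
        done
        sleep 60
    done >> /tmp/xattrop-history.log
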
Thanks, this info is helpful. I don't see a lot of files. Could you give the output of "gluster volume heal <volname> info"? Is there any directory in there which is LARGE?

Pranith

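A quick way to check the second question, i.e. whether any directory is unusually large, directly on a brick (the brick path is one of the examples above and the cutoff on head is arbitrary):

    # print entry counts per directory, largest first, skipping the .glusterfs housekeeping tree
    find /data/brick01a/homegfs -xdev -not -path '*/.glusterfs*' -type d \
        -exec sh -c 'printf "%s %s\n" "$(ls -1 "$1" | wc -l)" "$1"' _ {} \; | sort -rn | head -20
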
Please let me know if there is anything else I can provide.

Patrick

                                              <div class="gmail_quote">On
                                                Thu, Jan 21, 2016 at
                                                12:01 AM, Pranith Kumar
                                                Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                                wrote:<br>
                                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                  <div bgcolor="#FFFFFF" text="#000000"> hey,<br>
                                                           Which process
                                                    is consuming so much
                                                    cpu? I went through
                                                    the logs you gave
                                                    me. I see that the
                                                    following files are
                                                    in gfid mismatch
                                                    state:<br>
                                                    <br>
&lt;066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup&gt;,<br>
&lt;1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak&gt;,<br>
&lt;ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg&gt;,<br>
                                                    <br>
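A gfid mismatch can be confirmed by comparing the trusted.gfid xattr of the same entry on both bricks of the replica pair; a sketch, with the path under the brick left as a placeholder since only the parent gfid is listed above:

    # run on each brick of the pair and compare the hex values
    getfattr -n trusted.gfid -e hex /data/brick01a/homegfs/<path-to-parent>/recovery.bak
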
Could you give me the output of "ls <brick-path>/indices/xattrop | wc -l" on all the bricks which are acting this way? This will tell us the number of pending self-heals on the system.

Pranith

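For this volume that amounts to something like the loop below on each server (brick paths as in the volume info further down; on these bricks the indices directory sits under .glusterfs, as the xattrop listings earlier in the thread show, and the existence check skips the two bricks a given server does not host):

    for b in /data/brick01a/homegfs /data/brick01b/homegfs /data/brick02a/homegfs /data/brick02b/homegfs; do
        [ -d "$b/.glusterfs/indices/xattrop" ] && echo "$b: $(ls "$b/.glusterfs/indices/xattrop" | wc -l)"
    done
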
On 01/20/2016 09:26 PM, David Robinson wrote:

resending with parsed logs...

I am having issues with 3.6.6 where the load will spike up to 800% for one of the glusterfsd processes and the users can no longer access the system. If I reboot the node, the heal will finish normally after a few minutes and the system will be responsive, but a few hours later the issue will start again. It looks like it is hanging in a heal and spinning up the load on one of the bricks. The heal gets stuck, says it is crawling, and never returns. After a few minutes of the heal saying it is crawling, the load spikes up and the mounts become unresponsive.

Any suggestions on how to fix this? It has us stopped cold, as the users can no longer access the systems when the load spikes... Logs attached.

System setup info is:

[root@gfs01a ~]# gluster volume info homegfs

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.allow-insecure: on
network.ping-timeout: 42
storage.owner-gid: 100
geo-replication.indexing: off
geo-replication.ignore-pid-check: on
changelog.changelog: off
changelog.fsync-interval: 3
changelog.rollover-time: 15
server.manage-gids: on
diagnostics.client-log-level: WARNING

[root@gfs01a ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.1-0.el6.noarch
glusterfs-fuse-3.6.6-1.el6.x86_64
glusterfs-debuginfo-3.6.6-1.el6.x86_64
glusterfs-libs-3.6.6-1.el6.x86_64
glusterfs-geo-replication-3.6.6-1.el6.x86_64
glusterfs-api-3.6.6-1.el6.x86_64
glusterfs-devel-3.6.6-1.el6.x86_64
glusterfs-api-devel-3.6.6-1.el6.x86_64
glusterfs-3.6.6-1.el6.x86_64
glusterfs-cli-3.6.6-1.el6.x86_64
glusterfs-rdma-3.6.6-1.el6.x86_64
samba-vfs-glusterfs-4.1.11-2.el6.x86_64
glusterfs-server-3.6.6-1.el6.x86_64
glusterfs-extra-xlators-3.6.6-1.el6.x86_64