<div dir="ltr">Unfortunately, all samba mounts to the gluster volume through the gfapi vfs plugin have been disabled for the last 6 hours or so and frequency of %cpu spikes is increased. We had switched to sharing a fuse mount through samba, but I just disabled that as well. There are no samba shares of this volume now. The spikes now happen every thirty minutes or so. We&#39;ve resorted to just rebooting the machine with high load for the present.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><span class="">
    <br>
    <br>
    <div>On 01/22/2016 07:13 AM, Glomski,
      Patrick wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">We use the samba glusterfs virtual filesystem (the
        current version provided on <a href="http://download.gluster.org" target="_blank">download.gluster.org</a>),
        but no windows clients connecting directly.<br>
      </div>
    </blockquote>
    <br></span>
    Hmm... Is there a way to disable this and check whether the CPU%
    still increases? What getxattr of &quot;glusterfs.get_real_filename
    &lt;filename&gt;&quot; does is scan the entire directory looking for a
    strcasecmp(&lt;filename&gt;, &lt;scanned-filename&gt;) match. If anything
    matches, it returns the &lt;scanned-filename&gt;. But the
    problem is that the scan is costly, so I wonder if this is the reason for
    the CPU spikes.<span class="HOEnZb"><font color="#888888"><br>
    <br>
    Pranith</font></span><div><div class="h5"><br>
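    <div>PS: to give a rough feel for the cost, each such lookup is more or less equivalent to a full case-insensitive scan of the parent directory on the brick. A sketch (illustrative only, not the actual server code path; the subdirectory and filename below are made up):<br>
<pre># every "get_real_filename" request effectively does something like:
ls -a /data/brick01a/homegfs/some/dir | grep -ix 'SomeFile.txt'
# i.e. the cost grows with the number of entries in the directory, for
# every such getxattr, so big directories + frequent samba lookups = CPU burn</pre>
    </div>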
    <blockquote type="cite">
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Thu, Jan 21, 2016 at 8:37 PM,
          Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span> wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Do you have any
              windows clients? I see a lot of getxattr calls for
              &quot;glusterfs.get_real_filename&quot; which lead to full readdirs
              of the directories on the brick.<span><font color="#888888"><br>
                  <br>
                  Pranith</font></span><span><br>
                <br>
                <div>On 01/22/2016 12:51 AM, Glomski, Patrick wrote:<br>
                </div>
              </span>
              <div>
                <div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>Pranith, could this kind of behavior be self-inflicted by us deleting files directly from the bricks? We have done that in the past to clean up issues where gluster wouldn&#39;t allow us to delete from the mount.<br>
                        <br>
                        If so, is it feasible to clean them up by running a search on the .glusterfs directories directly and removing files with a reference count of 1 that are non-zero size (or directly checking the xattrs to be sure that they&#39;re not DHT links)?<br>
                        <br>
                        find /data/brick01a/homegfs/.glusterfs -type f
                        -not -empty -links -2 -exec rm -f &quot;{}&quot; \;<br>
                        <br>
                      </div>
                      Is there anything I&#39;m inherently missing with that
                      approach that would further corrupt the system?<br>
                      <div><br>
                      </div>
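                      <div>For the xattr check, what I had in mind is something like this (a non-destructive sketch only; trusted.glusterfs.dht.linkto is the xattr I understand marks DHT link files, please correct me if that&#39;s wrong):<br>
<pre># list candidate orphans in .glusterfs (link count 1, non-empty) and flag DHT link files
find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 |
while read -r f; do
    if getfattr -n trusted.glusterfs.dht.linkto --only-values "$f" &gt;/dev/null 2&gt;&amp;1; then
        echo "SKIP (dht linkto): $f"
    else
        echo "candidate: $f"    # review before any rm
    fi
done</pre>
                      </div>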
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Thu, Jan 21, 2016 at
                        1:02 PM, Glomski, Patrick <span dir="ltr">&lt;<a href="mailto:patrick.glomski@corvidtec.com" target="_blank">patrick.glomski@corvidtec.com</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                          <div dir="ltr">
                            <div>
                              <div>Load spiked again: ~1200%cpu on
                                gfs02a for glusterfsd. Crawl has been
                                running on one of the bricks on gfs02b
                                for 25 min or so and users cannot access
                                the volume.<br>
                                <br>
                                I re-listed the xattrop directories as
                                well as a &#39;top&#39; entry and heal
                                statistics. Then I restarted the gluster
                                services on gfs02a. <br>
                                <br>
                                =================== top
                                ===================<br>
                                PID USER      PR  NI  VIRT  RES  SHR S
                                %CPU %MEM    TIME+ 
                                COMMAND                                                
                                <br>
                                 8969 root      20   0 2815m 204m 3588 S
                                1181.0  0.6 591:06.93 glusterfsd        
                                <br>
                                <br>
                                =================== xattrop
                                ===================<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
                                xattrop-41f19453-91e4-437c-afa9-3b25614de210 

xattrop-9b815879-2f4d-402b-867c-a6d65087788c<br>
                                <br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
                                xattrop-70131855-3cfb-49af-abce-9d23f57fb393 

xattrop-dfb77848-a39d-4417-a725-9beca75d78c6<br>
                                <br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
                                e6e47ed9-309b-42a7-8c44-28c29b9a20f8         

xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125<br>
                                xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934 

xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0<br>
                                <br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
                                xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc 

xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413<br>
                                <br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531<br>
                                <br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70<br>
                                <br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
                                8034bc06-92cd-4fa5-8aaf-09039e79d2c8 
                                c9ce22ed-6d8b-471b-a111-b39e57f0b512<br>
                                94fa1d60-45ad-4341-b69c-315936b51e8d 
                                xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7<br>
                                <br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d<br>
                                <br>
                                <br>
                                =================== heal stats
                                ===================<br>
                                 <br>
                                homegfs [b0-gfsib01a] : Starting time of
                                crawl       : Thu Jan 21 12:36:45 2016<br>
                                homegfs [b0-gfsib01a] : Ending time of
                                crawl         : Thu Jan 21 12:36:45 2016<br>
                                homegfs [b0-gfsib01a] : Type of crawl:
                                INDEX<br>
                                homegfs [b0-gfsib01a] : No. of entries
                                healed        : 0<br>
                                homegfs [b0-gfsib01a] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b0-gfsib01a] : No. of heal
                                failed entries   : 0<br>
                                 <br>
                                homegfs [b1-gfsib01b] : Starting time of
                                crawl       : Thu Jan 21 12:36:19 2016<br>
                                homegfs [b1-gfsib01b] : Ending time of
                                crawl         : Thu Jan 21 12:36:19 2016<br>
                                homegfs [b1-gfsib01b] : Type of crawl:
                                INDEX<br>
                                homegfs [b1-gfsib01b] : No. of entries
                                healed        : 0<br>
                                homegfs [b1-gfsib01b] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b1-gfsib01b] : No. of heal
                                failed entries   : 1<br>
                                 <br>
                                homegfs [b2-gfsib01a] : Starting time of
                                crawl       : Thu Jan 21 12:36:48 2016<br>
                                homegfs [b2-gfsib01a] : Ending time of
                                crawl         : Thu Jan 21 12:36:48 2016<br>
                                homegfs [b2-gfsib01a] : Type of crawl:
                                INDEX<br>
                                homegfs [b2-gfsib01a] : No. of entries
                                healed        : 0<br>
                                homegfs [b2-gfsib01a] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b2-gfsib01a] : No. of heal
                                failed entries   : 0<br>
                                 <br>
                                homegfs [b3-gfsib01b] : Starting time of
                                crawl       : Thu Jan 21 12:36:47 2016<br>
                                homegfs [b3-gfsib01b] : Ending time of
                                crawl         : Thu Jan 21 12:36:47 2016<br>
                                homegfs [b3-gfsib01b] : Type of crawl:
                                INDEX<br>
                                homegfs [b3-gfsib01b] : No. of entries
                                healed        : 0<br>
                                homegfs [b3-gfsib01b] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b3-gfsib01b] : No. of heal
                                failed entries   : 0<br>
                                 <br>
                                homegfs [b4-gfsib02a] : Starting time of
                                crawl       : Thu Jan 21 12:36:06 2016<br>
                                homegfs [b4-gfsib02a] : Ending time of
                                crawl         : Thu Jan 21 12:36:06 2016<br>
                                homegfs [b4-gfsib02a] : Type of crawl:
                                INDEX<br>
                                homegfs [b4-gfsib02a] : No. of entries
                                healed        : 0<br>
                                homegfs [b4-gfsib02a] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b4-gfsib02a] : No. of heal
                                failed entries   : 0<br>
                                 <br>
                                homegfs [b5-gfsib02b] : Starting time of
                                crawl       : Thu Jan 21 12:13:40 2016<br>
                                homegfs [b5-gfsib02b]
                                :                                ***
                                Crawl is in progress ***<br>
                                homegfs [b5-gfsib02b] : Type of crawl:
                                INDEX<br>
                                homegfs [b5-gfsib02b] : No. of entries
                                healed        : 0<br>
                                homegfs [b5-gfsib02b] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b5-gfsib02b] : No. of heal
                                failed entries   : 0<br>
                                 <br>
                                homegfs [b6-gfsib02a] : Starting time of
                                crawl       : Thu Jan 21 12:36:58 2016<br>
                                homegfs [b6-gfsib02a] : Ending time of
                                crawl         : Thu Jan 21 12:36:58 2016<br>
                                homegfs [b6-gfsib02a] : Type of crawl:
                                INDEX<br>
                                homegfs [b6-gfsib02a] : No. of entries
                                healed        : 0<br>
                                homegfs [b6-gfsib02a] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b6-gfsib02a] : No. of heal
                                failed entries   : 0<br>
                                 <br>
                                homegfs [b7-gfsib02b] : Starting time of
                                crawl       : Thu Jan 21 12:36:50 2016<br>
                                homegfs [b7-gfsib02b] : Ending time of
                                crawl         : Thu Jan 21 12:36:50 2016<br>
                                homegfs [b7-gfsib02b] : Type of crawl:
                                INDEX<br>
                                homegfs [b7-gfsib02b] : No. of entries
                                healed        : 0<br>
                                homegfs [b7-gfsib02b] : No. of entries
                                in split-brain: 0<br>
                                homegfs [b7-gfsib02b] : No. of heal
                                failed entries   : 0<br>
                                <br>
                                <br>
========================================================================================<br>
                              </div>
                              I waited a few minutes for the heals to
                              finish and ran the heal statistics and
                              info again. One file is in split-brain.
                              Aside from the split-brain, the load on
                              all systems is down now and they are
                              behaving normally. glustershd.log is
                              attached. What is going on??? <br>
                              <br>
                              Thu Jan 21 12:53:50 EST 2016<br>
                               <br>
                              =================== homegfs
                              ===================<br>
                               <br>
                              homegfs [b0-gfsib01a] : Starting time of
                              crawl       : Thu Jan 21 12:53:02 2016<br>
                              homegfs [b0-gfsib01a] : Ending time of
                              crawl         : Thu Jan 21 12:53:02 2016<br>
                              homegfs [b0-gfsib01a] : Type of crawl:
                              INDEX<br>
                              homegfs [b0-gfsib01a] : No. of entries
                              healed        : 0<br>
                              homegfs [b0-gfsib01a] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b0-gfsib01a] : No. of heal failed
                              entries   : 0<br>
                               <br>
                              homegfs [b1-gfsib01b] : Starting time of
                              crawl       : Thu Jan 21 12:53:38 2016<br>
                              homegfs [b1-gfsib01b] : Ending time of
                              crawl         : Thu Jan 21 12:53:38 2016<br>
                              homegfs [b1-gfsib01b] : Type of crawl:
                              INDEX<br>
                              homegfs [b1-gfsib01b] : No. of entries
                              healed        : 0<br>
                              homegfs [b1-gfsib01b] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b1-gfsib01b] : No. of heal failed
                              entries   : 1<br>
                               <br>
                              homegfs [b2-gfsib01a] : Starting time of
                              crawl       : Thu Jan 21 12:53:04 2016<br>
                              homegfs [b2-gfsib01a] : Ending time of
                              crawl         : Thu Jan 21 12:53:04 2016<br>
                              homegfs [b2-gfsib01a] : Type of crawl:
                              INDEX<br>
                              homegfs [b2-gfsib01a] : No. of entries
                              healed        : 0<br>
                              homegfs [b2-gfsib01a] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b2-gfsib01a] : No. of heal failed
                              entries   : 0<br>
                               <br>
                              homegfs [b3-gfsib01b] : Starting time of
                              crawl       : Thu Jan 21 12:53:04 2016<br>
                              homegfs [b3-gfsib01b] : Ending time of
                              crawl         : Thu Jan 21 12:53:04 2016<br>
                              homegfs [b3-gfsib01b] : Type of crawl:
                              INDEX<br>
                              homegfs [b3-gfsib01b] : No. of entries
                              healed        : 0<br>
                              homegfs [b3-gfsib01b] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b3-gfsib01b] : No. of heal failed
                              entries   : 0<br>
                               <br>
                              homegfs [b4-gfsib02a] : Starting time of
                              crawl       : Thu Jan 21 12:53:33 2016<br>
                              homegfs [b4-gfsib02a] : Ending time of
                              crawl         : Thu Jan 21 12:53:33 2016<br>
                              homegfs [b4-gfsib02a] : Type of crawl:
                              INDEX<br>
                              homegfs [b4-gfsib02a] : No. of entries
                              healed        : 0<br>
                              homegfs [b4-gfsib02a] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b4-gfsib02a] : No. of heal failed
                              entries   : 1<br>
                               <br>
                              homegfs [b5-gfsib02b] : Starting time of
                              crawl       : Thu Jan 21 12:53:14 2016<br>
                              homegfs [b5-gfsib02b] : Ending time of
                              crawl         : Thu Jan 21 12:53:15 2016<br>
                              homegfs [b5-gfsib02b] : Type of crawl:
                              INDEX<br>
                              homegfs [b5-gfsib02b] : No. of entries
                              healed        : 0<br>
                              homegfs [b5-gfsib02b] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b5-gfsib02b] : No. of heal failed
                              entries   : 3<br>
                               <br>
                              homegfs [b6-gfsib02a] : Starting time of
                              crawl       : Thu Jan 21 12:53:04 2016<br>
                              homegfs [b6-gfsib02a] : Ending time of
                              crawl         : Thu Jan 21 12:53:04 2016<br>
                              homegfs [b6-gfsib02a] : Type of crawl:
                              INDEX<br>
                              homegfs [b6-gfsib02a] : No. of entries
                              healed        : 0<br>
                              homegfs [b6-gfsib02a] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b6-gfsib02a] : No. of heal failed
                              entries   : 0<br>
                               <br>
                              homegfs [b7-gfsib02b] : Starting time of
                              crawl       : Thu Jan 21 12:53:09 2016<br>
                              homegfs [b7-gfsib02b] : Ending time of
                              crawl         : Thu Jan 21 12:53:09 2016<br>
                              homegfs [b7-gfsib02b] : Type of crawl:
                              INDEX<br>
                              homegfs [b7-gfsib02b] : No. of entries
                              healed        : 0<br>
                              homegfs [b7-gfsib02b] : No. of entries in
                              split-brain: 0<br>
                              homegfs [b7-gfsib02b] : No. of heal failed
                              entries   : 0<br>
                               <br>
                              *** gluster bug in &#39;gluster volume heal
                              homegfs statistics&#39;   ***<br>
                              *** Use &#39;gluster volume heal homegfs info&#39;
                              until bug is fixed ***<span><br>
                                 <br>
                                Brick
                                gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
                                Number of entries: 0<br>
                                <br>
                                Brick
                                gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
                                Number of entries: 0<br>
                                <br>
                                Brick
                                gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
                                Number of entries: 0<br>
                                <br>
                                Brick
                                gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
                                Number of entries: 0<br>
                                <br>
                                Brick
                                gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
                              </span>/users/bangell/.gconfd - Is in
                              split-brain<br>
                              <br>
                              Number of entries: 1<br>
                              <br>
                              Brick
                              gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
                              /users/bangell/.gconfd - Is in split-brain<br>
                              <br>
                              /users/bangell/.gconfd/saved_state <br>
                              Number of entries: 2<span><br>
                                <br>
                                Brick
                                gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
                                Number of entries: 0<br>
                                <br>
                                Brick
                                gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
                                Number of entries: 0<br>
                                <br>
                              </span></div>
                            <div><br>
                              <br>
                            </div>
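                            <div>For the /users/bangell/.gconfd split-brain above, my plan (just a sketch of the usual manual inspection for 3.6; I&#39;ll wait for your advice before deleting anything) is to compare the copies and their AFR changelog xattrs on the two bricks first:<br>
<pre># on gfs02a and gfs02b respectively, dump gfid + AFR changelog xattrs
getfattr -d -m . -e hex /data/brick01a/homegfs/users/bangell/.gconfd   # gfs02a
getfattr -d -m . -e hex /data/brick01b/homegfs/users/bangell/.gconfd   # gfs02b

# and re-list what gluster itself considers split-brain
gluster volume heal homegfs info split-brain</pre>
                            </div>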
                          </div>
                          <div>
                            <div>
                              <div class="gmail_extra"><br>
                                <div class="gmail_quote">On Thu, Jan 21,
                                  2016 at 11:10 AM, Pranith Kumar
                                  Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                  wrote:<br>
                                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                    <div bgcolor="#FFFFFF" text="#000000"><span> <br>
                                        <br>
                                        <div>On 01/21/2016 09:26 PM,
                                          Glomski, Patrick wrote:<br>
                                        </div>
                                        <blockquote type="cite">
                                          <div dir="ltr">
                                            <div>I should mention that
                                              the problem is not
                                              currently occurring and
                                              there are no heals (output
                                              appended). By restarting
                                              the gluster services, we
                                              can stop the crawl, which
                                              lowers the load for a
                                              while. Subsequent crawls
                                              seem to finish properly.
                                              For what it&#39;s worth,
                                              files/folders that show up
                                              in the &#39;volume info&#39;
                                              output during a hung crawl
                                              don&#39;t seem to be anything
                                              out of the ordinary. <br>
                                              <br>
                                              Over the past four days,
                                              the typical time before
                                              the problem recurs after
                                              suppressing it in this
                                              manner is an hour. Last
                                              night when we reached out
                                              to you was the last time
                                              it happened and the load
                                              has been low since (a
                                              relief).  David believes
                                              that recursively listing
                                              the files (ls -alR or
                                              similar) from a client
                                              mount can force the issue
                                              to happen, but obviously
                                              I&#39;d rather not unless we
                                              have some precise thing
                                              we&#39;re looking for. Let me
                                              know if you&#39;d like me to
                                              attempt to drive the
                                              system unstable like that
                                              and what I should look
                                              for. As it&#39;s a production
                                              system, I&#39;d rather not
                                              leave it in this state for
                                              long.<br>
                                            </div>
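                                            <div><br>If we do end up forcing it with an ls -alR, this is roughly the watch loop I&#39;d run on the busy node while it happens (a sketch only; the volume name and commands are the same ones already used in this thread):<br>
<pre># poll glusterfsd CPU and pending heals once a minute during the test
while true; do
    date
    top -bn1 | grep '[g]lusterfsd'
    gluster volume heal homegfs info | awk '/Number of entries/ {s+=$4} END {print "pending entries:", s}'
    gluster volume heal homegfs statistics | grep -i crawl
    sleep 60
done</pre>
                                            </div>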
                                          </div>
                                        </blockquote>
                                        <br>
                                      </span> Will it be possible to send the glustershd and mount logs of the past 4 days? I would like to see if this is because of directory self-heal going wild (Ravi is working on a throttling feature for 3.8, which will allow us to put the brakes on self-heal traffic).<span><font color="#888888"><br>
                                          <br>
                                          Pranith</font></span>
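                                      <div>PS: something like this should collect them (log locations are the usual defaults; the client mount log is named after the mount point, so adjust to yours):<br>
<pre># on each server node
tar czf /tmp/$(hostname)-shd-brick-logs.tar.gz \
    /var/log/glusterfs/glustershd.log* \
    /var/log/glusterfs/bricks/*.log*

# on the clients (example assumes the volume is mounted at /homegfs)
tar czf /tmp/$(hostname)-mount-logs.tar.gz /var/log/glusterfs/homegfs.log*</pre>
                                      </div>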
                                      <div>
                                        <div><br>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div><br>
                                              </div>
                                              <div>[root@gfs01a
                                                xattrop]# gluster volume
                                                heal homegfs info<br>
                                                Brick
                                                gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                Brick
                                                gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                Brick
                                                gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                Brick
                                                gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                Brick
                                                gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                Brick
                                                gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                Brick
                                                gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                Brick
                                                gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
                                                Number of entries: 0<br>
                                                <br>
                                                <br>
                                                <br>
                                              </div>
                                            </div>
                                            <div class="gmail_extra"><br>
                                              <div class="gmail_quote">On
                                                Thu, Jan 21, 2016 at
                                                10:40 AM, Pranith Kumar
                                                Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                                wrote:<br>
                                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                  <div bgcolor="#FFFFFF" text="#000000"><span>
                                                      <br>
                                                      <br>
                                                      <div>On 01/21/2016
                                                        08:25 PM,
                                                        Glomski, Patrick
                                                        wrote:<br>
                                                      </div>
                                                      <blockquote type="cite">
                                                        <div dir="ltr">
                                                          <div>Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd process jumps to the number of processor cores available (800% or 1200%, depending on the pair of nodes involved) and the load average on the machine goes very high (~20). The volume&#39;s heal statistics output shows that it is crawling one of the bricks and trying to heal, but this crawl hangs and never seems to finish.<br>
                                                          </div>
                                                        </div>
                                                      </blockquote>
                                                      <blockquote type="cite">
                                                        <div dir="ltr">
                                                          <div><br>
                                                          </div>
                                                          The number of files in the xattrop directory varies over time, so I ran a wc -l as you requested periodically for some time, and then started including a datestamped list of the files that were in the xattrop directory on each brick to see which were persistent. All bricks had files in the xattrop folder, so all results are attached.<br>
                                                        </div>
                                                      </blockquote>
                                                    </span> Thanks, this info is helpful. I don&#39;t see a lot of files. Could you give the output of &quot;gluster volume heal &lt;volname&gt; info&quot;? Is there any directory in there which is LARGE?<span><font color="#888888"><br>
                                                        <br>
                                                        Pranith</font></span>
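                                                    <div>PS: to spot unusually large directories on a brick, something along these lines would do (a sketch; run it per brick and adjust the path):<br>
<pre># print entry counts of the 20 largest directories under a brick
find /data/brick01a/homegfs -xdev -type d -print0 |
  while IFS= read -r -d '' d; do
      printf '%d\t%s\n' "$(ls -A "$d" | wc -l)" "$d"
  done | sort -rn | head -20</pre>
                                                    </div>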
                                                    <div>
                                                      <div><br>
                                                        <blockquote type="cite">
                                                          <div dir="ltr">
                                                          <div><br>
                                                          </div>
                                                          <div>Please
                                                          let me know if
                                                          there is
                                                          anything else
                                                          I can provide.<br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          <div>Patrick<br>
                                                          </div>
                                                          <div><br>
                                                          </div>
                                                          </div>
                                                          <div class="gmail_extra"><br>
                                                          <div class="gmail_quote">On

                                                          Thu, Jan 21,
                                                          2016 at 12:01
                                                          AM, Pranith
                                                          Kumar
                                                          Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
                                                          wrote:<br>
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          <div bgcolor="#FFFFFF" text="#000000">
                                                           Hey,<br>
                                                                  Which process is consuming so much CPU? I went through the logs you gave me. I see that the following files are in gfid mismatch state:<br>
                                                          <br>
&lt;066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup&gt;,<br>
&lt;1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak&gt;,<br>
&lt;ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg&gt;,<br>
                                                          <br>
                                                           Could you give me the output of &quot;ls &lt;brick-path&gt;/indices/xattrop | wc -l&quot; on all the bricks which are acting this way? This will tell us the number of pending self-heals on the system.<br>
                                                          <br>
                                                          Pranith
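                                                           <div>PS: a quick way to grab that from all four bricks on a node in one go (a sketch; brick paths are taken from the volume info below, and the 2&gt;/dev/null just skips bricks that aren&#39;t on that node):<br>
<pre>for b in /data/brick0{1,2}{a,b}/homegfs; do
    printf '%s: ' "$b"
    ls "$b/.glusterfs/indices/xattrop" 2&gt;/dev/null | wc -l
done</pre>
                                                           </div>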
                                                          <div>
                                                          <div><br>
                                                          <br>
                                                          <div>On
                                                          01/20/2016
                                                          09:26 PM,
                                                          David Robinson
                                                          wrote:<br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          <blockquote type="cite">
                                                          <div>
                                                          <div>
                                                          <div>resending
                                                          with parsed
                                                          logs... </div>
                                                          <div> </div>
                                                          <div>
                                                          <blockquote cite="http://em5ee26b0e-002a-4230-bdec-3020b98cff3c@dfrobins-vaio" type="cite">
                                                          <div> </div>
                                                          <div> </div>
                                                          <div>
                                                          <blockquote cite="http://eme3b2cb80-8be2-4fa5-9d08-4710955e237c@dfrobins-vaio" type="cite">
                                                           <div>I am having issues with 3.6.6 where the load will spike up to 800% for one of the glusterfsd processes and the users can no longer access the system.  If I reboot the node, the heal will finish normally after a few minutes and the system will be responsive, but a few hours later the issue will start again.  It looks like it is hanging in a heal and spinning up the load on one of the bricks.  The heal gets stuck, says it is crawling, and never returns.  After a few minutes of the heal saying it is crawling, the load spikes up and the mounts become unresponsive.</div>
                                                          <div> </div>
                                                           <div>Any suggestions on how to fix this?  It has us stopped cold, as users can no longer access the systems when the load spikes... Logs attached.</div>
                                                          <div> </div>
                                                          <div>System
                                                          setup info is:
                                                          </div>
                                                          <div> </div>
                                                          <div>[root@gfs01a

                                                          ~]# gluster
                                                          volume info
                                                          homegfs<br>
                                                           <br>
                                                          Volume Name:
                                                          homegfs<br>
                                                          Type:
                                                          Distributed-Replicate<br>
                                                          Volume ID:
                                                          1e32672a-f1b7-4b58-ba94-58c085e59071<br>
                                                          Status:
                                                          Started<br>
                                                          Number of
                                                          Bricks: 4 x 2
                                                          = 8<br>
                                                          Transport-type:

                                                          tcp<br>
                                                          Bricks:<br>
                                                          Brick1:
                                                          gfsib01a.corvidtec.com:/data/brick01a/homegfs<br>
                                                          Brick2:
                                                          gfsib01b.corvidtec.com:/data/brick01b/homegfs<br>
                                                          Brick3:
                                                          gfsib01a.corvidtec.com:/data/brick02a/homegfs<br>
                                                          Brick4:
                                                          gfsib01b.corvidtec.com:/data/brick02b/homegfs<br>
                                                          Brick5:
                                                          gfsib02a.corvidtec.com:/data/brick01a/homegfs<br>
                                                          Brick6:
                                                          gfsib02b.corvidtec.com:/data/brick01b/homegfs<br>
                                                          Brick7:
                                                          gfsib02a.corvidtec.com:/data/brick02a/homegfs<br>
                                                          Brick8:
                                                          gfsib02b.corvidtec.com:/data/brick02b/homegfs<br>
                                                          Options
                                                          Reconfigured:<br>
                                                          performance.io-thread-count:


                                                          32<br>
                                                          performance.cache-size:


                                                          128MB<br>
                                                          performance.write-behind-window-size:



                                                          128MB<br>
                                                          server.allow-insecure:

                                                          on<br>
                                                          network.ping-timeout:

                                                          42<br>
                                                          storage.owner-gid:

                                                          100<br>
                                                          geo-replication.indexing:


                                                          off<br>
                                                          geo-replication.ignore-pid-check:


                                                          on<br>
                                                          changelog.changelog:

                                                          off<br>
                                                          changelog.fsync-interval:

                                                          3<br>
                                                          changelog.rollover-time:

                                                          15<br>
                                                          server.manage-gids:

                                                          on<br>
                                                          diagnostics.client-log-level:


                                                          WARNING</div>
                                                          <div> </div>
                                                          <div>[root@gfs01a

                                                          ~]# rpm -qa |
                                                          grep gluster<br>
gluster-nagios-common-0.1.1-0.el6.noarch<br>
glusterfs-fuse-3.6.6-1.el6.x86_64<br>
glusterfs-debuginfo-3.6.6-1.el6.x86_64<br>
glusterfs-libs-3.6.6-1.el6.x86_64<br>
glusterfs-geo-replication-3.6.6-1.el6.x86_64<br>
glusterfs-api-3.6.6-1.el6.x86_64<br>
glusterfs-devel-3.6.6-1.el6.x86_64<br>
glusterfs-api-devel-3.6.6-1.el6.x86_64<br>
glusterfs-3.6.6-1.el6.x86_64<br>
glusterfs-cli-3.6.6-1.el6.x86_64<br>
glusterfs-rdma-3.6.6-1.el6.x86_64<br>
samba-vfs-glusterfs-4.1.11-2.el6.x86_64<br>
glusterfs-server-3.6.6-1.el6.x86_64<br>
glusterfs-extra-xlators-3.6.6-1.el6.x86_64<br>
                                                          </div>
                                                          <div> </div>
                                                          <div>
                                                          <div style="FONT-SIZE:12pt;FONT-FAMILY:Times New Roman"><span><span>
                                                          <div> </div>
                                                          </span></span></div>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          <fieldset></fieldset>
                                                          <br>
                                                          </div>
                                                          </div>
                                                          <pre>_______________________________________________
Gluster-devel mailing list
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                          <br>
_______________________________________________<br>
                                                          Gluster-users
                                                          mailing list<br>
                                                          <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
                                                          <a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
                                                          </blockquote>
                                                          </div>
                                                          <br>
                                                          </div>
                                                        </blockquote>
                                                        <br>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                              </div>
                                              <br>
                                            </div>
                                          </blockquote>
                                          <br>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                </div>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>