<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 01/22/2016 07:25 AM, Glomski,
Patrick wrote:<br>
</div>
<blockquote
cite="mid:CALkMjdDRCsoOgmdefzBkdDez1Uqt9Z4_8qiEfCXLW-oasNz5gQ@mail.gmail.com"
type="cite">
<div dir="ltr">Unfortunately, all samba mounts to the gluster
volume through the gfapi vfs plugin have been disabled for the
last 6 hours or so and frequency of %cpu spikes is increased. We
had switched to sharing a fuse mount through samba, but I just
disabled that as well. There are no samba shares of this volume
now. The spikes now happen every thirty minutes or so. We've
resorted to just rebooting the machine with high load for the
present.<br>
</div>
</blockquote>
<br>
Next time this CPU spike happens, could you collect samples of
gstack <pid-of-brick> once per second for 10-20 seconds? That
helps in finding the most heavily hit function calls.<br>
<br>
Something like "for i in {1..20}; do gstack <pid-of-brick>
> sample-$i.txt; sleep 1; done"<br>
<br>
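For instance, a small wrapper along these lines would capture one
sample per second (only a sketch; the output directory is an
assumption, and the brick PID can be read from 'gluster volume
status homegfs'):<br>
<pre>
#!/bin/bash
# Sketch: take one gstack sample per second for 20 seconds.
# Usage: ./sample-brick.sh <pid-of-brick>
#        (the brick PID is shown by 'gluster volume status homegfs')
PID=$1
OUT=/tmp/gstack-samples           # hypothetical output directory
mkdir -p "$OUT"
for i in {1..20}; do
    gstack "$PID" > "$OUT/sample-$i.txt"
    sleep 1
done
</pre>
<br>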
Pranith<br>
<blockquote
cite="mid:CALkMjdDRCsoOgmdefzBkdDez1Uqt9Z4_8qiEfCXLW-oasNz5gQ@mail.gmail.com"
type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 8:49 PM,
Pranith Kumar Karampuri <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class=""> <br>
<br>
<div>On 01/22/2016 07:13 AM, Glomski, Patrick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">We use the samba glusterfs virtual
filesystem (the current version provided on <a
moz-do-not-send="true"
href="http://download.gluster.org" target="_blank">download.gluster.org</a>),
but no Windows clients connect directly.<br>
</div>
</blockquote>
<br>
</span> Hmm... Is there a way to disable this and check whether
the CPU% still increases? A getxattr of
"glusterfs.get_real_filename <filename>" scans the entire
directory, running strcasecmp(<filename>, <scanned-filename>)
against every entry; if anything matches, it returns the
<scanned-filename>. The problem is that the scan is costly, so I
wonder if this is the reason for the CPU
spikes.<span class="HOEnZb"><font color="#888888"><br>
<br>
Pranith</font></span>
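To give a feel for why that lookup is expensive, here is a rough
shell equivalent of the per-entry comparison (purely an
illustration, not the actual brick code; the directory and
filename are made up):<br>
<pre>
# Illustration only: emulate a case-insensitive lookup over one directory,
# comparing every entry against the requested name the way
# strcasecmp(<filename>, <scanned-filename>) is applied on the brick.
target="ProjectReport.DOCX"                  # hypothetical requested name
dir=/data/brick01a/homegfs/somedir           # hypothetical directory
shopt -s nullglob
for entry in "$dir"/*; do
    name=${entry##*/}
    if [ "${name,,}" = "${target,,}" ]; then
        echo "real filename: $name"
        break
    fi
done
# Cost grows linearly with the number of entries in the directory.
</pre>
<br>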
<div>
<div class="h5"><br>
<blockquote type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at
8:37 PM, Pranith Kumar Karampuri <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Do you
have any Windows clients? I see a lot of
getxattr calls for
"glusterfs.get_real_filename" which lead to
full readdirs of the directories on the
brick.<span><font color="#888888"><br>
<br>
Pranith</font></span><span><br>
<br>
<div>On 01/22/2016 12:51 AM, Glomski,
Patrick wrote:<br>
</div>
</span>
<div>
<div>
<blockquote type="cite">
<div dir="ltr">
<div>Pranith, could this kind of behavior be self-inflicted by us
deleting files directly from the bricks? We have done that in the
past to clean up issues where gluster wouldn't allow us to delete
from the mount.<br>
<br>
If so, is it feasible to clean them up by running a search on the
.glusterfs directories directly and removing files with a link
count of 1 that are non-zero in size (or directly checking the
xattrs to be sure that a file is not a DHT link-to file)? <br>
<br>
find
/data/brick01a/homegfs/.glusterfs
-type f -not -empty -links -2
-exec rm -f "{}" \;<br>
<br>
</div>
Is there anything I'm inherently
missing with that approach that will
further corrupt the system?<br>
<div><br>
</div>
</div>
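For example, a dry-run version of that idea, just a sketch (it
only lists candidates and shows whether each carries the DHT
link-to xattr; the brick path would need to be adjusted per
brick, and it requires getfattr from the attr package):<br>
<pre>
#!/bin/bash
# Sketch: list .glusterfs entries with a single hard link and non-zero size,
# flagging any that carry the trusted.glusterfs.dht.linkto xattr.
# Nothing is deleted here.
BRICK=/data/brick01a/homegfs            # adjust per brick
find "$BRICK/.glusterfs" -type f -not -empty -links -2 -print0 |
while IFS= read -r -d '' f; do
    if getfattr -n trusted.glusterfs.dht.linkto "$f" > /dev/null 2>&1; then
        echo "dht link-to file (skip): $f"
    else
        echo "candidate for removal : $f"
    fi
done
</pre>
<br>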
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan
21, 2016 at 1:02 PM, Glomski,
Patrick <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:patrick.glomski@corvidtec.com"
target="_blank">patrick.glomski@corvidtec.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>Load spiked again:
~1200%cpu on gfs02a for
glusterfsd. Crawl has been
running on one of the
bricks on gfs02b for 25
min or so and users cannot
access the volume.<br>
<br>
I re-listed the xattrop
directories as well as a
'top' entry and heal
statistics. Then I
restarted the gluster
services on gfs02a. <br>
<br>
=================== top
===================<br>
PID USER PR NI
VIRT RES SHR S %CPU
%MEM TIME+
COMMAND
<br>
8969 root 20 0
2815m 204m 3588 S 1181.0
0.6 591:06.93
glusterfsd <br>
<br>
===================
xattrop
===================<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-41f19453-91e4-437c-afa9-3b25614de210
xattrop-9b815879-2f4d-402b-867c-a6d65087788c<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-70131855-3cfb-49af-abce-9d23f57fb393
xattrop-dfb77848-a39d-4417-a725-9beca75d78c6<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
e6e47ed9-309b-42a7-8c44-28c29b9a20f8
xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125<br>
xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413<br>
<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
8034bc06-92cd-4fa5-8aaf-09039e79d2c8
c9ce22ed-6d8b-471b-a111-b39e57f0b512<br>
94fa1d60-45ad-4341-b69c-315936b51e8d
xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d<br>
<br>
<br>
=================== heal
stats ===================<br>
<br>
homegfs [b0-gfsib01a] :
Starting time of
crawl : Thu Jan 21
12:36:45 2016<br>
homegfs [b0-gfsib01a] :
Ending time of
crawl : Thu Jan 21
12:36:45 2016<br>
homegfs [b0-gfsib01a] :
Type of crawl: INDEX<br>
homegfs [b0-gfsib01a] :
No. of entries
healed : 0<br>
homegfs [b0-gfsib01a] :
No. of entries in
split-brain: 0<br>
homegfs [b0-gfsib01a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b1-gfsib01b] :
Starting time of
crawl : Thu Jan 21
12:36:19 2016<br>
homegfs [b1-gfsib01b] :
Ending time of
crawl : Thu Jan 21
12:36:19 2016<br>
homegfs [b1-gfsib01b] :
Type of crawl: INDEX<br>
homegfs [b1-gfsib01b] :
No. of entries
healed : 0<br>
homegfs [b1-gfsib01b] :
No. of entries in
split-brain: 0<br>
homegfs [b1-gfsib01b] :
No. of heal failed
entries : 1<br>
<br>
homegfs [b2-gfsib01a] :
Starting time of
crawl : Thu Jan 21
12:36:48 2016<br>
homegfs [b2-gfsib01a] :
Ending time of
crawl : Thu Jan 21
12:36:48 2016<br>
homegfs [b2-gfsib01a] :
Type of crawl: INDEX<br>
homegfs [b2-gfsib01a] :
No. of entries
healed : 0<br>
homegfs [b2-gfsib01a] :
No. of entries in
split-brain: 0<br>
homegfs [b2-gfsib01a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b3-gfsib01b] :
Starting time of
crawl : Thu Jan 21
12:36:47 2016<br>
homegfs [b3-gfsib01b] :
Ending time of
crawl : Thu Jan 21
12:36:47 2016<br>
homegfs [b3-gfsib01b] :
Type of crawl: INDEX<br>
homegfs [b3-gfsib01b] :
No. of entries
healed : 0<br>
homegfs [b3-gfsib01b] :
No. of entries in
split-brain: 0<br>
homegfs [b3-gfsib01b] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b4-gfsib02a] :
Starting time of
crawl : Thu Jan 21
12:36:06 2016<br>
homegfs [b4-gfsib02a] :
Ending time of
crawl : Thu Jan 21
12:36:06 2016<br>
homegfs [b4-gfsib02a] :
Type of crawl: INDEX<br>
homegfs [b4-gfsib02a] :
No. of entries
healed : 0<br>
homegfs [b4-gfsib02a] :
No. of entries in
split-brain: 0<br>
homegfs [b4-gfsib02a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b5-gfsib02b] :
Starting time of
crawl : Thu Jan 21
12:13:40 2016<br>
homegfs [b5-gfsib02b]
:
*** Crawl is in progress
***<br>
homegfs [b5-gfsib02b] :
Type of crawl: INDEX<br>
homegfs [b5-gfsib02b] :
No. of entries
healed : 0<br>
homegfs [b5-gfsib02b] :
No. of entries in
split-brain: 0<br>
homegfs [b5-gfsib02b] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b6-gfsib02a] :
Starting time of
crawl : Thu Jan 21
12:36:58 2016<br>
homegfs [b6-gfsib02a] :
Ending time of
crawl : Thu Jan 21
12:36:58 2016<br>
homegfs [b6-gfsib02a] :
Type of crawl: INDEX<br>
homegfs [b6-gfsib02a] :
No. of entries
healed : 0<br>
homegfs [b6-gfsib02a] :
No. of entries in
split-brain: 0<br>
homegfs [b6-gfsib02a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b7-gfsib02b] :
Starting time of
crawl : Thu Jan 21
12:36:50 2016<br>
homegfs [b7-gfsib02b] :
Ending time of
crawl : Thu Jan 21
12:36:50 2016<br>
homegfs [b7-gfsib02b] :
Type of crawl: INDEX<br>
homegfs [b7-gfsib02b] :
No. of entries
healed : 0<br>
homegfs [b7-gfsib02b] :
No. of entries in
split-brain: 0<br>
homegfs [b7-gfsib02b] :
No. of heal failed
entries : 0<br>
<br>
<br>
========================================================================================<br>
</div>
I waited a few minutes for
the heals to finish and ran
the heal statistics and info
again. One file is in
split-brain. Aside from the
split-brain, the load on all
systems is down now and they
are behaving normally.
glustershd.log is attached.
What is going on??? <br>
<br>
Thu Jan 21 12:53:50 EST 2016<br>
<br>
=================== homegfs
===================<br>
<br>
homegfs [b0-gfsib01a] :
Starting time of crawl
: Thu Jan 21 12:53:02 2016<br>
homegfs [b0-gfsib01a] :
Ending time of crawl
: Thu Jan 21 12:53:02 2016<br>
homegfs [b0-gfsib01a] : Type
of crawl: INDEX<br>
homegfs [b0-gfsib01a] : No.
of entries healed : 0<br>
homegfs [b0-gfsib01a] : No.
of entries in split-brain: 0<br>
homegfs [b0-gfsib01a] : No.
of heal failed entries : 0<br>
<br>
homegfs [b1-gfsib01b] :
Starting time of crawl
: Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] :
Ending time of crawl
: Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] : Type
of crawl: INDEX<br>
homegfs [b1-gfsib01b] : No.
of entries healed : 0<br>
homegfs [b1-gfsib01b] : No.
of entries in split-brain: 0<br>
homegfs [b1-gfsib01b] : No.
of heal failed entries : 1<br>
<br>
homegfs [b2-gfsib01a] :
Starting time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] :
Ending time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] : Type
of crawl: INDEX<br>
homegfs [b2-gfsib01a] : No.
of entries healed : 0<br>
homegfs [b2-gfsib01a] : No.
of entries in split-brain: 0<br>
homegfs [b2-gfsib01a] : No.
of heal failed entries : 0<br>
<br>
homegfs [b3-gfsib01b] :
Starting time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] :
Ending time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] : Type
of crawl: INDEX<br>
homegfs [b3-gfsib01b] : No.
of entries healed : 0<br>
homegfs [b3-gfsib01b] : No.
of entries in split-brain: 0<br>
homegfs [b3-gfsib01b] : No.
of heal failed entries : 0<br>
<br>
homegfs [b4-gfsib02a] :
Starting time of crawl
: Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] :
Ending time of crawl
: Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] : Type
of crawl: INDEX<br>
homegfs [b4-gfsib02a] : No.
of entries healed : 0<br>
homegfs [b4-gfsib02a] : No.
of entries in split-brain: 0<br>
homegfs [b4-gfsib02a] : No.
of heal failed entries : 1<br>
<br>
homegfs [b5-gfsib02b] :
Starting time of crawl
: Thu Jan 21 12:53:14 2016<br>
homegfs [b5-gfsib02b] :
Ending time of crawl
: Thu Jan 21 12:53:15 2016<br>
homegfs [b5-gfsib02b] : Type
of crawl: INDEX<br>
homegfs [b5-gfsib02b] : No.
of entries healed : 0<br>
homegfs [b5-gfsib02b] : No.
of entries in split-brain: 0<br>
homegfs [b5-gfsib02b] : No.
of heal failed entries : 3<br>
<br>
homegfs [b6-gfsib02a] :
Starting time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] :
Ending time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] : Type
of crawl: INDEX<br>
homegfs [b6-gfsib02a] : No.
of entries healed : 0<br>
homegfs [b6-gfsib02a] : No.
of entries in split-brain: 0<br>
homegfs [b6-gfsib02a] : No.
of heal failed entries : 0<br>
<br>
homegfs [b7-gfsib02b] :
Starting time of crawl
: Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] :
Ending time of crawl
: Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] : Type
of crawl: INDEX<br>
homegfs [b7-gfsib02b] : No.
of entries healed : 0<br>
homegfs [b7-gfsib02b] : No.
of entries in split-brain: 0<br>
homegfs [b7-gfsib02b] : No.
of heal failed entries : 0<br>
<br>
*** gluster bug in 'gluster
volume heal homegfs
statistics' ***<br>
*** Use 'gluster volume heal
homegfs info' until bug is
fixed ***<span><br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
</span>/users/bangell/.gconfd
- Is in split-brain<br>
<br>
Number of entries: 1<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
/users/bangell/.gconfd - Is
in split-brain<br>
<br>
/users/bangell/.gconfd/saved_state
<br>
Number of entries: 2<span><br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
</span></div>
<div><br>
<br>
</div>
</div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On
Thu, Jan 21, 2016 at
11:10 AM, Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000"><span>
<br>
<br>
<div>On 01/21/2016
09:26 PM,
Glomski, Patrick
wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>I should
mention that
the problem is
not currently
occurring and
there are no
heals (output
appended). By
restarting the
gluster
services, we
can stop the
crawl, which
lowers the
load for a
while.
Subsequent
crawls seem to
finish
properly. For what it's worth, files/folders that show up in the
'heal info' output during a hung crawl don't seem to be anything
out of the ordinary. <br>
<br>
Over the past
four days, the
typical time
before the
problem recurs
after
suppressing it
in this manner
is an hour.
The last time it happened was last night, when we reached out to
you, and the load has been low since (a relief).
David believes
that
recursively
listing the
files (ls -alR
or similar)
from a client
mount can
force the
issue to
happen, but
obviously I'd
rather not
unless we have
some precise
thing we're
looking for.
Let me know if
you'd like me
to attempt to
drive the
system
unstable like
that and what
I should look
for. As it's a
production
system, I'd
rather not
leave it in
this state for
long.<br>
</div>
</div>
</blockquote>
<br>
</span> Will it be possible to send the glustershd and mount logs
for the past 4 days? I would like to see if this is because of
directory self-heal going wild (Ravi is working on a throttling
feature for 3.8, which will make it possible to put the brakes on
self-heal traffic).<span><font
color="#888888"><br>
<br>
Pranith</font></span>
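If it helps, a one-liner along these lines could bundle them (a
sketch; it assumes the default /var/log/glusterfs/ location and
that the fuse mount log is named after the homegfs mount point):<br>
<pre>
# Sketch: run on each server/client; log paths are assumptions to verify first.
tar czf /tmp/gluster-logs-$(hostname)-$(date +%F).tar.gz \
    /var/log/glusterfs/glustershd.log* \
    /var/log/glusterfs/*homegfs*.log*
</pre>
<br>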
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>[root@gfs01a
xattrop]#
gluster volume
heal homegfs
info<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
<br>
<br>
</div>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 10:40
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
bgcolor="#FFFFFF"
text="#000000"><span>
<br>
<br>
<div>On
01/21/2016
08:25 PM,
Glomski,
Patrick wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>Hello,
Pranith. The
typical
behavior is
that the %cpu
on a
glusterfsd
process jumps
to number of
processor
cores
available
(800% or
1200%,
depending on
the pair of
nodes
involved) and
the load
average on the
machine goes
very high
(~20). The
volume's heal
statistics
output shows
that it is
crawling one
of the bricks
and trying to
heal, but this
crawl hangs
and never
seems to
finish.<br>
</div>
</div>
</blockquote>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
The number of
files in the
xattrop
directory
varies over
time, so I ran
a wc -l as you
requested
periodically
for some time
and then
started
including a
datestamped
list of the
files that
were in the xattrop directory on
each brick to
see which were
persistent.
All bricks had
files in the
xattrop
folder, so all
results are
attached.<br>
</div>
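Something along these lines reproduces that kind of capture (a
sketch only; the interval, brick glob, and output file are
assumptions):<br>
<pre>
#!/bin/bash
# Sketch: every few minutes, record a datestamped count and listing of the
# xattrop indices for each brick on this server.
OUT=/tmp/xattrop-watch.log                    # hypothetical output file
while true; do
    date >> "$OUT"
    for d in /data/brick*/homegfs/.glusterfs/indices/xattrop; do
        echo "== $d ($(ls "$d" | wc -l) entries) ==" >> "$OUT"
        ls "$d" >> "$OUT"
    done
    sleep 300
done
</pre>
<br>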
</blockquote>
</span> Thanks, this info is helpful. I don't see a lot of files.
Could you give the output of "gluster
volume heal <volname>
info"? Is there any directory in there which is
LARGE?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Please
let me know if
there is
anything else
I can provide.<br>
</div>
<div><br>
</div>
<div>Patrick<br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 12:01
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
bgcolor="#FFFFFF"
text="#000000">
Hey,<br>
Which process is consuming so much CPU? I
went through
the logs you
gave me. I see
that the
following
files are in
gfid mismatch
state:<br>
<br>
<066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,<br>
<1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,<br>
<ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>,<br>
<br>
Could you give me the output of "ls
<brick-path>/indices/xattrop | wc -l"
on all the bricks which are acting this
way? This will
tell us the
number of
pending
self-heals on
the system.<br>
<br>
Pranith
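On each server, something like the loop below would collect that
in one go (a sketch; brick paths follow the volume layout quoted
further down, and the indices live under
<brick-path>/.glusterfs/indices/xattrop):<br>
<pre>
# Sketch: count pending-heal index entries per brick on this server.
for b in /data/brick*/homegfs; do
    printf '%s: ' "$b"
    ls "$b/.glusterfs/indices/xattrop" | wc -l
done
</pre>
<br>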
<div>
<div><br>
<br>
<div>On
01/20/2016
09:26 PM,
David Robinson
wrote:<br>
</div>
</div>
</div>
<blockquote
type="cite">
<div>
<div>
<div>resending
with parsed
logs... </div>
<div> </div>
<div>
<blockquote
cite="http://em5ee26b0e-002a-4230-bdec-3020b98cff3c@dfrobins-vaio"
type="cite">
<div> </div>
<div> </div>
<div>
<blockquote
cite="http://eme3b2cb80-8be2-4fa5-9d08-4710955e237c@dfrobins-vaio"
type="cite">
<div>I am
having issues
with 3.6.6
where the load
will spike up
to 800% for
one of the
glusterfsd
processes and
the users can
no longer
access the
system. If I
reboot the
node, the heal
will finish
normally after
a few minutes
and the system
will be
responsive,
but a few
hours later
the issue will
start again.
It looks like
it is hanging
in a heal and
spinning up
the load on
one of the
bricks. The
heal gets
stuck and says
it is crawling
and never
returns.
After a few
minutes of the
heal saying it
is crawling,
the load
spikes up and
the mounts
become
unresponsive.</div>
<div> </div>
<div>Any
suggestions on
how to fix
this? It has
us stopped
cold as the
user can no
longer access
the systems
when the load
spikes... Logs
attached.</div>
<div> </div>
<div>System
setup info is:
</div>
<div> </div>
<div>[root@gfs01a
~]# gluster
volume info
homegfs<br>
<br>
Volume Name:
homegfs<br>
Type:
Distributed-Replicate<br>
Volume ID:
1e32672a-f1b7-4b58-ba94-58c085e59071<br>
Status:
Started<br>
Number of
Bricks: 4 x 2
= 8<br>
Transport-type:
tcp<br>
Bricks:<br>
Brick1:
gfsib01a.corvidtec.com:/data/brick01a/homegfs<br>
Brick2:
gfsib01b.corvidtec.com:/data/brick01b/homegfs<br>
Brick3:
gfsib01a.corvidtec.com:/data/brick02a/homegfs<br>
Brick4:
gfsib01b.corvidtec.com:/data/brick02b/homegfs<br>
Brick5:
gfsib02a.corvidtec.com:/data/brick01a/homegfs<br>
Brick6:
gfsib02b.corvidtec.com:/data/brick01b/homegfs<br>
Brick7:
gfsib02a.corvidtec.com:/data/brick02a/homegfs<br>
Brick8:
gfsib02b.corvidtec.com:/data/brick02b/homegfs<br>
Options
Reconfigured:<br>
performance.io-thread-count:
32<br>
performance.cache-size:
128MB<br>
performance.write-behind-window-size:
128MB<br>
server.allow-insecure:
on<br>
network.ping-timeout:
42<br>
storage.owner-gid:
100<br>
geo-replication.indexing:
off<br>
geo-replication.ignore-pid-check:
on<br>
changelog.changelog:
off<br>
changelog.fsync-interval:
3<br>
changelog.rollover-time:
15<br>
server.manage-gids:
on<br>
diagnostics.client-log-level:
WARNING</div>
<div> </div>
<div>[root@gfs01a
~]# rpm -qa |
grep gluster<br>
gluster-nagios-common-0.1.1-0.el6.noarch<br>
glusterfs-fuse-3.6.6-1.el6.x86_64<br>
glusterfs-debuginfo-3.6.6-1.el6.x86_64<br>
glusterfs-libs-3.6.6-1.el6.x86_64<br>
glusterfs-geo-replication-3.6.6-1.el6.x86_64<br>
glusterfs-api-3.6.6-1.el6.x86_64<br>
glusterfs-devel-3.6.6-1.el6.x86_64<br>
glusterfs-api-devel-3.6.6-1.el6.x86_64<br>
glusterfs-3.6.6-1.el6.x86_64<br>
glusterfs-cli-3.6.6-1.el6.x86_64<br>
glusterfs-rdma-3.6.6-1.el6.x86_64<br>
samba-vfs-glusterfs-4.1.11-2.el6.x86_64<br>
glusterfs-server-3.6.6-1.el6.x86_64<br>
glusterfs-extra-xlators-3.6.6-1.el6.x86_64<br>
</div>
<div> </div>
<div>
<div
style="FONT-SIZE:12pt;FONT-FAMILY:Times
New Roman"><span><span>
<div> </div>
</span></span></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
Gluster-devel mailing list
<a moz-do-not-send="true" href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>
<a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
</blockquote>
<br>
</div>
<br>
_______________________________________________<br>
Gluster-users
mailing list<br>
<a
moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a
moz-do-not-send="true"
href="http://www.gluster.org/mailman/listinfo/gluster-users"
rel="noreferrer"
target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>