<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 01/17/2015 05:28 AM, Kyle Harris
wrote:<br>
</div>
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>Hello,</div>
<div><br>
</div>
<div>I created a post a few days ago named "Turning Off Self
Heal Options Don't Appear Work?" which can be found at the
following link: <a moz-do-not-send="true"
href="http://www.gluster.org/pipermail/gluster-users/2015-January/020114.html">http://www.gluster.org/pipermail/gluster-users/2015-January/020114.html</a></div>
<div><br>
</div>
<div>I never got a response, so I decided to set up a test in a
lab environment. I am able to reproduce the same behavior
there, so I'm hoping someone can help me.</div>
<div><br>
</div>
<div>I have discovered over time that if a single node in a
3-node replicated cluster with many small files is off for any
length of time, it does a great deal of self-healing when it
comes back on-line. This can cause the glusterfs and
glusterfsd processes to spike on the machines to a degree that
makes them unusable. I only have one volume, with a client
mount on each server, and it hosts many websites running
PHP. All is fine until the healing process goes into
overdrive.</div>
<div><br>
</div>
<div>So, I attempted to turn off self-healing by setting the
following three options:</div>
<div>
<div>gluster volume set gv0 cluster.data-self-heal off</div>
<div>gluster volume set gv0 cluster.entry-self-heal off</div>
<div>gluster volume set gv0 cluster.metadata-self-heal off</div>
</div>
</div>
</blockquote>
Hi Kyle,<br>
Krutika wanted to send you a response today, but we spent
the whole day debugging a bug. Let me answer some of the things we
already discussed, on her behalf.<br>
Krutika (CCed) has found one issue where self-heal was still
triggered even when some of the options were turned off. If all
three options are turned off, though, I think no heals would happen
from the mount process. But glustershd can still do heals. To
disable that healing as well, we need to turn off the self-heal
daemon using 'gluster volume set <volname> cluster.self-heal-daemon
off'.<br>
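For example, on the gv0 volume from your earlier mail, that would be
(note this stops all automatic healing until it is turned back on):<br>
<pre># disable the self-heal daemon for the volume
gluster volume set gv0 cluster.self-heal-daemon off</pre>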
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Note that I would rather not set gv0
cluster.self-heal-daemon off, as then I can't see what needs
healing so that I can heal it at a later time. Those settings
appear to have no effect at all.<br>
</div>
</div>
</blockquote>
Ah! 3.6.2 will be able to give the output of 'gluster volume heal
<volname> info' even when self-heal-daemon is turned off.<br>
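So on 3.6.2 you could keep the daemon turned off and still check what
needs healing, for example:<br>
<pre># list the files/gfids pending heal, per brick
gluster volume heal gv0 info</pre>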
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Here is how I reproduced this in my lab:</div>
<div><br>
</div>
<div>Output from "gluster volume info gv0":<br>
</div>
<div>
<div>Volume Name: gv0<br>
</div>
<div>Type: Replicate</div>
<div>Volume ID: a55f8619-0789-4a1c-9cda-a903bc908fd1</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 3 = 3</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: 192.168.1.116:/export/brick1</div>
<div>Brick2: 192.168.1.140:/export/brick1</div>
<div>Brick3: 192.168.1.123:/export/brick1</div>
<div>Options Reconfigured:</div>
<div>cluster.metadata-self-heal: off</div>
<div>cluster.entry-self-heal: off</div>
<div>cluster.data-self-heal: off</div>
</div>
</div>
</blockquote>
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>This was done using the latest version of Gluster as of
this writing, v3.6.1, installed on CentOS 6.6 using the RPMs
available from the Gluster web site.</div>
<div><br>
</div>
<div>Here is how I tested (a rough command sketch follows the list):</div>
<div>- With all 3 nodes up, I put 4 simple text files on the
cluster</div>
<div>- I then turned one node off</div>
<div>- Next I made a change to 2 of the text files</div>
<div>- Then I brought the previously turned off node back up</div>
<div><br>
</div>
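<div>In shell terms, the steps were roughly as follows (the /mnt/gv0
client mount point here is illustrative):</div>
<pre># with all 3 nodes up, create 4 simple text files on the client mount
for i in 1 2 3 4; do echo "test $i" > /mnt/gv0/file$i.txt; done
# ...power off one node, then change 2 of the files...
echo "changed" >> /mnt/gv0/file1.txt
echo "changed" >> /mnt/gv0/file2.txt
# ...then bring the powered-off node back up and watch the logs</pre>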
<div>Upon doing so, I see far more than 2 instances of the
following messages in glustershd.log:</div>
<div><br>
</div>
<div>[2015-01-15 23:19:30.471384] I
[afr-self-heal-entry.c:545:afr_selfheal_entry_do]
0-gv0-replicate-0: performing entry selfheal on
00000000-0000-0000-0000-000000000001</div>
<div>[2015-01-15 23:19:30.494714] I
[afr-self-heal-common.c:476:afr_log_selfheal]
0-gv0-replicate-0: Completed entry selfheal on
00000000-0000-0000-0000-000000000001. source=0 sinks=</div>
<div><br>
</div>
<div>Questions:</div>
<div>- So is this a bug?<br>
</div>
</div>
</blockquote>
The log seems to suggest that it didn't find any 'sinks' to heal to,
so it wouldn't have done any file creations/deletions. Maybe we
should fix the log message, or see if there is more to that bug.<br>
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>- Why am I seeing "entry selfheal" messages when this
feature is supposed to be turned off?</div>
</div>
</blockquote>
Because glustershd can still do self-heals, as we didn't disable it.<br>
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>- Also, why am I seeing far more than 2 self-heal messages
when I only changed 2 files while the single node was down?</div>
</div>
</blockquote>
At the moment, I believe they are just log messages and not actual
heals. But we will need to look further to find out if there is more
to it.<br>
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>- Finally, how do I really turn off these self-heals
without completely turning off cluster.self-heal-daemon, for
the reasons mentioned above?</div>
</div>
</blockquote>
There are two workarounds for this until 3.6.2 is released:<br>
1) Turn self-heal-daemon off. When we want to see the files that
need healing, we can turn it on briefly, view the information, and
turn it off again immediately. This broken functionality made it
into 3.6.1 because I couldn't re-implement the feature for
afrv2 in time for the release. Sorry about that!<br>
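A sketch of that sequence on your gv0 volume:<br>
<pre>gluster volume set gv0 cluster.self-heal-daemon on   # heals may briefly kick in here
gluster volume heal gv0 info                         # note the files that need healing
gluster volume set gv0 cluster.self-heal-daemon off  # stop heals again</pre>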
<br>
2) The other way is to inspect the gfids of the files that need
healing directly, by looking at the directory
<brick-path>/.glusterfs/indices/xattrop. This is where the
self-heal daemon looks to find the files that need healing.<br>
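For example, on one of the bricks from your volume info output (GFID
below is a placeholder for one of the entries you find there):<br>
<pre># entries in this directory are named by the gfids of files needing heal
ls /export/brick1/.glusterfs/indices/xattrop
# for a regular file, the brick also keeps a hard link at
# .glusterfs/<aa>/<bb>/<full-gfid>, where aa/bb are the first 4 hex
# characters of the gfid, so the real path can be recovered with:
GFID=<one-of-the-entries-above>
find /export/brick1 -samefile "/export/brick1/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"</pre>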
<br>
You were saying you know a way to make machines unusable by
triggering self-heals. It would be very good if we could replicate
that test in our lab. Do you have any pointers for us to do the
same?<br>
<br>
Pranith<br>
<blockquote
cite="mid:CAO5ZC7GQPX75g0ttCcrKu5z8YNfG3Rid904Ei6yWyW11NVJXJQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Thank you for any insight you may be able to provide on
this.</div>
<div><br>
</div>
-- <br>
<div class="gmail_signature">
<div dir="ltr">Kyle </div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</body>
</html>