<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <br>

    <div class="moz-cite-prefix">On 08/07/2015 12:11 PM, Prasun Gera

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">No, no noticeable difference. Still very high,

        possibly higher than before. </div>

    </blockquote>

    <br>

    I suspected the CPU usage might come from the diff self-heal
    algorithm, which computes checksums (a CPU-intensive task), but
    that doesn't seem to be the case. Could you run a volume profile to
    see which FOPs (file operations) are being issued on the bricks, and
    share the result?<br>

    1. gluster volume profile &lt;volname&gt; start<br>
    2. gluster volume profile &lt;volname&gt; info<br>
    3. Wait 10-15 seconds.<br>
    4. gluster volume profile &lt;volname&gt; info<br>
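    <br>
    If it helps, the steps above can be wrapped in a small shell
    function so that the two `info` snapshots are always taken a fixed
    interval apart. This is only a sketch: `profile_volume` is a
    hypothetical helper name, not part of GlusterFS, and it assumes the
    `gluster` CLI is on the PATH and is run with root privileges on one
    of the nodes.<br>
<pre>
```shell
# Hypothetical helper (not part of GlusterFS): runs the profiling steps
# against one volume. Usage: profile_volume VOLNAME [WAIT_SECS]
profile_volume() {
    vol="$1"
    wait_secs="${2:-15}"    # the steps suggest waiting 10-15 seconds
    if [ -z "$vol" ]; then
        printf 'usage: profile_volume VOLNAME [WAIT_SECS]\n'
        return 1
    fi
    # Step 1: enable profiling on the volume; stop if this fails
    gluster volume profile "$vol" start || return 1
    # Step 2: first snapshot of the per-brick FOP statistics
    gluster volume profile "$vol" info
    # Step 3: let the heal traffic run for a while
    sleep "$wait_secs"
    # Step 4: second snapshot, to compare against the first
    gluster volume profile "$vol" info
}
```
</pre>
    Example usage, capturing the output to a file that can be attached
    to a reply: `profile_volume myvol 15 > profile.txt` (where
    `myvol` stands in for the real volume name).<br>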

    <br>

    <br>

    <br>

    <blockquote

cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">The system has slowed to a crawl. It's difficult
        even to ssh in or run any commands in the terminal. Do you make
        anything of the logs? The brick log is just a giant alternating
        stream of those two lines I mentioned earlier. <br>

      </div>

    </blockquote>

    <br>

    <br>

    <blockquote

cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"

      type="cite">

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Thu, Aug 6, 2015 at 10:10 PM,

          Ravishankar N <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

              class=""><br>

              <br>

              On 08/07/2015 01:33 AM, Prasun Gera wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                I replaced the brick on one node of my 3x2
                distributed-replicated volume (RHS 3). The heal process,
                which should essentially be a copy from the working
                replica to the newly added brick, is taking exceptionally
                long: it has moved ~100 GB in a day over a 1 Gigabit
                network. CPU usage on both nodes of the replica has been
                quite high. <br>

              </blockquote>

              <br>

            </span>

            Does setting `cluster.data-self-heal-algorithm` to `full`
            make a difference in the CPU usage?

            <div class="HOEnZb">

              <div class="h5"><br>

                <br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  I also think Nagios is making it worse. The heal is
                  slow enough as it is, and Nagios keeps triggering heal
                  info, which I don't think ever completes. My logs are
                  also filling up. Here are some of their contents, which
                  I got by running tail on them:<br>

                </blockquote>

                <br>

              </div>

            </div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </body>

</html>