<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 08/07/2015 12:11 PM, Prasun Gera
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">No, no noticeable difference. Still very high,
        possibly higher than before. </div>
    </blockquote>
    <br>
    I was guessing that the CPU usage was caused by the diff self-heal
    algorithm, which computes checksums (a CPU-intensive task).
    That doesn't seem to be the case. Could you run a volume profile to
    see which FOPs are happening on the bricks, and share the result?<br>
    1. gluster volume profile &lt;volname&gt; start<br>
    2. gluster volume profile &lt;volname&gt; info<br>
    3. wait 10-15 seconds<br>
    4. gluster volume profile &lt;volname&gt; info<br>
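    <br>
    If it is easier to run the steps above in one go, something like the
    following could work. This is only a rough sketch: the function name
    profile_volume and the GLUSTER_BIN and WAIT_SECS variables are made
    up for illustration, and only the start/info subcommands listed
    above are used.<br>

```shell
#!/bin/sh
# Sketch: take two profile snapshots of a volume a few seconds apart.
# GLUSTER_BIN and WAIT_SECS are hypothetical knobs, not gluster settings.
GLUSTER_BIN="${GLUSTER_BIN:-gluster}"
WAIT_SECS="${WAIT_SECS:-15}"

profile_volume() {
    vol="$1"
    if [ -z "$vol" ]; then
        echo "usage: profile_volume VOLNAME" >&2
        return 1
    fi
    # Step 1: enable profiling. The exit status is ignored on the
    # assumption that a failure here means profiling is already on.
    "$GLUSTER_BIN" volume profile "$vol" start || true
    # Step 2: first snapshot.
    "$GLUSTER_BIN" volume profile "$vol" info || return 1
    # Step 3: let some FOPs accumulate.
    sleep "$WAIT_SECS"
    # Step 4: second snapshot, covering the interval since step 2.
    "$GLUSTER_BIN" volume profile "$vol" info || return 1
}
```

    After sourcing the file, calling profile_volume with the volume
    name and redirecting the output to a file gives something that can
    be attached to a reply.<br>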
    <br>
    <br>
    <br>
    <blockquote
cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">The system has come down to a crawl. It's difficult
        to even ssh or run any commands on the terminal. Do you make
        anything of the logs? The brick log is just a giant alternating
        stream of those two lines I mentioned earlier. <br>
      </div>
    </blockquote>
    <br>
    <br>
    <blockquote
cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"
      type="cite">
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Thu, Aug 6, 2015 at 10:10 PM,
          Ravishankar N <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
              class=""><br>
              <br>
              On 08/07/2015 01:33 AM, Prasun Gera wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                I replaced the brick in a node in my 3x2 dist+repl
                volume (RHS 3). I'm seeing that the heal process, which
                should essentially be a dump from the working replica to
                the newly added one, is taking exceptionally long. It has
                moved ~100 G over a day on a 1 Gigabit network. The CPU
                usage on both the nodes of the replica has been pretty
                high. <br>
              </blockquote>
              <br>
            </span>
            Does setting `cluster.data-self-heal-algorithm` to full make
            a difference in the cpu usage?
            <div class="HOEnZb">
              <div class="h5"><br>
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  I also think that nagios is making it worse. The heal
                  is slow enough as it is, and nagios keeps triggering
                  heal info, which I think never completes. I also see
                  my logs filling up. These are some of the log contents
                  which I got by running tail on them:<br>
                </blockquote>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>