<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 08/07/2015 12:11 PM, Prasun Gera
wrote:<br>
</div>
<blockquote
cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"
type="cite">
<div dir="ltr">No, no noticeable difference. Still very high,
possibly higher than before. </div>
</blockquote>
<br>
    I was guessing that the CPU usage could be because of the diff
    algorithm, which computes checksums (a CPU-intensive task).
    That doesn't seem to be the case. Could you do a volume profile to
    see the FOPs (file operations) that are happening on the bricks and share the result?<br>
    1. gluster volume profile &lt;volname&gt; start<br>
    2. gluster volume profile &lt;volname&gt; info<br>
    3. Wait 10-15 seconds.<br>
    4. gluster volume profile &lt;volname&gt; info<br>
<br>
<br>
<br>
<blockquote
cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"
type="cite">
<div dir="ltr">The system has come down to a crawl. It's difficult
to even ssh or run any commands on the terminal. Do you make
        anything of the logs? The brick log is just a giant alternating
stream of those two lines I mentioned earlier. <br>
</div>
</blockquote>
<br>
<br>
<blockquote
cite="mid:CAFLz+Bk7Ot6w7-BRm85o7rrczyUDhpnCjYpg76ipaNNhABYPGQ@mail.gmail.com"
type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Aug 6, 2015 at 10:10 PM,
Ravishankar N <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class=""><br>
<br>
On 08/07/2015 01:33 AM, Prasun Gera wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
                I replaced the brick in a node in my 3x2 distributed-replicate
                volume (RHS 3). I'm seeing that the heal process, which
                should essentially be a dump from the working replica to
                the newly added one, is taking exceptionally long. It has
                moved ~100 GB over a day on a 1 Gigabit network. The CPU
                usage on both nodes of the replica has been pretty
                high. <br>
</blockquote>
<br>
</span>
          Does setting `cluster.data-self-heal-algorithm` to `full` make
          a difference in the CPU usage?
<div class="HOEnZb">
<div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
                  I also think that Nagios is making it worse. The heal
                  is slow enough as it is, and Nagios keeps triggering
                  heal info, which I think never completes. I also see
                  my logs filling up. These are some of the log contents,
                  which I got by running tail on them:<br>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>