<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 01/22/2016 07:25 AM, Glomski,
Patrick wrote:<br>
</div>
<blockquote
cite="mid:CALkMjdDRCsoOgmdefzBkdDez1Uqt9Z4_8qiEfCXLW-oasNz5gQ@mail.gmail.com"
type="cite">
<div dir="ltr">Unfortunately, all samba mounts to the gluster
volume through the gfapi vfs plugin have been disabled for the
last 6 hours or so and frequency of %cpu spikes is increased. We
had switched to sharing a fuse mount through samba, but I just
disabled that as well. There are no samba shares of this volume
now. The spikes now happen every thirty minutes or so. We've
resorted to just rebooting the machine with high load for the
present.<br>
</div>
</blockquote>
<br>
Next time this CPU spike happens, could you collect samples of
gstack <pid-of-brick> once per second for 10-20 seconds? That
helps in finding the most heavily hit function calls.<br>
<br>
Something like "for i in {1..20}; do gstack <pid-of-brick>
> sample-$i.txt; sleep 1; done"<br>
<br>
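For instance, a small wrapper along these lines would capture one
sample per second (only a sketch; the output directory is an
assumption, and the brick PID can be read from 'gluster volume
status homegfs'):<br>
<pre>
#!/bin/bash
# Sketch: take one gstack sample per second for 20 seconds.
# Usage: ./sample-brick.sh <pid-of-brick>
#        (the brick PID is shown by 'gluster volume status homegfs')
PID=$1
OUT=/tmp/gstack-samples           # hypothetical output directory
mkdir -p "$OUT"
for i in {1..20}; do
    gstack "$PID" > "$OUT/sample-$i.txt"
    sleep 1
done
</pre>
<br>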
Pranith<br>
<blockquote
cite="mid:CALkMjdDRCsoOgmdefzBkdDez1Uqt9Z4_8qiEfCXLW-oasNz5gQ@mail.gmail.com"
type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 8:49 PM,
Pranith Kumar Karampuri <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class=""> <br>
<br>
<div>On 01/22/2016 07:13 AM, Glomski, Patrick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">We use the samba glusterfs virtual
filesystem (the current version provided on <a
moz-do-not-send="true"
href="http://download.gluster.org" target="_blank">download.gluster.org</a>),
but no Windows clients connect directly.<br>
</div>
</blockquote>
<br>
</span> Hmm... Is there a way to disable this and check whether
the CPU% still increases? A getxattr of
"glusterfs.get_real_filename <filename>" scans the entire
directory, running strcasecmp(<filename>, <scanned-filename>)
against every entry; if anything matches, it returns the
<scanned-filename>. The problem is that the scan is costly, so I
wonder if this is the reason for the CPU
spikes.<span class="HOEnZb"><font color="#888888"><br>
<br>
Pranith</font></span>
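To give a feel for why that lookup is expensive, here is a rough
shell equivalent of the per-entry comparison (purely an
illustration, not the actual brick code; the directory and
filename are made up):<br>
<pre>
# Illustration only: emulate a case-insensitive lookup over one directory,
# comparing every entry against the requested name the way
# strcasecmp(<filename>, <scanned-filename>) is applied on the brick.
target="ProjectReport.DOCX"                  # hypothetical requested name
dir=/data/brick01a/homegfs/somedir           # hypothetical directory
shopt -s nullglob
for entry in "$dir"/*; do
    name=${entry##*/}
    if [ "${name,,}" = "${target,,}" ]; then
        echo "real filename: $name"
        break
    fi
done
# Cost grows linearly with the number of entries in the directory.
</pre>
<br>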
<div>
<div class="h5"><br>
<blockquote type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at
8:37 PM, Pranith Kumar Karampuri <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Do you
have any Windows clients? I see a lot of
getxattr calls for
"glusterfs.get_real_filename" which lead to
full readdirs of the directories on the
brick.<span><font color="#888888"><br>
<br>
Pranith</font></span><span><br>
<br>
<div>On 01/22/2016 12:51 AM, Glomski,
Patrick wrote:<br>
</div>
</span>
<div>
<div>
<blockquote type="cite">
<div dir="ltr">
<div>Pranith, could this kind of behavior be self-inflicted by us
deleting files directly from the bricks? We have done that in the
past to clean up issues where gluster wouldn't allow us to delete
from the mount.<br>
<br>
If so, is it feasible to clean them up by running a search on the
.glusterfs directories directly and removing files with a link
count of 1 that are non-zero in size (or directly checking the
xattrs to be sure that a file is not a DHT link-to file)? <br>
<br>
find
/data/brick01a/homegfs/.glusterfs
-type f -not -empty -links -2
-exec rm -f "{}" \;<br>
<br>
</div>
Is there anything I'm inherently
missing with that approach that will
further corrupt the system?<br>
<div><br>
</div>
</div>
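For example, a dry-run version of that idea, just a sketch (it
only lists candidates and shows whether each carries the DHT
link-to xattr; the brick path would need to be adjusted per
brick, and it requires getfattr from the attr package):<br>
<pre>
#!/bin/bash
# Sketch: list .glusterfs entries with a single hard link and non-zero size,
# flagging any that carry the trusted.glusterfs.dht.linkto xattr.
# Nothing is deleted here.
BRICK=/data/brick01a/homegfs            # adjust per brick
find "$BRICK/.glusterfs" -type f -not -empty -links -2 -print0 |
while IFS= read -r -d '' f; do
    if getfattr -n trusted.glusterfs.dht.linkto "$f" > /dev/null 2>&1; then
        echo "dht link-to file (skip): $f"
    else
        echo "candidate for removal : $f"
    fi
done
</pre>
<br>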
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan
21, 2016 at 1:02 PM, Glomski,
Patrick <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:patrick.glomski@corvidtec.com"
target="_blank">patrick.glomski@corvidtec.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>Load spiked again:
~1200%cpu on gfs02a for
glusterfsd. Crawl has been
running on one of the
bricks on gfs02b for 25
min or so and users cannot
access the volume.<br>
<br>
I re-listed the xattrop
directories as well as a
'top' entry and heal
statistics. Then I
restarted the gluster
services on gfs02a. <br>
<br>
=================== top
===================<br>
PID USER PR NI
VIRT RES SHR S %CPU
%MEM TIME+
COMMAND
<br>
8969 root 20 0
2815m 204m 3588 S 1181.0
0.6 591:06.93
glusterfsd <br>
<br>
===================
xattrop
===================<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-41f19453-91e4-437c-afa9-3b25614de210
xattrop-9b815879-2f4d-402b-867c-a6d65087788c<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-70131855-3cfb-49af-abce-9d23f57fb393
xattrop-dfb77848-a39d-4417-a725-9beca75d78c6<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
e6e47ed9-309b-42a7-8c44-28c29b9a20f8
xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125<br>
xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413<br>
<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
8034bc06-92cd-4fa5-8aaf-09039e79d2c8
c9ce22ed-6d8b-471b-a111-b39e57f0b512<br>
94fa1d60-45ad-4341-b69c-315936b51e8d
xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d<br>
<br>
<br>
=================== heal
stats ===================<br>
<br>
homegfs [b0-gfsib01a] :
Starting time of
crawl : Thu Jan 21
12:36:45 2016<br>
homegfs [b0-gfsib01a] :
Ending time of
crawl : Thu Jan 21
12:36:45 2016<br>
homegfs [b0-gfsib01a] :
Type of crawl: INDEX<br>
homegfs [b0-gfsib01a] :
No. of entries
healed : 0<br>
homegfs [b0-gfsib01a] :
No. of entries in
split-brain: 0<br>
homegfs [b0-gfsib01a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b1-gfsib01b] :
Starting time of
crawl : Thu Jan 21
12:36:19 2016<br>
homegfs [b1-gfsib01b] :
Ending time of
crawl : Thu Jan 21
12:36:19 2016<br>
homegfs [b1-gfsib01b] :
Type of crawl: INDEX<br>
homegfs [b1-gfsib01b] :
No. of entries
healed : 0<br>
homegfs [b1-gfsib01b] :
No. of entries in
split-brain: 0<br>
homegfs [b1-gfsib01b] :
No. of heal failed
entries : 1<br>
<br>
homegfs [b2-gfsib01a] :
Starting time of
crawl : Thu Jan 21
12:36:48 2016<br>
homegfs [b2-gfsib01a] :
Ending time of
crawl : Thu Jan 21
12:36:48 2016<br>
homegfs [b2-gfsib01a] :
Type of crawl: INDEX<br>
homegfs [b2-gfsib01a] :
No. of entries
healed : 0<br>
homegfs [b2-gfsib01a] :
No. of entries in
split-brain: 0<br>
homegfs [b2-gfsib01a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b3-gfsib01b] :
Starting time of
crawl : Thu Jan 21
12:36:47 2016<br>
homegfs [b3-gfsib01b] :
Ending time of
crawl : Thu Jan 21
12:36:47 2016<br>
homegfs [b3-gfsib01b] :
Type of crawl: INDEX<br>
homegfs [b3-gfsib01b] :
No. of entries
healed : 0<br>
homegfs [b3-gfsib01b] :
No. of entries in
split-brain: 0<br>
homegfs [b3-gfsib01b] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b4-gfsib02a] :
Starting time of
crawl : Thu Jan 21
12:36:06 2016<br>
homegfs [b4-gfsib02a] :
Ending time of
crawl : Thu Jan 21
12:36:06 2016<br>
homegfs [b4-gfsib02a] :
Type of crawl: INDEX<br>
homegfs [b4-gfsib02a] :
No. of entries
healed : 0<br>
homegfs [b4-gfsib02a] :
No. of entries in
split-brain: 0<br>
homegfs [b4-gfsib02a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b5-gfsib02b] :
Starting time of
crawl : Thu Jan 21
12:13:40 2016<br>
homegfs [b5-gfsib02b]
:
*** Crawl is in progress
***<br>
homegfs [b5-gfsib02b] :
Type of crawl: INDEX<br>
homegfs [b5-gfsib02b] :
No. of entries
healed : 0<br>
homegfs [b5-gfsib02b] :
No. of entries in
split-brain: 0<br>
homegfs [b5-gfsib02b] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b6-gfsib02a] :
Starting time of
crawl : Thu Jan 21
12:36:58 2016<br>
homegfs [b6-gfsib02a] :
Ending time of
crawl : Thu Jan 21
12:36:58 2016<br>
homegfs [b6-gfsib02a] :
Type of crawl: INDEX<br>
homegfs [b6-gfsib02a] :
No. of entries
healed : 0<br>
homegfs [b6-gfsib02a] :
No. of entries in
split-brain: 0<br>
homegfs [b6-gfsib02a] :
No. of heal failed
entries : 0<br>
<br>
homegfs [b7-gfsib02b] :
Starting time of
crawl : Thu Jan 21
12:36:50 2016<br>
homegfs [b7-gfsib02b] :
Ending time of
crawl : Thu Jan 21
12:36:50 2016<br>
homegfs [b7-gfsib02b] :
Type of crawl: INDEX<br>
homegfs [b7-gfsib02b] :
No. of entries
healed : 0<br>
homegfs [b7-gfsib02b] :
No. of entries in
split-brain: 0<br>
homegfs [b7-gfsib02b] :
No. of heal failed
entries : 0<br>
<br>
<br>
========================================================================================<br>
</div>
I waited a few minutes for
the heals to finish and ran
the heal statistics and info
again. One file is in
split-brain. Aside from the
split-brain, the load on all
systems is down now and they
are behaving normally.
glustershd.log is attached.
What is going on??? <br>
<br>
Thu Jan 21 12:53:50 EST 2016<br>
<br>
=================== homegfs
===================<br>
<br>
homegfs [b0-gfsib01a] :
Starting time of crawl
: Thu Jan 21 12:53:02 2016<br>
homegfs [b0-gfsib01a] :
Ending time of crawl
: Thu Jan 21 12:53:02 2016<br>
homegfs [b0-gfsib01a] : Type
of crawl: INDEX<br>
homegfs [b0-gfsib01a] : No.
of entries healed : 0<br>
homegfs [b0-gfsib01a] : No.
of entries in split-brain: 0<br>
homegfs [b0-gfsib01a] : No.
of heal failed entries : 0<br>
<br>
homegfs [b1-gfsib01b] :
Starting time of crawl
: Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] :
Ending time of crawl
: Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] : Type
of crawl: INDEX<br>
homegfs [b1-gfsib01b] : No.
of entries healed : 0<br>
homegfs [b1-gfsib01b] : No.
of entries in split-brain: 0<br>
homegfs [b1-gfsib01b] : No.
of heal failed entries : 1<br>
<br>
homegfs [b2-gfsib01a] :
Starting time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] :
Ending time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] : Type
of crawl: INDEX<br>
homegfs [b2-gfsib01a] : No.
of entries healed : 0<br>
homegfs [b2-gfsib01a] : No.
of entries in split-brain: 0<br>
homegfs [b2-gfsib01a] : No.
of heal failed entries : 0<br>
<br>
homegfs [b3-gfsib01b] :
Starting time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] :
Ending time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] : Type
of crawl: INDEX<br>
homegfs [b3-gfsib01b] : No.
of entries healed : 0<br>
homegfs [b3-gfsib01b] : No.
of entries in split-brain: 0<br>
homegfs [b3-gfsib01b] : No.
of heal failed entries : 0<br>
<br>
homegfs [b4-gfsib02a] :
Starting time of crawl
: Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] :
Ending time of crawl
: Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] : Type
of crawl: INDEX<br>
homegfs [b4-gfsib02a] : No.
of entries healed : 0<br>
homegfs [b4-gfsib02a] : No.
of entries in split-brain: 0<br>
homegfs [b4-gfsib02a] : No.
of heal failed entries : 1<br>
<br>
homegfs [b5-gfsib02b] :
Starting time of crawl
: Thu Jan 21 12:53:14 2016<br>
homegfs [b5-gfsib02b] :
Ending time of crawl
: Thu Jan 21 12:53:15 2016<br>
homegfs [b5-gfsib02b] : Type
of crawl: INDEX<br>
homegfs [b5-gfsib02b] : No.
of entries healed : 0<br>
homegfs [b5-gfsib02b] : No.
of entries in split-brain: 0<br>
homegfs [b5-gfsib02b] : No.
of heal failed entries : 3<br>
<br>
homegfs [b6-gfsib02a] :
Starting time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] :
Ending time of crawl
: Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] : Type
of crawl: INDEX<br>
homegfs [b6-gfsib02a] : No.
of entries healed : 0<br>
homegfs [b6-gfsib02a] : No.
of entries in split-brain: 0<br>
homegfs [b6-gfsib02a] : No.
of heal failed entries : 0<br>
<br>
homegfs [b7-gfsib02b] :
Starting time of crawl
: Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] :
Ending time of crawl
: Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] : Type
of crawl: INDEX<br>
homegfs [b7-gfsib02b] : No.
of entries healed : 0<br>
homegfs [b7-gfsib02b] : No.
of entries in split-brain: 0<br>
homegfs [b7-gfsib02b] : No.
of heal failed entries : 0<br>
<br>
*** gluster bug in 'gluster
volume heal homegfs
statistics' ***<br>
*** Use 'gluster volume heal
homegfs info' until bug is
fixed ***<span><br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
</span>/users/bangell/.gconfd
- Is in split-brain<br>
<br>
Number of entries: 1<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
/users/bangell/.gconfd - Is
in split-brain<br>
<br>
/users/bangell/.gconfd/saved_state
<br>
Number of entries: 2<span><br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
</span></div>
<div><br>
<br>
</div>
</div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On
Thu, Jan 21, 2016 at
11:10 AM, Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000"><span>
<br>
<br>
<div>On 01/21/2016
09:26 PM,
Glomski, Patrick
wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>I should
mention that
the problem is
not currently
occurring and
there are no
heals (output
appended). By
restarting the
gluster
services, we
can stop the
crawl, which
lowers the
load for a
while.
Subsequent
crawls seem to
finish
properly. For what it's worth, files/folders that show up in the
'heal info' output during a hung crawl don't seem to be anything
out of the ordinary. <br>
<br>
Over the past
four days, the
typical time
before the
problem recurs
after
suppressing it
in this manner
is an hour.
The last time it happened was last night, when we reached out to
you, and the load has been low since (a relief).
David believes
that
recursively
listing the
files (ls -alR
or similar)
from a client
mount can
force the
issue to
happen, but
obviously I'd
rather not
unless we have
some precise
thing we're
looking for.
Let me know if
you'd like me
to attempt to
drive the
system
unstable like
that and what
I should look
for. As it's a
production
system, I'd
rather not
leave it in
this state for
long.<br>
</div>
</div>
</blockquote>
<br>
</span> Will it be possible to send the glustershd and mount logs
for the past 4 days? I would like to see if this is because of
directory self-heal going wild (Ravi is working on a throttling
feature for 3.8, which will make it possible to put the brakes on
self-heal traffic).<span><font
color="#888888"><br>
<br>
Pranith</font></span>
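If it helps, a one-liner along these lines could bundle them (a
sketch; it assumes the default /var/log/glusterfs/ location and
that the fuse mount log is named after the homegfs mount point):<br>
<pre>
# Sketch: run on each server/client; log paths are assumptions to verify first.
tar czf /tmp/gluster-logs-$(hostname)-$(date +%F).tar.gz \
    /var/log/glusterfs/glustershd.log* \
    /var/log/glusterfs/*homegfs*.log*
</pre>
<br>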
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>[root@gfs01a
xattrop]#
gluster volume
heal homegfs
info<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of
entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of
entries: 0<br>
<br>
<br>
<br>
</div>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 10:40
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
bgcolor="#FFFFFF"
text="#000000"><span>
<br>
<br>
<div>On
01/21/2016
08:25 PM,
Glomski,
Patrick wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>Hello,
Pranith. The
typical
behavior is
that the %cpu
on a
glusterfsd
process jumps
to number of
processor
cores
available
(800% or
1200%,
depending on
the pair of
nodes
involved) and
the load
average on the
machine goes
very high
(~20). The
volume's heal
statistics
output shows
that it is
crawling one
of the bricks
and trying to
heal, but this
crawl hangs
and never
seems to
finish.<br>
</div>
</div>
</blockquote>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
The number of
files in the
xattrop
directory
varies over
time, so I ran
a wc -l as you
requested
periodically
for some time
and then
started
including a
datestamped
list of the
files that
were in the xattrop directory on
each brick to
see which were
persistent.
All bricks had
files in the
xattrop
folder, so all
results are
attached.<br>
</div>
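Something along these lines reproduces that kind of capture (a
sketch only; the interval, brick glob, and output file are
assumptions):<br>
<pre>
#!/bin/bash
# Sketch: every few minutes, record a datestamped count and listing of the
# xattrop indices for each brick on this server.
OUT=/tmp/xattrop-watch.log                    # hypothetical output file
while true; do
    date >> "$OUT"
    for d in /data/brick*/homegfs/.glusterfs/indices/xattrop; do
        echo "== $d ($(ls "$d" | wc -l) entries) ==" >> "$OUT"
        ls "$d" >> "$OUT"
    done
    sleep 300
done
</pre>
<br>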
</blockquote>
</span> Thanks, this info is helpful. I don't see a lot of files.
Could you give the output of "gluster
volume heal <volname>
info"? Is there any directory in there which is
LARGE?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Please
let me know if
there is
anything else
I can provide.<br>
</div>
<div><br>
</div>
<div>Patrick<br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 12:01
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
bgcolor="#FFFFFF"
text="#000000">
Hey,<br>
Which process is consuming so much CPU? I
went through
the logs you
gave me. I see
that the
following
files are in
gfid mismatch
state:<br>
<br>
<066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,<br>
<1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,<br>
<ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>,<br>
<br>
Could you give me the output of "ls
<brick-path>/indices/xattrop | wc -l"
on all the bricks which are acting this
way? This will
tell us the
number of
pending
self-heals on
the system.<br>
<br>
Pranith
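On each server, something like the loop below would collect that
in one go (a sketch; brick paths follow the volume layout quoted
further down, and the indices live under
<brick-path>/.glusterfs/indices/xattrop):<br>
<pre>
# Sketch: count pending-heal index entries per brick on this server.
for b in /data/brick*/homegfs; do
    printf '%s: ' "$b"
    ls "$b/.glusterfs/indices/xattrop" | wc -l
done
</pre>
<br>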
<div>
<div><br>
<br>
<div>On
01/20/2016
09:26 PM,
David Robinson
wrote:<br>
</div>
</div>
</div>
<blockquote
type="cite">
<div>
<div>
<div>resending
with parsed
logs... </div>
<div> </div>
<div>
<blockquote
cite="http://em5ee26b0e-002a-4230-bdec-3020b98cff3c@dfrobins-vaio"
type="cite">
<div> </div>
<div> </div>
<div>
<blockquote
cite="http://eme3b2cb80-8be2-4fa5-9d08-4710955e237c@dfrobins-vaio"
type="cite">
<div>I am
having issues
with 3.6.6
where the load
will spike up
to 800% for
one of the
glusterfsd
processes and
the users can
no longer
access the
system. If I
reboot the
node, the heal
will finish
normally after
a few minutes
and the system
will be
responsive,
but a few
hours later
the issue will
start again.
It looks like
it is hanging
in a heal and
spinning up
the load on
one of the
bricks. The
heal gets
stuck and says
it is crawling
and never
returns.
After a few
minutes of the
heal saying it
is crawling,
the load
spikes up and
the mounts
become
unresponsive.</div>
<div> </div>
<div>Any
suggestions on
how to fix
this? It has
us stopped
cold as the
user can no
longer access
the systems
when the load
spikes... Logs
attached.</div>
<div> </div>
<div>System
setup info is:
</div>
<div> </div>
<div>[root@gfs01a
~]# gluster
volume info
homegfs<br>
<br>
Volume Name:
homegfs<br>
Type:
Distributed-Replicate<br>
Volume ID:
1e32672a-f1b7-4b58-ba94-58c085e59071<br>
Status:
Started<br>
Number of
Bricks: 4 x 2
= 8<br>
Transport-type:
tcp<br>
Bricks:<br>
Brick1:
gfsib01a.corvidtec.com:/data/brick01a/homegfs<br>
Brick2:
gfsib01b.corvidtec.com:/data/brick01b/homegfs<br>
Brick3:
gfsib01a.corvidtec.com:/data/brick02a/homegfs<br>
Brick4:
gfsib01b.corvidtec.com:/data/brick02b/homegfs<br>
Brick5:
gfsib02a.corvidtec.com:/data/brick01a/homegfs<br>
Brick6:
gfsib02b.corvidtec.com:/data/brick01b/homegfs<br>
Brick7:
gfsib02a.corvidtec.com:/data/brick02a/homegfs<br>
Brick8:
gfsib02b.corvidtec.com:/data/brick02b/homegfs<br>
Options
Reconfigured:<br>
performance.io-thread-count:
32<br>
performance.cache-size:
128MB<br>
performance.write-behind-window-size:
128MB<br>
server.allow-insecure:
on<br>
network.ping-timeout:
42<br>
storage.owner-gid:
100<br>
geo-replication.indexing:
off<br>
geo-replication.ignore-pid-check:
on<br>
changelog.changelog:
off<br>
changelog.fsync-interval:
3<br>
changelog.rollover-time:
15<br>
server.manage-gids:
on<br>
diagnostics.client-log-level:
WARNING</div>
<div> </div>
<div>[root@gfs01a
~]# rpm -qa |
grep gluster<br>
gluster-nagios-common-0.1.1-0.el6.noarch<br>
glusterfs-fuse-3.6.6-1.el6.x86_64<br>
glusterfs-debuginfo-3.6.6-1.el6.x86_64<br>
glusterfs-libs-3.6.6-1.el6.x86_64<br>
glusterfs-geo-replication-3.6.6-1.el6.x86_64<br>
glusterfs-api-3.6.6-1.el6.x86_64<br>
glusterfs-devel-3.6.6-1.el6.x86_64<br>
glusterfs-api-devel-3.6.6-1.el6.x86_64<br>
glusterfs-3.6.6-1.el6.x86_64<br>
glusterfs-cli-3.6.6-1.el6.x86_64<br>
glusterfs-rdma-3.6.6-1.el6.x86_64<br>
samba-vfs-glusterfs-4.1.11-2.el6.x86_64<br>
glusterfs-server-3.6.6-1.el6.x86_64<br>
glusterfs-extra-xlators-3.6.6-1.el6.x86_64<br>
</div>
<div> </div>
<div>
<div
style="FONT-SIZE:12pt;FONT-FAMILY:Times
New Roman"><span><span>
<div> </div>
</span></span></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
Gluster-devel mailing list
<a moz-do-not-send="true" href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>
<a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
</blockquote>
<br>
</div>
<br>
_______________________________________________<br>
Gluster-users
mailing list<br>
<a
moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a
moz-do-not-send="true"
href="http://www.gluster.org/mailman/listinfo/gluster-users"
rel="noreferrer"
target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>