<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 01/22/2016 07:19 AM, Pranith Kumar
Karampuri wrote:<br>
</div>
<blockquote cite="mid:56A18AA6.8010701@redhat.com" type="cite">
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
<br>
<br>
<div class="moz-cite-prefix">On 01/22/2016 07:13 AM, Glomski,
Patrick wrote:<br>
</div>
<blockquote
cite="mid:CALkMjdDxd0zCGM4tn9PTXGEgUR+Z7cF0vhbd+d4TCJkun2tEfg@mail.gmail.com"
type="cite">
<div dir="ltr">We use the samba glusterfs virtual filesystem
(the current version provided on <a moz-do-not-send="true"
href="http://download.gluster.org">download.gluster.org</a>),
but no Windows clients connect directly.<br>
</div>
</blockquote>
<br>
Hmm.. Is there a way to disable this and check whether the CPU%
still increases? A getxattr of "glusterfs.get_real_filename
&lt;filename&gt;" scans the entire directory, running
strcasecmp(&lt;filename&gt;, &lt;scanned-filename&gt;) against
each entry; the first match is returned as the
&lt;scanned-filename&gt;. The problem is that this scan is
costly, so I wonder if it is the reason for the CPU spikes.<br>
</blockquote>
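The cost of that lookup can be pictured with a plain shell emulation (illustration only; the real lookup runs inside the brick process, and the file names here are invented for the demo). A case-insensitive match cannot stop early on a sorted index; it must compare the target against every entry until one matches:

```shell
# Emulate a glusterfs.get_real_filename-style scan: strcasecmp each
# directory entry against the requested name. Demo files are made up.
dir=$(mktemp -d)
touch "$dir/Report.TXT" "$dir/notes.txt"
target="report.txt"
found=""
for f in "$dir"/*; do
  name=${f##*/}
  # strcasecmp-style comparison: lowercase both sides before comparing
  if [ "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')" = "$target" ]; then
    found=$name
    break
  fi
done
echo "found: $found"   # a directory of N entries costs O(N) per lookup
```

In a directory with tens of thousands of entries, every such getxattr repeats that full walk, which is consistent with a brick process pinning its CPUs.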
+Raghavendra Talur, +Poornima<br>
<br>
Raghavendra, Poornima,<br>
When are these getxattrs triggered? Did you guys see any
brick CPU spikes before? I initially thought it could be because of
big directory heals. But this is happening even when no self-heals
are required. So I had to move away from that theory.<br>
<br>
Pranith<br>
<blockquote cite="mid:56A18AA6.8010701@redhat.com" type="cite"> <br>
Pranith<br>
<blockquote
cite="mid:CALkMjdDxd0zCGM4tn9PTXGEgUR+Z7cF0vhbd+d4TCJkun2tEfg@mail.gmail.com"
type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at 8:37 PM,
Pranith Kumar Karampuri <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Do you have any
Windows clients? I see a lot of getxattr calls for
"glusterfs.get_real_filename" which lead to full
readdirs of the directories on the brick.<span
class="HOEnZb"><font color="#888888"><br>
<br>
Pranith</font></span><span class=""><br>
<br>
<div>On 01/22/2016 12:51 AM, Glomski, Patrick wrote:<br>
</div>
</span>
<div>
<div class="h5">
<blockquote type="cite">
<div dir="ltr">
<div>Pranith, could this kind of behavior be
self-inflicted by us deleting files directly
from the bricks? We have done that in the past
to clean up issues where gluster wouldn't
allow us to delete from the mount.<br>
<br>
If so, is it feasible to clean them up by
running a search on the .glusterfs directories
directly and removing non-empty files with a
link count of 1 (or directly checking the
xattrs to be sure a file is not a DHT link)?<br>
<br>
find /data/brick01a/homegfs/.glusterfs -type f
-not -empty -links -2 -exec rm -f "{}" \;<br>
<br>
</div>
Is there anything I'm inherently missing with
that approach that could further corrupt the
system?<br>
<div><br>
</div>
</div>
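A more cautious variant of that find (a sketch, not something verified against a live brick; it assumes GNU find and the getfattr tool from the attr package are installed on the brick host) checks the xattr instead of trusting size alone, and lists candidates rather than deleting so the output can be reviewed before any rm:

```shell
# Sketch: list orphan candidates under a brick's .glusterfs without
# deleting anything. DHT link files carry the
# trusted.glusterfs.dht.linkto xattr; skip them so a valid pointer
# file is never removed. Brick path is an argument,
# e.g. /data/brick01a/homegfs.
list_cleanup_candidates() {
  find "$1/.glusterfs" -type f -links 1 ! -empty -print0 |
  while IFS= read -r -d '' f; do
    if getfattr -n trusted.glusterfs.dht.linkto --only-values "$f" \
         >/dev/null 2>&1; then
      echo "skip (DHT link file): $f"
    else
      echo "candidate: $f"
    fi
  done
}
```

Only after reviewing the candidate list would one pipe it into rm -f.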
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 21, 2016 at
1:02 PM, Glomski, Patrick <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:patrick.glomski@corvidtec.com"
target="_blank">patrick.glomski@corvidtec.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>Load spiked again: ~1200% CPU on
gfs02a for glusterfsd. A crawl has been
running on one of the bricks on gfs02b
for 25 min or so and users cannot
access the volume.<br>
<br>
I re-listed the xattrop directories as
well as a 'top' entry and heal
statistics. Then I restarted the
gluster services on gfs02a. <br>
<br>
=================== top
===================<br>
PID USER PR NI VIRT RES SHR S
%CPU %MEM TIME+
COMMAND
<br>
8969 root 20 0 2815m 204m 3588
S 1181.0 0.6 591:06.93
glusterfsd <br>
<br>
=================== xattrop
===================<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-41f19453-91e4-437c-afa9-3b25614de210
xattrop-9b815879-2f4d-402b-867c-a6d65087788c<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-70131855-3cfb-49af-abce-9d23f57fb393
xattrop-dfb77848-a39d-4417-a725-9beca75d78c6<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
e6e47ed9-309b-42a7-8c44-28c29b9a20f8
xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125<br>
xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413<br>
<br>
/data/brick01a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531<br>
<br>
/data/brick02a/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-7e20fdb1-5224-4b9a-be06-568708526d70<br>
<br>
/data/brick01b/homegfs/.glusterfs/indices/xattrop:<br>
8034bc06-92cd-4fa5-8aaf-09039e79d2c8
c9ce22ed-6d8b-471b-a111-b39e57f0b512<br>
94fa1d60-45ad-4341-b69c-315936b51e8d
xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7<br>
<br>
/data/brick02b/homegfs/.glusterfs/indices/xattrop:<br>
xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d<br>
<br>
<br>
=================== heal stats
===================<br>
<br>
homegfs [b0-gfsib01a] : Starting time
of crawl : Thu Jan 21 12:36:45
2016<br>
homegfs [b0-gfsib01a] : Ending time of
crawl : Thu Jan 21 12:36:45
2016<br>
homegfs [b0-gfsib01a] : Type of crawl:
INDEX<br>
homegfs [b0-gfsib01a] : No. of entries
healed : 0<br>
homegfs [b0-gfsib01a] : No. of entries
in split-brain: 0<br>
homegfs [b0-gfsib01a] : No. of heal
failed entries : 0<br>
<br>
homegfs [b1-gfsib01b] : Starting time
of crawl : Thu Jan 21 12:36:19
2016<br>
homegfs [b1-gfsib01b] : Ending time of
crawl : Thu Jan 21 12:36:19
2016<br>
homegfs [b1-gfsib01b] : Type of crawl:
INDEX<br>
homegfs [b1-gfsib01b] : No. of entries
healed : 0<br>
homegfs [b1-gfsib01b] : No. of entries
in split-brain: 0<br>
homegfs [b1-gfsib01b] : No. of heal
failed entries : 1<br>
<br>
homegfs [b2-gfsib01a] : Starting time
of crawl : Thu Jan 21 12:36:48
2016<br>
homegfs [b2-gfsib01a] : Ending time of
crawl : Thu Jan 21 12:36:48
2016<br>
homegfs [b2-gfsib01a] : Type of crawl:
INDEX<br>
homegfs [b2-gfsib01a] : No. of entries
healed : 0<br>
homegfs [b2-gfsib01a] : No. of entries
in split-brain: 0<br>
homegfs [b2-gfsib01a] : No. of heal
failed entries : 0<br>
<br>
homegfs [b3-gfsib01b] : Starting time
of crawl : Thu Jan 21 12:36:47
2016<br>
homegfs [b3-gfsib01b] : Ending time of
crawl : Thu Jan 21 12:36:47
2016<br>
homegfs [b3-gfsib01b] : Type of crawl:
INDEX<br>
homegfs [b3-gfsib01b] : No. of entries
healed : 0<br>
homegfs [b3-gfsib01b] : No. of entries
in split-brain: 0<br>
homegfs [b3-gfsib01b] : No. of heal
failed entries : 0<br>
<br>
homegfs [b4-gfsib02a] : Starting time
of crawl : Thu Jan 21 12:36:06
2016<br>
homegfs [b4-gfsib02a] : Ending time of
crawl : Thu Jan 21 12:36:06
2016<br>
homegfs [b4-gfsib02a] : Type of crawl:
INDEX<br>
homegfs [b4-gfsib02a] : No. of entries
healed : 0<br>
homegfs [b4-gfsib02a] : No. of entries
in split-brain: 0<br>
homegfs [b4-gfsib02a] : No. of heal
failed entries : 0<br>
<br>
homegfs [b5-gfsib02b] : Starting time
of crawl : Thu Jan 21 12:13:40
2016<br>
homegfs [b5-gfsib02b]
: ***
Crawl is in progress ***<br>
homegfs [b5-gfsib02b] : Type of crawl:
INDEX<br>
homegfs [b5-gfsib02b] : No. of entries
healed : 0<br>
homegfs [b5-gfsib02b] : No. of entries
in split-brain: 0<br>
homegfs [b5-gfsib02b] : No. of heal
failed entries : 0<br>
<br>
homegfs [b6-gfsib02a] : Starting time
of crawl : Thu Jan 21 12:36:58
2016<br>
homegfs [b6-gfsib02a] : Ending time of
crawl : Thu Jan 21 12:36:58
2016<br>
homegfs [b6-gfsib02a] : Type of crawl:
INDEX<br>
homegfs [b6-gfsib02a] : No. of entries
healed : 0<br>
homegfs [b6-gfsib02a] : No. of entries
in split-brain: 0<br>
homegfs [b6-gfsib02a] : No. of heal
failed entries : 0<br>
<br>
homegfs [b7-gfsib02b] : Starting time
of crawl : Thu Jan 21 12:36:50
2016<br>
homegfs [b7-gfsib02b] : Ending time of
crawl : Thu Jan 21 12:36:50
2016<br>
homegfs [b7-gfsib02b] : Type of crawl:
INDEX<br>
homegfs [b7-gfsib02b] : No. of entries
healed : 0<br>
homegfs [b7-gfsib02b] : No. of entries
in split-brain: 0<br>
homegfs [b7-gfsib02b] : No. of heal
failed entries : 0<br>
<br>
<br>
========================================================================================<br>
</div>
I waited a few minutes for the heals to
finish and ran the heal statistics and
info again. One file is in split-brain.
Aside from the split-brain, the load on
all systems is down now and they are
behaving normally. glustershd.log is
attached. What is going on?<br>
<br>
Thu Jan 21 12:53:50 EST 2016<br>
<br>
=================== homegfs
===================<br>
<br>
homegfs [b0-gfsib01a] : Starting time of
crawl : Thu Jan 21 12:53:02 2016<br>
homegfs [b0-gfsib01a] : Ending time of
crawl : Thu Jan 21 12:53:02 2016<br>
homegfs [b0-gfsib01a] : Type of crawl:
INDEX<br>
homegfs [b0-gfsib01a] : No. of entries
healed : 0<br>
homegfs [b0-gfsib01a] : No. of entries
in split-brain: 0<br>
homegfs [b0-gfsib01a] : No. of heal
failed entries : 0<br>
<br>
homegfs [b1-gfsib01b] : Starting time of
crawl : Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] : Ending time of
crawl : Thu Jan 21 12:53:38 2016<br>
homegfs [b1-gfsib01b] : Type of crawl:
INDEX<br>
homegfs [b1-gfsib01b] : No. of entries
healed : 0<br>
homegfs [b1-gfsib01b] : No. of entries
in split-brain: 0<br>
homegfs [b1-gfsib01b] : No. of heal
failed entries : 1<br>
<br>
homegfs [b2-gfsib01a] : Starting time of
crawl : Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] : Ending time of
crawl : Thu Jan 21 12:53:04 2016<br>
homegfs [b2-gfsib01a] : Type of crawl:
INDEX<br>
homegfs [b2-gfsib01a] : No. of entries
healed : 0<br>
homegfs [b2-gfsib01a] : No. of entries
in split-brain: 0<br>
homegfs [b2-gfsib01a] : No. of heal
failed entries : 0<br>
<br>
homegfs [b3-gfsib01b] : Starting time of
crawl : Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] : Ending time of
crawl : Thu Jan 21 12:53:04 2016<br>
homegfs [b3-gfsib01b] : Type of crawl:
INDEX<br>
homegfs [b3-gfsib01b] : No. of entries
healed : 0<br>
homegfs [b3-gfsib01b] : No. of entries
in split-brain: 0<br>
homegfs [b3-gfsib01b] : No. of heal
failed entries : 0<br>
<br>
homegfs [b4-gfsib02a] : Starting time of
crawl : Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] : Ending time of
crawl : Thu Jan 21 12:53:33 2016<br>
homegfs [b4-gfsib02a] : Type of crawl:
INDEX<br>
homegfs [b4-gfsib02a] : No. of entries
healed : 0<br>
homegfs [b4-gfsib02a] : No. of entries
in split-brain: 0<br>
homegfs [b4-gfsib02a] : No. of heal
failed entries : 1<br>
<br>
homegfs [b5-gfsib02b] : Starting time of
crawl : Thu Jan 21 12:53:14 2016<br>
homegfs [b5-gfsib02b] : Ending time of
crawl : Thu Jan 21 12:53:15 2016<br>
homegfs [b5-gfsib02b] : Type of crawl:
INDEX<br>
homegfs [b5-gfsib02b] : No. of entries
healed : 0<br>
homegfs [b5-gfsib02b] : No. of entries
in split-brain: 0<br>
homegfs [b5-gfsib02b] : No. of heal
failed entries : 3<br>
<br>
homegfs [b6-gfsib02a] : Starting time of
crawl : Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] : Ending time of
crawl : Thu Jan 21 12:53:04 2016<br>
homegfs [b6-gfsib02a] : Type of crawl:
INDEX<br>
homegfs [b6-gfsib02a] : No. of entries
healed : 0<br>
homegfs [b6-gfsib02a] : No. of entries
in split-brain: 0<br>
homegfs [b6-gfsib02a] : No. of heal
failed entries : 0<br>
<br>
homegfs [b7-gfsib02b] : Starting time of
crawl : Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] : Ending time of
crawl : Thu Jan 21 12:53:09 2016<br>
homegfs [b7-gfsib02b] : Type of crawl:
INDEX<br>
homegfs [b7-gfsib02b] : No. of entries
healed : 0<br>
homegfs [b7-gfsib02b] : No. of entries
in split-brain: 0<br>
homegfs [b7-gfsib02b] : No. of heal
failed entries : 0<br>
<br>
*** gluster bug in 'gluster volume heal
homegfs statistics' ***<br>
*** Use 'gluster volume heal homegfs
info' until bug is fixed ***<span><br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
</span>/users/bangell/.gconfd - Is in
split-brain<br>
<br>
Number of entries: 1<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
/users/bangell/.gconfd - Is in
split-brain<br>
<br>
/users/bangell/.gconfd/saved_state <br>
Number of entries: 2<span><br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
</span></div>
<div><br>
<br>
</div>
</div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan
21, 2016 at 11:10 AM, Pranith Kumar
Karampuri <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000"><span> <br>
<br>
<div>On 01/21/2016 09:26 PM,
Glomski, Patrick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>I should mention that
the problem is not
currently occurring and
there are no heals
(output appended). By
restarting the gluster
services, we can stop
the crawl, which lowers
the load for a while.
Subsequent crawls seem
to finish properly. For
what it's worth,
files/folders that show
up in the 'volume info'
output during a hung
crawl don't seem to be
anything out of the
ordinary. <br>
<br>
Over the past four days,
the typical time before
the problem recurs after
suppressing it in this
manner is an hour. Last
night when we reached
out to you was the last
time it happened and the
load has been low since
(a relief). David
believes that
recursively listing the
files (ls -alR or
similar) from a client
mount can force the
issue to happen, but
obviously I'd rather not
unless we have some
precise thing we're
looking for. Let me know
if you'd like me to
attempt to drive the
system unstable like
that and what I should
look for. As it's a
production system, I'd
rather not leave it in
this state for long.<br>
</div>
</div>
</blockquote>
<br>
</span> Would it be possible to
send the glustershd and mount logs
from the past 4 days? I would like
to see if this is because of
directory self-heal going wild
(Ravi is working on a throttling
feature for 3.8, which will allow
putting the brakes on self-heal
traffic)<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>[root@gfs01a
xattrop]# gluster
volume heal homegfs
info<br>
Brick
gfs01a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs01b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick01a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick01b/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02a.corvidtec.com:/data/brick02a/homegfs/<br>
Number of entries: 0<br>
<br>
Brick
gfs02b.corvidtec.com:/data/brick02b/homegfs/<br>
Number of entries: 0<br>
<br>
<br>
<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On
Thu, Jan 21, 2016 at
10:40 AM, Pranith
Kumar Karampuri <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
bgcolor="#FFFFFF"
text="#000000"><span>
<br>
<br>
<div>On
01/21/2016
08:25 PM,
Glomski,
Patrick wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>Hello,
Pranith. The
typical
behavior is
that the %cpu
on a
glusterfsd
process jumps
to the number
of processor
cores
available
(800% or
1200%,
depending on
the pair of
nodes
involved) and
the load
average on the
machine goes
very high
(~20). The
volume's heal
statistics
output shows
that it is
crawling one
of the bricks
and trying to
heal, but this
crawl hangs
and never
seems to
finish.<br>
</div>
</div>
</blockquote>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
The number of
files in the
xattrop
directory
varies over
time, so I ran
a wc -l as you
requested
periodically
for some time
and then
started
including a
datestamped
list of the
files that
were in the
xattrop
directory on
each brick to
see which were
persistent.
All bricks had
files in the
xattrop
folder, so all
results are
attached.<br>
</div>
</blockquote>
</span> Thanks,
this info is
helpful. I don't
see a lot of
files. Could you
give output of
"gluster volume
heal
&lt;volname&gt;
info"? Is there
any directory in
there which is
LARGE?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Please
let me know if
there is
anything else
I can provide.<br>
</div>
<div><br>
</div>
<div>Patrick<br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Thu, Jan 21,
2016 at 12:01
AM, Pranith
Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
bgcolor="#FFFFFF"
text="#000000">
hey,<br>
Which
process is
consuming so
much CPU? I
went through
the logs you
gave me. I see
that the
following
files are in
gfid mismatch
state:<br>
<br>
&lt;066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup&gt;,<br>
&lt;1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak&gt;,<br>
&lt;ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg&gt;,<br>
<br>
Could you give
me the output
of "ls
<brick-path>/indices/xattrop
| wc -l"
output on all
the bricks
which are
acting this
way? This will
tell us the
number of
pending
self-heals on
the system.<br>
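That count can be gathered in one pass with a small loop per node (a sketch; the brick paths passed in would be the ones 'gluster volume info homegfs' lists for that host):

```shell
# Sketch: print the pending-heal index entry count for each brick path
# given as an argument, by counting files in its xattrop index directory.
count_pending_heals() {
  for brick in "$@"; do
    idx="$brick/.glusterfs/indices/xattrop"
    [ -d "$idx" ] || continue
    printf '%s %s\n' "$brick" "$(ls -A "$idx" | wc -l | tr -d ' ')"
  done
}
# e.g. count_pending_heals /data/brick01a/homegfs /data/brick02a/homegfs
```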
<br>
Pranith
<div>
<div><br>
<br>
<div>On
01/20/2016
09:26 PM,
David Robinson
wrote:<br>
</div>
</div>
</div>
<blockquote
type="cite">
<div>
<div>
<div>resending
with parsed
logs... </div>
<div> </div>
<div>
<blockquote
cite="http://em5ee26b0e-002a-4230-bdec-3020b98cff3c@dfrobins-vaio"
type="cite">
<div> </div>
<div> </div>
<div>
<blockquote
cite="http://eme3b2cb80-8be2-4fa5-9d08-4710955e237c@dfrobins-vaio"
type="cite">
<div>I am
having issues
with 3.6.6
where the load
will spike up
to 800% for
one of the
glusterfsd
processes and
the users can
no longer
access the
system. If I
reboot the
node, the heal
will finish
normally after
a few minutes
and the system
will be
responsive,
but a few
hours later
the issue will
start again.
It looks like
it is hanging
in a heal and
spinning up
the load on
one of the
bricks. The
heal gets
stuck and says
it is crawling
and never
returns.
After a few
minutes of the
heal saying it
is crawling,
the load
spikes up and
the mounts
become
unresponsive.</div>
<div> </div>
<div>Any
suggestions on
how to fix
this? It has
us stopped
cold as the
user can no
longer access
the systems
when the load
spikes... Logs
attached.</div>
<div> </div>
<div>System
setup info is:
</div>
<div> </div>
<div>[root@gfs01a
~]# gluster
volume info
homegfs<br>
<br>
Volume Name:
homegfs<br>
Type:
Distributed-Replicate<br>
Volume ID:
1e32672a-f1b7-4b58-ba94-58c085e59071<br>
Status:
Started<br>
Number of
Bricks: 4 x 2
= 8<br>
Transport-type:
tcp<br>
Bricks:<br>
Brick1:
gfsib01a.corvidtec.com:/data/brick01a/homegfs<br>
Brick2:
gfsib01b.corvidtec.com:/data/brick01b/homegfs<br>
Brick3:
gfsib01a.corvidtec.com:/data/brick02a/homegfs<br>
Brick4:
gfsib01b.corvidtec.com:/data/brick02b/homegfs<br>
Brick5:
gfsib02a.corvidtec.com:/data/brick01a/homegfs<br>
Brick6:
gfsib02b.corvidtec.com:/data/brick01b/homegfs<br>
Brick7:
gfsib02a.corvidtec.com:/data/brick02a/homegfs<br>
Brick8:
gfsib02b.corvidtec.com:/data/brick02b/homegfs<br>
Options
Reconfigured:<br>
performance.io-thread-count:
32<br>
performance.cache-size:
128MB<br>
performance.write-behind-window-size:
128MB<br>
server.allow-insecure:
on<br>
network.ping-timeout:
42<br>
storage.owner-gid:
100<br>
geo-replication.indexing:
off<br>
geo-replication.ignore-pid-check:
on<br>
changelog.changelog:
off<br>
changelog.fsync-interval:
3<br>
changelog.rollover-time:
15<br>
server.manage-gids:
on<br>
diagnostics.client-log-level:
WARNING</div>
<div> </div>
<div>[root@gfs01a
~]# rpm -qa |
grep gluster<br>
gluster-nagios-common-0.1.1-0.el6.noarch<br>
glusterfs-fuse-3.6.6-1.el6.x86_64<br>
glusterfs-debuginfo-3.6.6-1.el6.x86_64<br>
glusterfs-libs-3.6.6-1.el6.x86_64<br>
glusterfs-geo-replication-3.6.6-1.el6.x86_64<br>
glusterfs-api-3.6.6-1.el6.x86_64<br>
glusterfs-devel-3.6.6-1.el6.x86_64<br>
glusterfs-api-devel-3.6.6-1.el6.x86_64<br>
glusterfs-3.6.6-1.el6.x86_64<br>
glusterfs-cli-3.6.6-1.el6.x86_64<br>
glusterfs-rdma-3.6.6-1.el6.x86_64<br>
samba-vfs-glusterfs-4.1.11-2.el6.x86_64<br>
glusterfs-server-3.6.6-1.el6.x86_64<br>
glusterfs-extra-xlators-3.6.6-1.el6.x86_64<br>
</div>
<div> </div>
</blockquote>
</div>
</blockquote>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
Gluster-devel mailing list
<a moz-do-not-send="true" href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>
<a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
</blockquote>
<br>
</div>
<br>
_______________________________________________<br>
Gluster-users
mailing list<br>
<a
moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a
moz-do-not-send="true"
href="http://www.gluster.org/mailman/listinfo/gluster-users"
rel="noreferrer"
target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-devel">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
</blockquote>
<br>
</body>
</html>