<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 11/04/2015 09:10 PM, Adrian
Gruntkowski wrote:<br>
</div>
<blockquote
cite="mid:CAE_wqnO4s_xgWyq_XFvA0pBwYbf-O_-7V-7b9OgRQ4BAa1gW5w@mail.gmail.com"
type="cite">
<div dir="ltr">Hello,
<div><br>
</div>
<div>I have applied Pranith's patch myself on top of the
current 3.7.5 release and rebuilt the packages.
Unfortunately, the issue is still there :( It behaves
exactly the same.</div>
</div>
</blockquote>
Could you get the statedumps of the bricks again? I will take a
look. Maybe the hang I observed is different from the one you are
seeing, and I only fixed the one I observed.<br>
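In case it helps, here is a rough sketch of how the dumps can be captured and quickly scanned for blocked locks. The dump directory shown is the usual default (it can differ per build), and the sample file below is only an inline stand-in for a real dump, so treat both as assumptions rather than verified 3.7.x behaviour:

```shell
# On each node hosting a brick of the volume (volume name from this thread):
#   gluster volume statedump system_www1
# Dump files usually land under /var/run/gluster/ with names like
# <brick-path>.<pid>.dump.<timestamp>; check your build if the path differs.
#
# A quick way to count granted vs blocked inode locks in a dump file,
# demonstrated here against a small inline sample instead of a real dump:
cat > /tmp/sample.dump <<'EOF'
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0
EOF
echo "active:  $(grep -c 'ACTIVE' /tmp/sample.dump)"
echo "blocked: $(grep -c 'BLOCKED' /tmp/sample.dump)"
```

Running the same grep over the real dump files should make it obvious at a glance whether locks are piling up in the BLOCKED state.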
<br>
Pranith<br>
<blockquote
cite="mid:CAE_wqnO4s_xgWyq_XFvA0pBwYbf-O_-7V-7b9OgRQ4BAa1gW5w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Regards,</div>
<div>Adrian</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2015-10-28 12:02 GMT+01:00 Pranith
Kumar Karampuri <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class=""> <br>
<br>
<div>On 10/28/2015 04:27 PM, Adrian Gruntkowski wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hello Pranith,
<div><br>
</div>
<div>Thank you for the prompt reaction. I didn't get
back to this until now because I had other
problems to deal with.</div>
<div><br>
</div>
<div>Is there a chance that it will be released
this month or next? If not, I will probably have
to resort to compiling it on my own.</div>
</div>
</blockquote>
</span> I am planning to get this into 3.7.6, which is to
be released by the end of this month, in 4-5 days I guess :-).
I will update you.<span class="HOEnZb"><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div class="h5"><br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Regards,</div>
<div>Adrian</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2015-10-26 12:37
GMT+01:00 Pranith Kumar Karampuri <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<div> <br>
<br>
<div>On 10/23/2015 10:10 AM, Ravishankar
N wrote:<br>
</div>
<blockquote type="cite"> <br>
<br>
<div>On 10/21/2015 05:55 PM, Adrian
Gruntkowski wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hello,<br>
<br>
I'm trying to track down a problem
with my setup (version 3.7.3 on
Debian stable).<br>
<br>
I have a couple of volumes set up
in a 3-node configuration, with one
brick acting as an arbiter for each.<br>
<br>
There are 4 volumes set up
cross-over across 3 physical
servers, like this:<br>
<br>
<br>
<br>
<pre>
--------------------------->[ GigabitEthernet switch ]<---------------------------
            |                            ^                            |
            |                            |                            |
            V                            V                            V
/------------------------\  /------------------------\  /------------------------\
| web-rep                |  | cluster-rep            |  | mail-rep               |
| vols:                  |  | vols:                  |  | vols:                  |
|  system_www1           |  |  system_www1           |  |  system_www1 (arbiter) |
|  data_www1             |  |  data_www1             |  |  data_www1 (arbiter)   |
|  system_mail1 (arbiter)|  |  system_mail1          |  |  system_mail1          |
|  data_mail1 (arbiter)  |  |  data_mail1            |  |  data_mail1            |
\------------------------/  \------------------------/  \------------------------/
</pre>
<br>
<br>
Now, after a fresh boot-up,
everything seems to be running
fine.<br>
Then I start copying big files
(KVM disk images) from the local
disk to the gluster mounts.<br>
In the beginning the transfer runs
fine (although iowait goes so high
that it clogs up io operations
at some moments, but that's an
issue for later). After some time
the transfer freezes, then
after some (long) time it
advances in a short burst, only to
freeze again. Another interesting
thing is that I see a constant flow
of network traffic on the interfaces
dedicated to gluster, even during a
"freeze".<br>
<br>
I have done a "gluster volume
statedump" during the transfer
(the file is copied from the
local disk on cluster-rep<br>
onto a local mount of the "system_www1"
volume). I observed the following
section in the dump for the
cluster-rep node:<br>
<br>
[xlator.features.locks.system_www1-locks.inode]<br>
path=/images/101/vm-101-disk-1.qcow2<br>
mandatory=0<br>
inodelk-count=12<br>
lock-dump.domain.domain=system_www1-replicate-0:self-heal<br>
inodelk.inodelk[0](ACTIVE)=type=WRITE,
whence=0, start=0, len=0, pid =
18446744073709551610,
owner=c811600cd67f0000,
client=0x7fbe100df280,
connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
granted at 2015-10-21 11:36:22<br>
lock-dump.domain.domain=system_www1-replicate-0<br>
inodelk.inodelk[0](ACTIVE)=type=WRITE,
whence=0, start=2195849216,
len=131072, pid =
18446744073709551610,
owner=c811600cd67f0000,
client=0x7fbe100df280,
connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
granted at 2015-10-21 11:37:45<br>
inodelk.inodelk[1](ACTIVE)=type=WRITE,
whence=0,
start=9223372036854775805, len=1,
pid = 18446744073709551610,
owner=c811600cd67f0000,
client=0x7fbe100df280,
connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
granted at 2015-10-21 11:36:22<br>
</div>
</blockquote>
<br>
From the statedump, it looks like
the self-heal daemon had taken locks
to heal the file, due to which the
locks attempted by the client (mount)
are in the blocked state.<br>
In arbiter volumes the client (mount)
takes a full lock (start=0, len=0) for
every write(), as opposed to normal
replica volumes, which take a range
lock (i.e. with the appropriate
start,len values) for that write().
This is done to avoid network
split-brains.<br>
So in normal replica volumes clients
can still write to a file while a heal
is going on, as long as the offsets
don't overlap. This is not the case
with arbiter volumes.<br>
You can look at the client or
glustershd logs to see if there are
messages that indicate healing of the
file, something along the lines of
"Completed data selfheal on xxx".<br>
</blockquote>
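To make the full-lock vs range-lock point above concrete, here is a toy model of byte-range lock conflict. This is only an illustration of the idea, not GlusterFS source code, and the `overlaps` helper is a hypothetical name; as in the statedump entries, len=0 is read as "from start to end of file":

```python
# Toy model of byte-range WRITE lock conflict (illustration only, not
# GlusterFS code). A len of 0 means "from start to end of file", which is
# how a full-file lock (start=0, len=0) appears in the statedump.

def overlaps(a_start, a_len, b_start, b_len):
    """Return True if two WRITE lock ranges conflict."""
    a_end = float("inf") if a_len == 0 else a_start + a_len
    b_end = float("inf") if b_len == 0 else b_start + b_len
    return a_start < b_end and b_start < a_end

# Regular replica: range locks at disjoint offsets do not block each other,
# so a client write can proceed alongside a self-heal lock elsewhere.
print(overlaps(0, 131072, 2195849216, 131072))   # False

# Arbiter: every write takes a full lock, so it conflicts with any other
# lock on the file, including the self-heal daemon's.
print(overlaps(0, 0, 2195849216, 131072))        # True
```

This is why, in the dump above, the mount's full locks queue up behind the self-heal locks on arbiter volumes while a plain replica would keep writing.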
</div>
</div>
Hi Adrian,<br>
Thanks for taking the time to send
this mail. I raised this as a bug at <a
moz-do-not-send="true"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1275247"
target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1275247</a>,
and the fix is posted for review at <a
moz-do-not-send="true"
href="http://review.gluster.com/#/c/12426/"
target="_blank">http://review.gluster.com/#/c/12426/</a><span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote type="cite"> <br>
<blockquote type="cite">
<div dir="ltr">inodelk.inodelk[2](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=c4fd2d78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[3](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=dc752e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[4](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=34832e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[5](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=d44d2e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[6](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=306f2e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[7](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=8c902e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[8](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=782c2e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[9](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=1c0b2e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
inodelk.inodelk[10](BLOCKED)=type=WRITE,
whence=0, start=0, len=0, pid = 0,
owner=24332e78487f0000,
client=0x7fbe100e1380,
connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
blocked at 2015-10-21 11:37:45<br>
<br>
There seem to be multiple locks in
the BLOCKED state, which doesn't
look normal to me. The other 2
nodes have<br>
only 2 ACTIVE locks at the same
time.<br>
<br>
Below is the "gluster volume info"
output.<br>
<br>
# gluster volume info<br>
<br>
Volume Name: data_mail1<br>
Type: Replicate<br>
Volume ID:
fc3259a1-ddcf-46e9-ae77-299aaad93b7c<br>
Status: Started<br>
Number of Bricks: 1 x 3 = 3<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1:
cluster-rep:/GFS/data/mail1<br>
Brick2: mail-rep:/GFS/data/mail1<br>
Brick3: web-rep:/GFS/data/mail1<br>
Options Reconfigured:<br>
performance.readdir-ahead: on<br>
cluster.quorum-count: 2<br>
cluster.quorum-type: fixed<br>
cluster.server-quorum-ratio: 51%<br>
<br>
Volume Name: data_www1<br>
Type: Replicate<br>
Volume ID:
0c37a337-dbe5-4e75-8010-94e068c02026<br>
Status: Started<br>
Number of Bricks: 1 x 3 = 3<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: cluster-rep:/GFS/data/www1<br>
Brick2: web-rep:/GFS/data/www1<br>
Brick3: mail-rep:/GFS/data/www1<br>
Options Reconfigured:<br>
performance.readdir-ahead: on<br>
cluster.quorum-type: fixed<br>
cluster.quorum-count: 2<br>
cluster.server-quorum-ratio: 51%<br>
<br>
Volume Name: system_mail1<br>
Type: Replicate<br>
Volume ID:
0568d985-9fa7-40a7-bead-298310622cb5<br>
Status: Started<br>
Number of Bricks: 1 x 3 = 3<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1:
cluster-rep:/GFS/system/mail1<br>
Brick2: mail-rep:/GFS/system/mail1<br>
Brick3: web-rep:/GFS/system/mail1<br>
Options Reconfigured:<br>
performance.readdir-ahead: on<br>
cluster.quorum-type: none<br>
cluster.quorum-count: 2<br>
cluster.server-quorum-ratio: 51%<br>
<br>
Volume Name: system_www1<br>
Type: Replicate<br>
Volume ID:
147636a2-5c15-4d9a-93c8-44d51252b124<br>
Status: Started<br>
Number of Bricks: 1 x 3 = 3<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1:
cluster-rep:/GFS/system/www1<br>
Brick2: web-rep:/GFS/system/www1<br>
Brick3: mail-rep:/GFS/system/www1<br>
Options Reconfigured:<br>
performance.readdir-ahead: on<br>
cluster.quorum-type: none<br>
cluster.quorum-count: 2<br>
cluster.server-quorum-ratio: 51%<br>
<br>
The issue does not occur when I
get rid of the 3rd (arbiter) brick.<br>
</div>
</blockquote>
<br>
What do you mean by 'getting rid of'?
Killing the 3rd brick process of the
volume?<br>
<br>
Regards,<br>
Ravi<br>
<blockquote type="cite">
<div dir="ltr"><br>
If there is any additional
information missing that I could
provide, please let me know.<br>
<br>
Greetings,<br>
Adrian</div>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
Gluster-users mailing list
<a moz-do-not-send="true" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>