<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi,<br>
<br>
The log does not show anything like that. Also, I'm using ext4 on
the bricks.<br>
<br>
The log only contains entries like these:<br>
<br>
<pre>[Fri Nov 25 14:23:27 2016] INFO: task gpu_graphene_bv:4476 blocked for more than 120 seconds.
[Fri Nov 25 14:23:27 2016] Tainted: P OE 3.19.0-25-generic #26~14.04.1-Ubuntu
[Fri Nov 25 14:23:27 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 25 14:23:27 2016] gpu_graphene_bv D ffff8804aa39be08 0 4476 4461 0x00000000
[Fri Nov 25 14:23:27 2016] ffff8804aa39be08 ffff8804ad0febf0 0000000000013e80 ffff8804aa39bfd8
[Fri Nov 25 14:23:27 2016] 0000000000013e80 ffff8804ad403110 ffff8804ad0febf0 ffff8804aa39be18
[Fri Nov 25 14:23:27 2016] ffff8804aa2c87d0 ffff88049df2e000 ffff8804aa39be30 ffff8804aa2c88a0
[Fri Nov 25 14:23:27 2016] Call Trace:
[Fri Nov 25 14:23:27 2016] [<ffffffff817b22e9>] schedule+0x29/0x70
[Fri Nov 25 14:23:27 2016] [<ffffffff812dc06d>] __fuse_request_send+0x11d/0x290
[Fri Nov 25 14:23:27 2016] [<ffffffff810b4e10>] ? prepare_to_wait_event+0x110/0x110
[Fri Nov 25 14:23:27 2016] [<ffffffff812dc1f2>] fuse_request_send+0x12/0x20
[Fri Nov 25 14:23:27 2016] [<ffffffff812e576d>] fuse_flush+0x12d/0x180
[Fri Nov 25 14:23:27 2016] [<ffffffff811e9973>] filp_close+0x33/0x80
[Fri Nov 25 14:23:27 2016] [<ffffffff8120a152>] __close_fd+0x82/0xa0
[Fri Nov 25 14:23:27 2016] [<ffffffff811e99e3>] SyS_close+0x23/0x50
[Fri Nov 25 14:23:27 2016] [<ffffffff817b668d>] system_call_fastpath+0x16/0x1b</pre>
<br>
      This is presumably due to the file system not responding.<br>
      Since switching the mounts from FUSE to NFS, I also occasionally
      see:<br>
<br>
<pre>[Wed Dec 14 23:42:47 2016] nfs: server giant2 not responding, still trying
[Wed Dec 14 23:43:12 2016] nfs: server giant2 not responding, still trying
[Wed Dec 14 23:45:04 2016] nfs: server giant2 OK
[Wed Dec 14 23:45:04 2016] nfs: server giant2 OK</pre>
<br>
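      (For reference, such entries can be pulled out of the kernel log
      with something like the following; just a sketch:)<br>
      <pre>dmesg -T | grep -E 'blocked for more than|nfs: server'</pre>
      <br>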
      In another post you asked for log files at TRACE log level; I'll
      provide them shortly.<br>
<br>
Best regards and thanks,<br>
Micha<br>
<br>
      On 12/19/2016 04:09 PM, Mohammed Rafi K C wrote:<br>
</div>
<blockquote
cite="mid:b14c62b3-3381-5298-eb7f-f77be32cef99@redhat.com"
type="cite">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<p>Hi Micha,</p>
    <p>Can you please also check whether there are any error messages
      in dmesg? Basically, I'm trying to see whether you're hitting
      the issues described in <a moz-do-not-send="true"
        class="moz-txt-link-freetext"
        href="https://bugzilla.kernel.org/show_bug.cgi?id=73831">https://bugzilla.kernel.org/show_bug.cgi?id=73831</a>.</p>
<p><br>
</p>
<p>Regards</p>
<p>Rafi KC</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 12/19/2016 11:58 AM, Mohammed Rafi
K C wrote:<br>
</div>
<blockquote
cite="mid:86231d60-3363-0e68-48d3-818cd73c62e9@redhat.com"
type="cite">
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type">
<p>Hi Micha,</p>
      <p>Sorry for the late reply. I was busy with some other things.</p>
      <p>If you still have the setup available, can you enable the
        TRACE log level [1],[2] and see if you can find any log
        entries from when the network starts disconnecting?
        Basically, I'm trying to find out whether any disconnect
        occurred other than the ping-timer-expired issue.</p>
<p><br>
</p>
<p><br>
</p>
      <p>[1] : gluster volume set <volname>
        diagnostics.brick-log-level TRACE</p>
      <p>[2] : gluster volume set <volname>
        diagnostics.client-log-level TRACE<br>
</p>
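      <p>For example, for the gv0 volume mentioned below (TRACE is very
        verbose, so remember to revert to the default INFO level
        afterwards):</p>
      <pre>gluster volume set gv0 diagnostics.brick-log-level TRACE
gluster volume set gv0 diagnostics.client-log-level TRACE
# ... reproduce the disconnects and collect the logs ...
gluster volume set gv0 diagnostics.brick-log-level INFO
gluster volume set gv0 diagnostics.client-log-level INFO</pre>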
<p><br>
</p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<br>
<div class="moz-cite-prefix">On 12/08/2016 07:59 PM, Atin
Mukherjee wrote:<br>
</div>
<blockquote
cite="mid:CAGNCGH3Rjy8B7wz+gTQqc35FLpQ4gn9u+bMaDRM0hkaGitUaGw@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Dec 8, 2016 at 4:37 PM,
Micha Ober <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_4766802258719003127moz-cite-prefix">Hi
Rafi,<br>
<br>
thank you for your support. It is greatly
appreciated.<br>
<br>
Just some more thoughts from my side:<br>
<br>
There have been no reports from other users in
*this* thread until now, but I have found at least
                  one user with a very similar problem in an older
thread:<br>
<br>
<a moz-do-not-send="true"
class="m_4766802258719003127moz-txt-link-freetext"
href="https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html"
target="_blank">https://www.gluster.org/<wbr>pipermail/gluster-users/2014-<wbr>November/019637.html</a><br>
<br>
                  He is also reporting disconnects with no apparent
                  reason, although his setup is a bit more
                  complicated, also involving a firewall. In our
setup, all servers/clients are connected via 1 GbE
with no firewall or anything that might
block/throttle traffic. Also, we are using exactly
the same software versions on all nodes.<br>
<br>
<br>
                  I can also find some reports in the bug tracker
                  when searching for "rpc_client_ping_timer_expired"
                  and "rpc_clnt_ping_timer_expired" (it looks like the
                  spelling changed between versions).<br>
<br>
<a moz-do-not-send="true"
class="m_4766802258719003127moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1096729"
target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1096729</a></div>
</div>
</blockquote>
<div><br>
</div>
            <div>Just FYI, this is a different issue: here GlusterD
              fails to handle the volume of incoming requests in
              time, since MT-epoll is not enabled.<br>
<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_4766802258719003127moz-cite-prefix"><br>
<a moz-do-not-send="true"
class="m_4766802258719003127moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683"
target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1370683</a><br>
<br>
                  But both reports involve heavy traffic/load on the
                  bricks/disks, which is not the case for our setup.<br>
                  To give a ballpark figure: over three days, 30 GiB
                  were written. And the data was not written all at
                  once, but continuously over the whole time.<br>
<br>
<br>
Just to be sure, I have checked the logfiles of
one of the other clusters right now, which are
sitting in the same building, in the same rack,
even on the same switch, running the same jobs,
                  but with glusterfs 3.4.2, and I can see no
                  disconnects in their logfiles. So I can definitely
                  rule out our infrastructure as the problem.
<br>
Regards,<br>
Micha
<div>
<div class="h5"><br>
<br>
<br>
                  On 12/07/2016 06:08 PM, Mohammed Rafi K
                  C wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite">
<p>Hi Micha,</p>
                  <p>This is great. I will provide you a debug
                    build with two fixes that I suspect may be
                    behind the frequent disconnect issue,
                    though I don't have much data to validate my
                    theory. So I will take one more day to dig
                    into that.</p>
<p>Thanks for your support, and opensource++
</p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<div
class="m_4766802258719003127moz-cite-prefix">On
12/07/2016 05:02 AM, Micha Ober wrote:<br>
</div>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix">Hi,<br>
<br>
                      Thank you for your answer and even more
for the question!<br>
Until now, I was using FUSE. Today I
changed all mounts to NFS using the same
3.7.17 version.<br>
<br>
But: The problem is still the same. Now,
the NFS logfile contains lines like these:<br>
<br>
[2016-12-06 15:12:29.006325] C
[rpc-clnt-ping.c:165:rpc_clnt_<wbr>ping_timer_expired]
0-gv0-client-7: server X.X.18.62:49153 has
not responded in the last 42 seconds,
disconnecting.<br>
<br>
Interestingly enough, the IP address
X.X.18.62 is the same machine! As I wrote
earlier, each node serves both as a server
and a client, as each node contributes
bricks to the volume. Every server is
connecting to itself via its hostname. For
example, the fstab on the node "giant2"
looks like:<br>
<br>
#giant2:/gv0 /shared_data
glusterfs defaults,noauto 0 0<br>
#giant2:/gv2 /shared_slurm
glusterfs defaults,noauto 0 0<br>
<br>
giant2:/gv0 /shared_data
nfs defaults,_netdev,vers=3
0 0<br>
giant2:/gv2 /shared_slurm
nfs defaults,_netdev,vers=3
0 0<br>
<br>
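                      (To double-check that the mounts really go
                      through NFSv3, something like this should
                      work:)<br>
                      <pre>mount -t nfs   # list the active NFS mounts
nfsstat -m     # show the negotiated options per mount</pre>
                      <br>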
So I understand the disconnects even less.
<br>
<br>
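                      (As an aside: the 42 seconds in the log above
                      is gluster's default network.ping-timeout.
                      Raising it would only paper over the problem,
                      but as a stopgap one could try:)<br>
                      <pre>gluster volume set gv0 network.ping-timeout 120</pre>
                      <br>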
I don't know if it's possible to create a
dummy cluster which exposes the same
behaviour, because the disconnects only
happen when there are compute jobs running
on those nodes - and they are GPU compute
jobs, so that's something which cannot be
easily emulated in a VM.<br>
<br>
As we have more clusters (which are
running fine with an ancient 3.4 version
:-)) and we are currently not dependent on
this particular cluster (which may stay
                      like this for this month, I think), I
                      should be able to deploy the debug build
                      on the "real" cluster, if you can provide
                      one.<br>
<br>
Regards and thanks,<br>
Micha<br>
<br>
<br>
<br>
                      On 12/06/2016 08:15 AM, Mohammed
                      Rafi K C wrote:<br>
</div>
<blockquote type="cite">
<p><br>
</p>
<br>
<div
class="m_4766802258719003127moz-cite-prefix">On
12/03/2016 12:56 AM, Micha Ober wrote:<br>
</div>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix"><tt>**
Update: ** I have downgraded from
3.8.6 to 3.7.17 now, but the problem
still exists.</tt><tt><br>
</tt></div>
</blockquote>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix"><tt>
</tt><tt><br>
</tt><tt>Client log: <a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://paste.ubuntu.com/">http://paste.ubuntu.com/</a><wbr>23569065/</tt><tt><br>
</tt><tt>Brick log: <a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://paste.ubuntu.com/">http://paste.ubuntu.com/</a><wbr>23569067/</tt><tt><br>
</tt><tt><br>
</tt><tt>Please note that each server
has two bricks.</tt><tt><br>
                      </tt><tt>Yet, according to the
logs, one brick loses the connection
to all other hosts:</tt><tt><br>
</tt>
<pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)
</pre>
                      <tt>The SECOND brick on the SAME host is NOT affected, i.e. there are no disconnects!<br>
                        As I said, the network connection is fine and the disks are idle; the CPU always has 2 free cores.<br>
                        It looks like I have to downgrade to 3.4 now in order for the disconnects to stop.</tt>
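                      <br>
                      <tt>(A sketch for checking whether the TCP sessions on those brick ports are really gone; the port numbers are the ones from the log above:)</tt>
                      <pre>gluster volume status gv0   # maps each brick to its port
ss -tan | grep 49121        # any connections left on that port?</pre>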
</div>
</blockquote>
<br>
Hi Micha,<br>
<br>
                      Thanks for the update, and sorry about what
                      happened with the newer gluster versions. I
                      can understand the need for a downgrade, as
                      it is a production setup.<br>
                      <br>
                      Can you tell me which clients are used here?
                      Whether it is FUSE, NFS, NFS-Ganesha, SMB
                      or libgfapi?<br>
<br>
                      Since I'm not able to reproduce the issue
                      (I have been trying for the last 3 days) and
                      the logs are not much help here (we
                      don't have many logs in the socket layer),
                      could you please create a dummy cluster
                      and try to reproduce the issue? Then we
                      could play with that volume, and I could
                      provide a debug build which we can use
                      for further debugging.<br>
<br>
If you don't have bandwidth for this,
please leave it ;).<br>
<br>
Regards<br>
Rafi KC<br>
<br>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix">
<pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">- Micha
</pre>
<br>
                          On 11/30/2016 06:57 AM, Mohammed
                          Rafi K C wrote:<br>
</div>
<blockquote type="cite">
<p>Hi Micha,</p>
                          <p>I have changed the thread and
                            subject so that your original thread
                            remains unchanged for your query. Let's
                            try to fix the problem you
                            observed with 3.8.4, so I have
                            started a new thread to discuss the
                            frequent disconnect problem.</p>
<p><b>If any one else has experienced
the same problem, please respond
to the mail.</b><br>
</p>
<p>It would be very helpful if you
could give us some more logs from
                            clients and bricks. Also, any
                            steps to reproduce will surely help
                            us chase the problem further.</p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<div
class="m_4766802258719003127moz-cite-prefix">On
11/30/2016 04:44 AM, Micha Ober
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div><font face="monospace,
monospace">I had opened
another thread on this
mailing list (Subject:
"After upgrade from 3.4.2 to
3.8.5 - High CPU usage
resulting in disconnects and
split-brain").</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">The title may be
a bit misleading now, as I
am no longer observing high
CPU usage after upgrading to
3.8.6, but the disconnects
are still happening and the
number of files in
split-brain is growing.</font></div>
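                              <div><font face="monospace,
                                  monospace"><br>
                                </font></div>
                              <div><font face="monospace,
                                  monospace">(The affected files can
                                  be listed with a command along
                                  these lines:)</font></div>
                              <pre>gluster volume heal gv0 info split-brain</pre>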
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Setup: 6 compute
nodes, each serving as a
glusterfs server and client,
Ubuntu 14.04, two bricks per
node, distribute-replicate</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I have two
gluster volumes set up (one
for scratch data, one for
the slurm scheduler). Only
the scratch data volume
shows critical errors "[...]
has not responded in the
last 42 seconds,
disconnecting.". So I can
rule out network problems,
the gigabit link between the
nodes is not saturated at
all. The disks are almost
idle (<10%).</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I have glusterfs
3.4.2 on Ubuntu 12.04 on a
another compute cluster,
running fine since it was
deployed.</font></div>
<div><font face="monospace,
monospace">I had glusterfs
3.4.2 on Ubuntu 14.04 on
this cluster, running fine
for almost a year.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">After upgrading
to 3.8.5, the problems (as
described) started. I would
like to use some of the new
features of the newer
versions (like bitrot), but
the users can't run their
compute jobs right now
because the result files are
garbled.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">There also seems
to be a bug report with a
                              similar problem (but no
                              progress):</font></div>
<div><font face="monospace,
monospace"><a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="https://bugzilla.redhat.com/">https://bugzilla.redhat.com/</a><wbr>show_bug.cgi?id=1370683</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For me, ALL
servers are affected (not
isolated to one or two
servers)</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I also see
                              messages like "INFO:
                              task gpu_graphene_bv:4476
                              blocked for more than 120
                              seconds." in the
syslog.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For completeness
(gv0 is the scratch volume,
gv2 the slurm volume):</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">[root@giant2: ~]#
gluster v info</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Volume Name: gv0</font></div>
<div><font face="monospace,
monospace">Type:
Distributed-Replicate</font></div>
<div><font face="monospace,
monospace">Volume ID:
993ec7c9-e4bc-44d0-b7c4-<wbr>2d977e622e86</font></div>
<div><font face="monospace,
monospace">Status: Started</font></div>
<div><font face="monospace,
monospace">Snapshot Count: 0</font></div>
<div><font face="monospace,
monospace">Number of Bricks:
6 x 2 = 12</font></div>
<div><font face="monospace,
monospace">Transport-type:
tcp</font></div>
<div><font face="monospace,
monospace">Bricks:</font></div>
<div><font face="monospace,
monospace">Brick1:
giant1:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick2:
giant2:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick3:
giant3:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick4:
giant4:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick5:
giant5:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick6:
giant6:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick7:
giant1:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick8:
giant2:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick9:
giant3:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick10:
giant4:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick11:
giant5:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick12:
giant6:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Options
Reconfigured:</font></div>
<div><font face="monospace,
monospace">auth.allow:
X.X.X.*,127.0.0.1</font></div>
<div><font face="monospace,
monospace">nfs.disable: on</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Volume Name: gv2</font></div>
<div><font face="monospace,
monospace">Type: Replicate</font></div>
<div><font face="monospace,
monospace">Volume ID:
30c78928-5f2c-4671-becc-<wbr>8deaee1a7a8d</font></div>
<div><font face="monospace,
monospace">Status: Started</font></div>
<div><font face="monospace,
monospace">Snapshot Count: 0</font></div>
<div><font face="monospace,
monospace">Number of Bricks:
1 x 2 = 2</font></div>
<div><font face="monospace,
monospace">Transport-type:
tcp</font></div>
<div><font face="monospace,
monospace">Bricks:</font></div>
<div><font face="monospace,
monospace">Brick1:
giant1:/gluster/sdd/gv2</font></div>
<div><font face="monospace,
monospace">Brick2:
giant2:/gluster/sdd/gv2</font></div>
<div><font face="monospace,
monospace">Options
Reconfigured:</font></div>
<div><font face="monospace,
monospace">auth.allow:
X.X.X.*,127.0.0.1</font></div>
<div><font face="monospace,
monospace">cluster.granular-entry-heal:
on</font></div>
<div><font face="monospace,
monospace">cluster.locking-scheme:
granular</font></div>
<div><font face="monospace,
monospace">nfs.disable: on</font></div>
<div
style="font-family:monospace,monospace"><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-30
0:10 GMT+01:00 Micha Ober <span
dir="ltr"><<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">
<div
style="font-family:monospace,monospace">There
also seems to be a bug
                              report with a similar
                              problem (but no progress):</div>
<div><font face="monospace,
monospace"><a
moz-do-not-send="true"
class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/sh">https://bugzilla.redhat.com/sh</a><wbr>ow_bug.cgi?id=1370683</font><br>
</div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For me, ALL
servers are affected
(not isolated to one or
two servers)</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I also see
                                    messages like "INFO:
                                    task
                                    gpu_graphene_bv:4476
                                    blocked for more than
                                    120 seconds." in
the syslog.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For
completeness (gv0 is the
scratch volume, gv2 the
slurm volume):</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">
<div>[root@giant2: ~]#
gluster v info</div>
<div><br>
</div>
<div>Volume Name: gv0</div>
<div>Type:
Distributed-Replicate</div>
<div>Volume ID:
993ec7c9-e4bc-44d0-b7c4-2d977e<wbr>622e86</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 6
x 2 = 12</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1:
giant1:/gluster/sdc/gv0</div>
<div>Brick2:
giant2:/gluster/sdc/gv0</div>
<div>Brick3:
giant3:/gluster/sdc/gv0</div>
<div>Brick4:
giant4:/gluster/sdc/gv0</div>
<div>Brick5:
giant5:/gluster/sdc/gv0</div>
<div>Brick6:
giant6:/gluster/sdc/gv0</div>
<div>Brick7:
giant1:/gluster/sdd/gv0</div>
<div>Brick8:
giant2:/gluster/sdd/gv0</div>
<div>Brick9:
giant3:/gluster/sdd/gv0</div>
<div>Brick10:
giant4:/gluster/sdd/gv0</div>
<div>Brick11:
giant5:/gluster/sdd/gv0</div>
<div>Brick12:
giant6:/gluster/sdd/gv0</div>
<div>Options
Reconfigured:</div>
<div>auth.allow:
X.X.X.*,127.0.0.1</div>
<div>nfs.disable: on</div>
<div><br>
</div>
<div>Volume Name: gv2</div>
<div>Type: Replicate</div>
<div>Volume ID:
30c78928-5f2c-4671-becc-8deaee<wbr>1a7a8d</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 1
x 2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1:
giant1:/gluster/sdd/gv2</div>
<div>Brick2:
giant2:/gluster/sdd/gv2</div>
<div>Options
Reconfigured:</div>
<div>auth.allow:
X.X.X.*,127.0.0.1</div>
<div>cluster.granular-entry-heal:
on</div>
<div>cluster.locking-scheme:
granular</div>
<div>nfs.disable: on</div>
<div><br>
</div>
</font></div>
</div>
<div
class="m_4766802258719003127HOEnZb">
<div
class="m_4766802258719003127h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-29
19:21 GMT+01:00 Micha
Ober <span dir="ltr"><<a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div dir="ltr">
<div
style="font-family:monospace,monospace">I
had opened
another thread
on this mailing
list (Subject:
"After upgrade
from 3.4.2 to
3.8.5 - High CPU
usage resulting
in disconnects
and
split-brain").</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">The
title may be a
bit misleading
now, as I am no
longer observing
high CPU usage
after upgrading
to 3.8.6, but
the disconnects
are still
happening and
the number of
files in
split-brain is
growing.<br>
</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">Setup:
6 compute nodes,
each serving as
a glusterfs
server and
client, Ubuntu
14.04, two
bricks per node,
distribute-replicate</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">I
have two gluster
volumes set up
(one for scratch
data, one for
the slurm
scheduler). Only
the scratch data
volume shows
critical errors
"[...] has not
responded in the
last 42 seconds,
disconnecting.".
So I can rule
out network
problems, the
gigabit link
between the
nodes is not
saturated at
all. The disks
are almost idle
(<10%).</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">I
have glusterfs
3.4.2 on Ubuntu
12.04 on a
another compute
cluster, running
fine since it
was deployed.</div>
<div
style="font-family:monospace,monospace">I
had glusterfs
3.4.2 on Ubuntu
14.04 on this
cluster, running
fine for almost
a year.</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">After
upgrading to
3.8.5, the
problems (as
described)
started. I would
like to use some
of the new
features of the
newer versions
(like bitrot),
but the users
can't run their
compute jobs
right now
because the
result files are
garbled.</div>
</div>
<div
class="m_4766802258719003127m_-1578094958703753071HOEnZb">
<div
class="m_4766802258719003127m_-1578094958703753071h5">
<div
class="gmail_extra"><br>
<div
class="gmail_quote">2016-11-29
18:53
GMT+01:00 Atin
Mukherjee <span
dir="ltr"><<a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="white-space:pre-wrap">Would you be able to share what is not working for you in 3.8.x (mention the exact version). 3.4 is quite old and falling back to an unsupported version doesn't look a feasible option.</div>
<br>
<div
class="gmail_quote">
<div>
<div
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
<div dir="ltr">On
Tue, 29 Nov
2016 at 17:01,
Micha Ober
<<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>>
wrote:<br>
</div>
</div>
</div>
<blockquote
class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
<div dir="ltr"
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Hi,</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">I was using gluster 3.4 and
upgraded to
3.8, but that
version showed
to be unusable
for me. I now
need to
downgrade.</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">I'm running Ubuntu 14.04. As
upgrades of
the op version
are irreversible, I guess I have to delete all gluster volumes and
re-create them
with the
downgraded
version. </div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">0. Backup data</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">1. Unmount all gluster volumes</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">2. apt-get purge
glusterfs-server
glusterfs-client</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">3. Remove PPA for 3.8</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">4. Add PPA for older version</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">5. apt-get install
glusterfs-server
glusterfs-client</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">6. Create volumes</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Is "purge" enough to delete all
configuration
files of the
currently
installed
                              version, or do
                              I need to
                              manually
                              clear some
                              residual files
before
installing an
older version?</div>
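                            <div class="gmail_default"
                              style="font-family:monospace,monospace"><br>
                            </div>
                            <div class="gmail_default"
                              style="font-family:monospace,monospace">(A sketch of the procedure; the PPA names are assumptions and need to be checked. Note that glusterd keeps its runtime state, including volume definitions, under /var/lib/glusterd, which a package "purge" may leave behind:)</div>
                            <pre>sudo umount /shared_data /shared_slurm
sudo apt-get purge glusterfs-server glusterfs-client glusterfs-common
sudo rm -rf /var/lib/glusterd            # runtime state survives the purge
sudo add-apt-repository --remove ppa:gluster/glusterfs-3.8   # assumed PPA name
sudo add-apt-repository ppa:gluster/glusterfs-3.4            # assumed PPA name
sudo apt-get update
sudo apt-get install glusterfs-server glusterfs-client</pre>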
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Thanks.</div>
</div>
</div>
</div>
<span>
______________________________<wbr>_________________<br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
Gluster-users
mailing list<br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
<a
moz-do-not-send="true"
class="moz-txt-link-freetext" href="http://www.gluster.org/mailman">http://www.gluster.org/mailman</a><wbr>/listinfo/gluster-users</span></blockquote>
</div>
<span
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209HOEnZb"><font
color="#888888">
<div dir="ltr">--
<br>
</div>
<div
data-smartmail="gmail_signature">-
Atin (atinm)</div>
</font></span></blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
<fieldset
class="m_4766802258719003127mimeAttachmentHeader"></fieldset>
<br>
<pre>______________________________<wbr>_________________
Gluster-users mailing list
<a moz-do-not-send="true" class="m_4766802258719003127moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a moz-do-not-send="true" class="m_4766802258719003127moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/<wbr>mailman/listinfo/gluster-users</a></pre>
</blockquote>
</blockquote>
<p>
</p>
</blockquote>
</blockquote>
<p>
</p>
</blockquote>
</blockquote>
<p>
</p>
</div></div></div>
</blockquote></div>
--
<div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">
</div><div>~ Atin (atinm)
</div></div></div></div>
</div></div>
</blockquote>
</blockquote>
</blockquote><p>
</p></body></html>