<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hi Micha,</p>
<p>Can you please also check whether there are any error messages in dmesg?
Basically, I'm trying to see whether you're hitting the issue described
in <a class="moz-txt-link-freetext" href="https://bugzilla.kernel.org/show_bug.cgi?id=73831">https://bugzilla.kernel.org/show_bug.cgi?id=73831</a>.</p>
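<p>For the dmesg check, one thing worth searching for is the hung-task warning that Micha quoted from the syslog earlier in this thread ("INFO: task gpu_graphene_bv:4476 blocked for more than 120 seconds."). The following is only a sketch in Python, not part of any gluster tooling: the function name <tt>find_hung_tasks</tt> and the regular expression are assumptions based on that one quoted line, and it makes no claim about what the kernel bug above actually logs.</p>
<pre>
```python
import re

# Assumed pattern, modelled on the single syslog line quoted in this thread:
# "INFO: task gpu_graphene_bv:4476 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(log_text):
    """Return (task_name, pid, seconds) tuples for each hung-task warning found."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            name, pid, seconds = match.groups()
            hits.append((name, int(pid), int(seconds)))
    return hits
```
</pre>
<p>Feed it the text of the dmesg output (or of the syslog file). An empty result only means this particular message is absent, not that the kernel log is clean.</p>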
<p><br>
</p>
<p>Regards</p>
<p>Rafi KC</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 12/19/2016 11:58 AM, Mohammed Rafi K
C wrote:<br>
</div>
<blockquote
cite="mid:86231d60-3363-0e68-48d3-818cd73c62e9@redhat.com"
type="cite">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<p>Hi Micha,</p>
<p>Sorry for the late reply. I was busy with some other things.</p>
<p>If you still have the setup available, can you enable the TRACE log
level [1],[2] and see whether you can find any log entries from when the
network starts disconnecting? Basically, I'm trying to find out whether
any disconnection occurred other than the ping timer expiry
issue.</p>
<p><br>
</p>
<p><br>
</p>
<p>[1] : gluster volume set &lt;volname&gt;
diagnostics.brick-log-level TRACE</p>
<p>[2] : gluster volume set &lt;volname&gt;
diagnostics.client-log-level TRACE<br>
</p>
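<p>The two options above are applied through <tt>gluster volume set</tt>, and <tt>gluster volume reset</tt> puts an option back to its default once the traces have been collected (TRACE logs grow quickly). A minimal sketch in Python, assuming the gluster CLI is on PATH; the helper names and the dry-run default are illustrative only.</p>
<pre>
```python
import subprocess

# The two options named in [1] and [2].
LOG_LEVEL_OPTIONS = (
    "diagnostics.brick-log-level",
    "diagnostics.client-log-level",
)


def log_level_commands(volname, level="TRACE"):
    """Build the 'gluster volume set' argument lists for both options."""
    return [
        ["gluster", "volume", "set", volname, option, level]
        for option in LOG_LEVEL_OPTIONS
    ]


def reset_commands(volname):
    """Build the 'gluster volume reset' argument lists that restore the defaults."""
    return [
        ["gluster", "volume", "reset", volname, option]
        for option in LOG_LEVEL_OPTIONS
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```
</pre>
<p>For example, <tt>run_all(log_level_commands("gv0"))</tt> only prints the two commands for the scratch volume; pass <tt>dry_run=False</tt> on a node of the cluster to apply them.</p>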
<p><br>
</p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<br>
<div class="moz-cite-prefix">On 12/08/2016 07:59 PM, Atin
Mukherjee wrote:<br>
</div>
<blockquote
cite="mid:CAGNCGH3Rjy8B7wz+gTQqc35FLpQ4gn9u+bMaDRM0hkaGitUaGw@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Dec 8, 2016 at 4:37 PM,
Micha Ober <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_4766802258719003127moz-cite-prefix">Hi
Rafi,<br>
<br>
thank you for your support. It is greatly
appreciated.<br>
<br>
Just some more thoughts from my side:<br>
<br>
There have been no reports from other users in
*this* thread until now, but I have found at least
one user with a very similar problem in an older
thread:<br>
<br>
<a moz-do-not-send="true"
class="m_4766802258719003127moz-txt-link-freetext"
href="https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html"
target="_blank">https://www.gluster.org/<wbr>pipermail/gluster-users/2014-<wbr>November/019637.html</a><br>
<br>
He is also reporting disconnects for no apparent
reason, although his setup is a bit more
complicated, also involving a firewall. In our
setup, all servers/clients are connected via 1 GbE
with no firewall or anything that might
block/throttle traffic. Also, we are using exactly
the same software versions on all nodes.<br>
<br>
<br>
I can also find some reports in the bugtracker when
searching for "rpc_client_ping_timer_<wbr>expired"
and "rpc_clnt_ping_timer_expired" (it looks like the
spelling changed between versions).<br>
<br>
<a moz-do-not-send="true"
class="m_4766802258719003127moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1096729"
target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1096729</a></div>
</div>
</blockquote>
<div><br>
</div>
<div>Just FYI, this is a different issue: there, GlusterD
fails to handle the volume of incoming requests in time
because MT-epoll is not enabled.<br>
<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_4766802258719003127moz-cite-prefix"><br>
<a moz-do-not-send="true"
class="m_4766802258719003127moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683"
target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1370683</a><br>
<br>
But both reports involve large traffic/load on the
bricks/disks, which is not the case for our setup.<br>
To give a ballpark figure: Over three days, 30 GiB
were written. And the data was not written at once,
but continuously over the whole time.<br>
<br>
<br>
Just to be sure, I have checked the logfiles of one
of the other clusters right now, which are sitting
in the same building, in the same rack, even on the
same switch, running the same jobs, but with
glusterfs 3.4.2 and I can see no disconnects in the
logfiles. So I can definitely rule out our
infrastructure as the problem.<br>
<br>
Regards,<br>
Micha
<div>
<div class="h5"><br>
<br>
<br>
Am 07.12.2016 um 18:08 schrieb Mohammed Rafi K
C:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite">
<p>Hi Micha,</p>
<p>This is great. I will provide you with a debug
build containing two fixes for what I suspect
may be causing the frequent disconnects,
though I don't have much data to validate my
theory. So I will take one more day to dig
into that.</p>
<p>Thanks for your support, and opensource++ </p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<div
class="m_4766802258719003127moz-cite-prefix">On
12/07/2016 05:02 AM, Micha Ober wrote:<br>
</div>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix">Hi,<br>
<br>
thank you for your answer and even more for
the question!<br>
Until now, I was using FUSE. Today I changed
all mounts to NFS using the same 3.7.17
version.<br>
<br>
But: The problem is still the same. Now, the
NFS logfile contains lines like these:<br>
<br>
[2016-12-06 15:12:29.006325] C
[rpc-clnt-ping.c:165:rpc_clnt_<wbr>ping_timer_expired]
0-gv0-client-7: server X.X.18.62:49153 has
not responded in the last 42 seconds,
disconnecting.<br>
<br>
Interestingly enough, the IP address
X.X.18.62 is the same machine! As I wrote
earlier, each node serves both as a server
and a client, as each node contributes
bricks to the volume. Every server is
connecting to itself via its hostname. For
example, the fstab on the node "giant2"
looks like:<br>
<br>
#giant2:/gv0 /shared_data
glusterfs defaults,noauto 0 0<br>
#giant2:/gv2 /shared_slurm
glusterfs defaults,noauto 0 0<br>
<br>
giant2:/gv0 /shared_data
nfs defaults,_netdev,vers=3
0 0<br>
giant2:/gv2 /shared_slurm
nfs defaults,_netdev,vers=3
0 0<br>
<br>
So I understand the disconnects even less. <br>
<br>
I don't know if it's possible to create a
dummy cluster which exhibits the same
behaviour, because the disconnects only
happen when there are compute jobs running
on those nodes - and they are GPU compute
jobs, so that's something which cannot be
easily emulated in a VM.<br>
<br>
As we have more clusters (which are running
fine with an ancient 3.4 version :-)) and we
are currently not dependent on this
particular cluster (which may stay like this
for this month, I think) I should be able to
deploy the debug build on the "real"
cluster, if you can provide a debug build.<br>
<br>
Regards and thanks,<br>
Micha<br>
<br>
<br>
<br>
Am 06.12.2016 um 08:15 schrieb Mohammed Rafi
K C:<br>
</div>
<blockquote type="cite">
<p><br>
</p>
<br>
<div
class="m_4766802258719003127moz-cite-prefix">On
12/03/2016 12:56 AM, Micha Ober wrote:<br>
</div>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix"><tt>**
Update: ** I have downgraded from
3.8.6 to 3.7.17 now, but the problem
still exists.</tt><tt><br>
</tt></div>
</blockquote>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix"><tt>
</tt><tt><br>
</tt><tt>Client log: <a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://paste.ubuntu.com/23569065/">http://paste.ubuntu.com/23569065/</a></tt><tt><br>
</tt><tt>Brick log: <a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://paste.ubuntu.com/23569067/">http://paste.ubuntu.com/23569067/</a></tt><tt><br>
</tt><tt><br>
</tt><tt>Please note that each server
has two bricks.</tt><tt><br>
</tt><tt>Whereas, according to the logs,
one brick loses the connection to all
other hosts:</tt><tt><br>
</tt>
<pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)
The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
As I said, the network connection is fine and the disks are idle.
The CPU always has 2 free cores.
It looks like I have to downgrade to 3.4 now in order for the disconnects to stop.</pre>
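<p>With two bricks per host it can help to tally log entries by the function that emitted them, so the affected brick stands out. The sketch below is Python; the field layout is inferred only from the lines quoted in this thread ([timestamp] level [file:line:function] domain: message), so it is an assumption rather than a documented gluster format, and the names <tt>parse_line</tt> and <tt>count_by_function</tt> are made up for illustration.</p>
<pre>
```python
import re
from collections import Counter

# Assumed layout, inferred from the log lines quoted in this thread:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[([^\]]+)\] ([A-Z]) \[([^:\]]+):(\d+):([^\]]+)\] ([^ ]+): (.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, level, source, lineno, function, domain, message = match.groups()
    return {
        "timestamp": timestamp,
        "level": level,
        "source": source,
        "line": int(lineno),
        "function": function,
        "domain": domain,
        "message": message,
    }


def count_by_function(lines):
    """Count parsed log entries per reporting function, skipping other text."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["function"]] += 1
    return counts
```
</pre>
<p>Running <tt>count_by_function</tt> over each brick log separately would show whether the <tt>__socket_rwv</tt> warnings really are confined to one brick.</p>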
</div>
</blockquote>
<br>
Hi Micha,<br>
<br>
Thanks for the update, and sorry for the trouble
with the newer gluster versions. I can
understand the need to downgrade, as it is a
production setup.<br>
<br>
Can you tell me which client is used here:
FUSE, NFS, NFS-Ganesha, SMB, or
libgfapi?<br>
<br>
Since I have not been able to reproduce the issue (I
have been trying for the last 3 days) and the
logs are not much help here (we don't
log much in the socket layer), could you
please create a dummy cluster and try to
reproduce the issue there? If it reproduces, we can
experiment with that volume, and I could provide a
debug build to use for further
debugging.<br>
<br>
If you don't have bandwidth for this, please
leave it ;).<br>
<br>
Regards<br>
Rafi KC<br>
<br>
<blockquote type="cite">
<div
class="m_4766802258719003127moz-cite-prefix">
<pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">- Micha
</pre>
<br>
Am 30.11.2016 um 06:57 schrieb Mohammed
Rafi K C:<br>
</div>
<blockquote type="cite">
<p>Hi Micha,</p>
<p>I have changed the thread and subject
so that your original thread stays
focused on your original query. Let's try to fix
the problem you observed with
3.8.4; I have started this new thread
to discuss the frequent disconnect
problem.</p>
<p><b>If anyone else has experienced
the same problem, please respond to
the mail.</b><br>
</p>
<p>It would be very helpful if you could
give us some more logs from clients
and bricks. Also, any steps to
reproduce the issue would help us chase the
problem further.</p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<div
class="m_4766802258719003127moz-cite-prefix">On
11/30/2016 04:44 AM, Micha Ober wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div><font face="monospace,
monospace">I had opened
another thread on this mailing
list (Subject: "After upgrade
from 3.4.2 to 3.8.5 - High CPU
usage resulting in disconnects
and split-brain").</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">The title may be a
bit misleading now, as I am no
longer observing high CPU
usage after upgrading to
3.8.6, but the disconnects are
still happening and the number
of files in split-brain is
growing.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Setup: 6 compute
nodes, each serving as a
glusterfs server and client,
Ubuntu 14.04, two bricks per
node, distribute-replicate</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I have two gluster
volumes set up (one for
scratch data, one for the
slurm scheduler). Only the
scratch data volume shows
critical errors "[...] has not
responded in the last 42
seconds, disconnecting.". So I
can rule out network problems,
the gigabit link between the
nodes is not saturated at all.
The disks are almost idle
(<10%).</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I have glusterfs
3.4.2 on Ubuntu 12.04 on
another compute cluster,
running fine since it was
deployed.</font></div>
<div><font face="monospace,
monospace">I had glusterfs
3.4.2 on Ubuntu 14.04 on this
cluster, running fine for
almost a year.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">After upgrading to
3.8.5, the problems (as
described) started. I would
like to use some of the new
features of the newer versions
(like bitrot), but the users
can't run their compute jobs
right now because the result
files are garbled.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">There also seems to
be a bug report with a similar
problem (but no progress):</font></div>
<div><font face="monospace,
monospace"><a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683">https://bugzilla.redhat.com/show_bug.cgi?id=1370683</a></font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For me, ALL servers
are affected (not isolated to
one or two servers)</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I also see messages
like "INFO:
task gpu_graphene_bv:4476
blocked for more than 120
seconds." in the syslog.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For completeness
(gv0 is the scratch volume,
gv2 the slurm volume):</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">[root@giant2: ~]#
gluster v info</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Volume Name: gv0</font></div>
<div><font face="monospace,
monospace">Type:
Distributed-Replicate</font></div>
<div><font face="monospace,
monospace">Volume ID:
993ec7c9-e4bc-44d0-b7c4-<wbr>2d977e622e86</font></div>
<div><font face="monospace,
monospace">Status: Started</font></div>
<div><font face="monospace,
monospace">Snapshot Count: 0</font></div>
<div><font face="monospace,
monospace">Number of Bricks: 6
x 2 = 12</font></div>
<div><font face="monospace,
monospace">Transport-type: tcp</font></div>
<div><font face="monospace,
monospace">Bricks:</font></div>
<div><font face="monospace,
monospace">Brick1:
giant1:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick2:
giant2:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick3:
giant3:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick4:
giant4:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick5:
giant5:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick6:
giant6:/gluster/sdc/gv0</font></div>
<div><font face="monospace,
monospace">Brick7:
giant1:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick8:
giant2:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick9:
giant3:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick10:
giant4:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick11:
giant5:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Brick12:
giant6:/gluster/sdd/gv0</font></div>
<div><font face="monospace,
monospace">Options
Reconfigured:</font></div>
<div><font face="monospace,
monospace">auth.allow:
X.X.X.*,127.0.0.1</font></div>
<div><font face="monospace,
monospace">nfs.disable: on</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">Volume Name: gv2</font></div>
<div><font face="monospace,
monospace">Type: Replicate</font></div>
<div><font face="monospace,
monospace">Volume ID:
30c78928-5f2c-4671-becc-<wbr>8deaee1a7a8d</font></div>
<div><font face="monospace,
monospace">Status: Started</font></div>
<div><font face="monospace,
monospace">Snapshot Count: 0</font></div>
<div><font face="monospace,
monospace">Number of Bricks: 1
x 2 = 2</font></div>
<div><font face="monospace,
monospace">Transport-type: tcp</font></div>
<div><font face="monospace,
monospace">Bricks:</font></div>
<div><font face="monospace,
monospace">Brick1:
giant1:/gluster/sdd/gv2</font></div>
<div><font face="monospace,
monospace">Brick2:
giant2:/gluster/sdd/gv2</font></div>
<div><font face="monospace,
monospace">Options
Reconfigured:</font></div>
<div><font face="monospace,
monospace">auth.allow:
X.X.X.*,127.0.0.1</font></div>
<div><font face="monospace,
monospace">cluster.granular-entry-heal:
on</font></div>
<div><font face="monospace,
monospace">cluster.locking-scheme:
granular</font></div>
<div><font face="monospace,
monospace">nfs.disable: on</font></div>
<div
style="font-family:monospace,monospace"><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-30
0:10 GMT+01:00 Micha Ober <span
dir="ltr"><<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">
<div
style="font-family:monospace,monospace">There
also seems to be a bug
report with a similar
problem (but no progress):</div>
<div><font face="monospace,
monospace"><a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683">https://bugzilla.redhat.com/show_bug.cgi?id=1370683</a></font><br>
</div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For me, ALL
servers are affected (not
isolated to one or two
servers)</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">I also see
messages like "INFO:
task
gpu_graphene_bv:4476
blocked for more than
120 seconds." in the
syslog.</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">For
completeness (gv0 is the
scratch volume, gv2 the
slurm volume):</font></div>
<div><font face="monospace,
monospace"><br>
</font></div>
<div><font face="monospace,
monospace">
<div>[root@giant2: ~]#
gluster v info</div>
<div><br>
</div>
<div>Volume Name: gv0</div>
<div>Type:
Distributed-Replicate</div>
<div>Volume ID:
993ec7c9-e4bc-44d0-b7c4-2d977e<wbr>622e86</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 6 x
2 = 12</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1:
giant1:/gluster/sdc/gv0</div>
<div>Brick2:
giant2:/gluster/sdc/gv0</div>
<div>Brick3:
giant3:/gluster/sdc/gv0</div>
<div>Brick4:
giant4:/gluster/sdc/gv0</div>
<div>Brick5:
giant5:/gluster/sdc/gv0</div>
<div>Brick6:
giant6:/gluster/sdc/gv0</div>
<div>Brick7:
giant1:/gluster/sdd/gv0</div>
<div>Brick8:
giant2:/gluster/sdd/gv0</div>
<div>Brick9:
giant3:/gluster/sdd/gv0</div>
<div>Brick10:
giant4:/gluster/sdd/gv0</div>
<div>Brick11:
giant5:/gluster/sdd/gv0</div>
<div>Brick12:
giant6:/gluster/sdd/gv0</div>
<div>Options Reconfigured:</div>
<div>auth.allow:
X.X.X.*,127.0.0.1</div>
<div>nfs.disable: on</div>
<div><br>
</div>
<div>Volume Name: gv2</div>
<div>Type: Replicate</div>
<div>Volume ID:
30c78928-5f2c-4671-becc-8deaee<wbr>1a7a8d</div>
<div>Status: Started</div>
<div>Snapshot Count: 0</div>
<div>Number of Bricks: 1 x
2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1:
giant1:/gluster/sdd/gv2</div>
<div>Brick2:
giant2:/gluster/sdd/gv2</div>
<div>Options Reconfigured:</div>
<div>auth.allow:
X.X.X.*,127.0.0.1</div>
<div>cluster.granular-entry-heal:
on</div>
<div>cluster.locking-scheme:
granular</div>
<div>nfs.disable: on</div>
<div><br>
</div>
</font></div>
</div>
<div
class="m_4766802258719003127HOEnZb">
<div
class="m_4766802258719003127h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-29
19:21 GMT+01:00 Micha
Ober <span dir="ltr"><<a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div dir="ltr">
<div
style="font-family:monospace,monospace">I
had opened another
thread on this
mailing list
(Subject: "After
upgrade from 3.4.2
to 3.8.5 - High
CPU usage
resulting in
disconnects and
split-brain").</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">The
title may be a bit
misleading now, as
I am no longer
observing high CPU
usage after
upgrading to
3.8.6, but the
disconnects are
still happening
and the number of
files in
split-brain is
growing.<br>
</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">Setup:
6 compute nodes,
each serving as a
glusterfs server
and client, Ubuntu
14.04, two bricks
per node,
distribute-replicate</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">I
have two gluster
volumes set up
(one for scratch
data, one for the
slurm scheduler).
Only the scratch
data volume shows
critical errors
"[...] has not
responded in the
last 42 seconds,
disconnecting.".
So I can rule out
network problems,
the gigabit link
between the nodes
is not saturated
at all. The disks
are almost idle
(<10%).</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">I
have glusterfs
3.4.2 on Ubuntu
12.04 on another
compute cluster,
running fine since
it was deployed.</div>
<div
style="font-family:monospace,monospace">I
had glusterfs
3.4.2 on Ubuntu
14.04 on this
cluster, running
fine for almost a
year.</div>
<div
style="font-family:monospace,monospace"><br>
</div>
<div
style="font-family:monospace,monospace">After
upgrading to
3.8.5, the
problems (as
described)
started. I would
like to use some
of the new
features of the
newer versions
(like bitrot), but
the users can't
run their compute
jobs right now
because the result
files are garbled.</div>
</div>
<div
class="m_4766802258719003127m_-1578094958703753071HOEnZb">
<div
class="m_4766802258719003127m_-1578094958703753071h5">
<div
class="gmail_extra"><br>
<div
class="gmail_quote">2016-11-29
18:53
GMT+01:00 Atin
Mukherjee <span
dir="ltr"><<a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="white-space:pre-wrap">Would you be able to share what is not working for you in 3.8.x (please mention the exact version)? 3.4 is quite old, and falling back to an unsupported version doesn't look like a feasible option.</div>
<br>
<div
class="gmail_quote">
<div>
<div
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
<div dir="ltr">On
Tue, 29 Nov
2016 at 17:01,
Micha Ober
<<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>>
wrote:<br>
</div>
</div>
</div>
<blockquote
class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
<div dir="ltr"
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Hi,</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">I was using gluster 3.4 and
upgraded to
3.8, but that
version proved
to be unusable
for me. I now
need to
downgrade.</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">I'm running Ubuntu 14.04. As
upgrades of
the op version
are irreversible, I guess I have to delete all gluster volumes and
re-create them
with the
downgraded
version. </div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">0. Backup data</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">1. Unmount all gluster volumes</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">2. apt-get purge
glusterfs-server
glusterfs-client</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">3. Remove PPA for 3.8</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">4. Add PPA for older version</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">5. apt-get install
glusterfs-server
glusterfs-client</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">6. Create volumes</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Is "purge" enough to delete all
configuration
files of the
currently
installed
version or do
I need to
manually
clear some
residues
before
installing an
older version?</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
</div>
<div
class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Thanks.</div>
</div>
</div>
</div>
<span>
______________________________<wbr>_________________<br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
Gluster-users
mailing list<br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
<a
moz-do-not-send="true"
class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a></span></blockquote>
</div>
<span
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209HOEnZb"><font
color="#888888">
<div dir="ltr">--
<br>
</div>
<div
data-smartmail="gmail_signature">-
Atin (atinm)</div>
</font></span></blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
</blockquote>
<p>
</p>
</blockquote>
</blockquote>
<p>
</p>
</blockquote>
</blockquote>
<p>
</p>
</div></div></div>
</blockquote></div>
--
<div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">
</div><div>~ Atin (atinm)
</div></div></div></div>
</div></div>
</blockquote>
</blockquote>
</body></html>