<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Hi Micha,</p>
    <p>Can you please also check whether there are any error messages in dmesg?
      Basically, I'm trying to see whether you're hitting the issue described
      in <a class="moz-txt-link-freetext" href="https://bugzilla.kernel.org/show_bug.cgi?id=73831">https://bugzilla.kernel.org/show_bug.cgi?id=73831</a>.</p>
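    <p>In case it helps, here is a minimal Python sketch for picking hung-task warnings (the "blocked for more than 120 seconds" message you reported from syslog) out of saved dmesg output. The function name <tt>find_hung_tasks</tt> and the timestamp in the sample line are made up for illustration, and it only recognises that one message format.</p>
<pre>
```python
import re

# Kernel hung-task warning: "INFO: task NAME:PID blocked for more than N seconds."
HUNG_TASK = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(dmesg_text):
    """Return a (task, pid, seconds) tuple for every hung-task warning found."""
    hits = []
    for line in dmesg_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return hits


# Sample line: the message text is from this thread, the timestamp is invented.
sample = "[12345.678901] INFO: task gpu_graphene_bv:4476 blocked for more than 120 seconds."
print(find_hung_tasks(sample))
```
</pre>
    <p>The input can be a saved file or, on Python 3.7 or later, the output of <tt>subprocess.run(["dmesg"], capture_output=True, text=True).stdout</tt>.</p>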
    <p><br>
    </p>
    <p>Regards</p>
    <p>Rafi KC</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 12/19/2016 11:58 AM, Mohammed Rafi K
      C wrote:<br>
    </div>
    <blockquote
      cite="mid:86231d60-3363-0e68-48d3-818cd73c62e9@redhat.com"
      type="cite">
      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
      <p>Hi Micha,</p>
      <p>Sorry for the late reply. I was busy with some other things.</p>
      <p>If you still have the setup available, can you enable the TRACE log
        level [1],[2] and see whether you can find any log entries from when the
        network starts disconnecting? Basically, I'm trying to find out
        whether any disconnection occurred other than the ping timer expiry
        issue.</p>
      <p><br>
      </p>
      <p><br>
      </p>
      <p>[1] : gluster volume set &lt;volname&gt;
        diagnostics.brick-log-level TRACE</p>
      <p>[2] : gluster volume set &lt;volname&gt;
        diagnostics.client-log-level TRACE<br>
      </p>
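      <p>If you prefer to script it, the two settings can be applied with a small Python wrapper. This is only a sketch: the helper names and the <tt>dry_run</tt> switch are my own, and the volume name <tt>gv0</tt> is taken from your earlier mail. By default it only prints the commands instead of running them.</p>
<pre>
```python
import subprocess

# The two options referenced as [1] and [2].
TRACE_OPTIONS = ("diagnostics.brick-log-level", "diagnostics.client-log-level")


def trace_commands(volname, level="TRACE"):
    """Build the argv lists for the two 'gluster volume set' calls."""
    return [["gluster", "volume", "set", volname, option, level]
            for option in TRACE_OPTIONS]


def apply_log_level(volname, level="TRACE", dry_run=True):
    """Print the commands (dry_run=True) or execute them on a gluster node."""
    for argv in trace_commands(volname, level):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


apply_log_level("gv0")
```
</pre>
      <p>TRACE is very verbose, so the same helper can be called again with a quieter level such as INFO once the disconnects have been captured.</p>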
      <p><br>
      </p>
      <p>Regards</p>
      <p>Rafi KC<br>
      </p>
      <br>
      <div class="moz-cite-prefix">On 12/08/2016 07:59 PM, Atin
        Mukherjee wrote:<br>
      </div>
      <blockquote
cite="mid:CAGNCGH3Rjy8B7wz+gTQqc35FLpQ4gn9u+bMaDRM0hkaGitUaGw@mail.gmail.com"
        type="cite">
        <div dir="ltr"><br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Thu, Dec 8, 2016 at 4:37 PM,
              Micha Ober <span dir="ltr">&lt;<a moz-do-not-send="true"
                  href="mailto:micha2k@gmail.com" target="_blank">micha2k@gmail.com</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div bgcolor="#FFFFFF" text="#000000">
                  <div class="m_4766802258719003127moz-cite-prefix">Hi
                    Rafi,<br>
                    <br>
                    thank you for your support. It is greatly
                    appreciated.<br>
                    <br>
                    Just some more thoughts from my side:<br>
                    <br>
                    There have been no reports from other users in
                    *this* thread until now, but I have found at least
                    one user with a very similar problem in an older
                    thread:<br>
                    <br>
                    <a moz-do-not-send="true"
                      class="m_4766802258719003127moz-txt-link-freetext"
href="https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html"
                      target="_blank">https://www.gluster.org/<wbr>pipermail/gluster-users/2014-<wbr>November/019637.html</a><br>
                    <br>
                    He is also reporting disconnects with no apparent
                    reason, although his setup is a bit more
                    complicated, also involving a firewall. In our
                    setup, all servers/clients are connected via 1 GbE
                    with no firewall or anything that might
                    block/throttle traffic. Also, we are using exactly
                    the same software versions on all nodes.<br>
                    <br>
                    <br>
                    I can also find some reports in the bugtracker when
                    searching for "rpc_client_ping_timer_<wbr>expired"
                    and "rpc_clnt_ping_timer_expired" (it looks like the
                    spelling changed between versions).<br>
                    <br>
                    <a moz-do-not-send="true"
                      class="m_4766802258719003127moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1096729"
                      target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1096729</a></div>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>Just FYI, this is a different issue: in that case, GlusterD
                fails to handle the volume of incoming requests in time
                because MT-epoll is not enabled there.<br>
                 <br>
              </div>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div bgcolor="#FFFFFF" text="#000000">
                  <div class="m_4766802258719003127moz-cite-prefix"><br>
                    <a moz-do-not-send="true"
                      class="m_4766802258719003127moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683"
                      target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1370683</a><br>
                    <br>
                    But both reports involve large traffic/load on the
                    bricks/disks, which is not the case for our setup.<br>
                    To give a ballpark figure: Over three days, 30 GiB
                    were written. And the data was not written at once,
                    but continuously over the whole time.<br>
                    <br>
                    <br>
                    Just to be sure, I have just checked the logfiles of one
                    of the other clusters, which is sitting
                    in the same building, in the same rack, even on the
                    same switch, running the same jobs, but with
                    glusterfs 3.4.2, and I can see no disconnects in the
                    logfiles. So I can definitely rule out our
                    infrastructure as the problem.<br>
                    <br>
                    Regards,<br>
                    Micha
                    <div>
                      <div class="h5"><br>
                        <br>
                        <br>
                        On 07.12.2016 at 18:08, Mohammed Rafi K
                        C wrote:<br>
                      </div>
                    </div>
                  </div>
                  <div>
                    <div class="h5">
                      <blockquote type="cite">
                        <p>Hi Micha,</p>
                        <p>This is great. I will provide you with a debug
                          build containing two fixes for what I
                          suspect may be causing the frequent disconnects,
                          though I don't have much data to validate my
                          theory. So I will take one more day to dig
                          into that.</p>
                        <p>Thanks for your support, and opensource++  </p>
                        <p>Regards</p>
                        <p>Rafi KC<br>
                        </p>
                        <div
                          class="m_4766802258719003127moz-cite-prefix">On
                          12/07/2016 05:02 AM, Micha Ober wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div
                            class="m_4766802258719003127moz-cite-prefix">Hi,<br>
                            <br>
                            thank you for your answer and even more for
                            the question!<br>
                            Until now, I was using FUSE. Today I changed
                            all mounts to NFS using the same 3.7.17
                            version.<br>
                            <br>
                            But: The problem is still the same. Now, the
                            NFS logfile contains lines like these:<br>
                            <br>
                            [2016-12-06 15:12:29.006325] C
                            [rpc-clnt-ping.c:165:rpc_clnt_<wbr>ping_timer_expired]
                            0-gv0-client-7: server X.X.18.62:49153 has
                            not responded in the last 42 seconds,
                            disconnecting.<br>
                            <br>
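                            <p>For anyone who wants to count these messages across logfiles, a minimal Python sketch follows. The function name <tt>ping_timeouts</tt> is made up, and the pattern is derived only from the single line quoted above, so other versions may log a different format.</p>
<pre>
```python
import re

# Pattern modelled on the one ping-timer line quoted in this mail.
PING_EXPIRED = re.compile(
    r"\[(\S+ \S+)\] C \[rpc-clnt-ping\.c:\d+:rpc_clnt_ping_timer_expired\] "
    r"(\S+): server (\S+) has not responded in the last (\d+) seconds"
)


def ping_timeouts(log_text):
    """Return (timestamp, translator, server, seconds) for each expiry line."""
    return [
        (m.group(1), m.group(2), m.group(3), int(m.group(4)))
        for m in PING_EXPIRED.finditer(log_text)
    ]


# The log line from this mail, joined back into a single line.
sample = (
    "[2016-12-06 15:12:29.006325] C "
    "[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] "
    "0-gv0-client-7: server X.X.18.62:49153 has not responded "
    "in the last 42 seconds, disconnecting."
)
for entry in ping_timeouts(sample):
    print(entry)
```
</pre>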
                            Interestingly enough, the IP address
                            X.X.18.62 belongs to the same machine! As I wrote
                            earlier, each node serves both as a server
                            and a client, as each node contributes
                            bricks to the volume. Every server is
                            connecting to itself via its hostname. For
                            example, the fstab on the node "giant2"
                            looks like:<br>
                            <br>
                            #giant2:/gv0    /shared_data   
                            glusterfs       defaults,noauto 0       0<br>
                            #giant2:/gv2    /shared_slurm  
                            glusterfs       defaults,noauto 0       0<br>
                            <br>
                            giant2:/gv0     /shared_data   
                            nfs             defaults,_netdev,vers=3
                            0       0<br>
                            giant2:/gv2     /shared_slurm  
                            nfs             defaults,_netdev,vers=3
                            0       0<br>
                            <br>
                            So I understand the disconnects even less. <br>
                            <br>
                            I don't know if it's possible to create a
                            dummy cluster which exposes the same
                            behaviour, because the disconnects only
                            happen when there are compute jobs running
                            on those nodes - and they are GPU compute
                            jobs, so that's something which cannot be
                            easily emulated in a VM.<br>
                            <br>
                            As we have more clusters (which are running
                            fine with an ancient 3.4 version :-)) and we
                            are currently not dependent on this
                            particular cluster (which may stay like this
                            for this month, I think) I should be able to
                            deploy the debug build on the "real"
                            cluster, if you can provide a debug build.<br>
                            <br>
                            Regards and thanks,<br>
                            Micha<br>
                            <br>
                            <br>
                            <br>
                            On 06.12.2016 at 08:15, Mohammed Rafi
                            K C wrote:<br>
                          </div>
                          <blockquote type="cite">
                            <p><br>
                            </p>
                            <br>
                            <div
                              class="m_4766802258719003127moz-cite-prefix">On
                              12/03/2016 12:56 AM, Micha Ober wrote:<br>
                            </div>
                            <blockquote type="cite">
                              <div
                                class="m_4766802258719003127moz-cite-prefix"><tt>**
                                  Update: ** I have downgraded from
                                  3.8.6 to 3.7.17 now, but the problem
                                  still exists.</tt><tt><br>
                                </tt></div>
                            </blockquote>
                            <blockquote type="cite">
                              <div
                                class="m_4766802258719003127moz-cite-prefix"><tt>
                                </tt><tt><br>
                                </tt><tt>Client log: <a
                                    moz-do-not-send="true"
                                    class="moz-txt-link-freetext"
                                    href="http://paste.ubuntu.com/23569065/">http://paste.ubuntu.com/23569065/</a></tt><tt><br>
                                </tt><tt>Brick log: <a
                                    moz-do-not-send="true"
                                    class="moz-txt-link-freetext"
                                    href="http://paste.ubuntu.com/23569067/">http://paste.ubuntu.com/23569067/</a></tt><tt><br>
                                </tt><tt><br>
                                </tt><tt>Please note that each server
                                  has two bricks.</tt><tt><br>
                                </tt><tt>According to the logs, however,
                                  one brick loses the connection to all
                                  other hosts:</tt><tt><br>
                                </tt>
                                <pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)

The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
As I said, the network connection is fine and the disks are idle.
The CPU always has 2 free cores.

It looks like I have to downgrade to 3.4 now in order for the disconnects to stop.</pre>
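                                <p>To see at a glance which peers a brick lost at the same moment, the warnings above can be grouped per second. A rough Python sketch, assuming the log format shown in the excerpt; the function name <tt>peers_lost_per_second</tt> is hypothetical:</p>
<pre>
```python
import re
from collections import defaultdict

# Pattern modelled on the "writev ... failed" warnings in the brick log excerpt.
WRITEV_FAILED = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] W "
    r"\[socket\.c:\d+:__socket_rwv\] (\S+): writev on (\S+) failed \((.+)\)"
)


def peers_lost_per_second(log_text):
    """Map (second, transport) to the list of peers whose writev failed."""
    lost = defaultdict(list)
    for line in log_text.splitlines():
        match = WRITEV_FAILED.search(line)
        if match:
            second, transport, peer, _reason = match.groups()
            lost[(second, transport)].append(peer)
    return dict(lost)


# Two of the lines from the excerpt above.
sample = "\n".join([
    "[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] "
    "0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)",
    "[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] "
    "0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)",
])
for key, peers in peers_lost_per_second(sample).items():
    print(key, peers)
```
</pre>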
                              </div>
                            </blockquote>
                            <br>
                            Hi Micha,<br>
                            <br>
                            Thanks for the update, and sorry for the trouble
                            with the newer gluster versions. I can
                            understand the need to downgrade, as it is a
                            production setup.<br>
                            <br>
                            Can you tell me which client is used here:
                            FUSE, NFS, NFS-Ganesha, SMB or
                            libgfapi?<br>
                            <br>
                            Since I'm not able to reproduce the issue (I
                            have been trying for the last 3 days) and the
                            logs are not very helpful here (we don't
                            have much logging in the socket layer), could you
                            please create a dummy cluster and try to
                            reproduce the issue? If that works, we can play
                            with that volume, and I could provide a
                            debug build which we can use for further
                            debugging.<br>
                            <br>
                            If you don't have bandwidth for this, please
                            leave it ;).<br>
                            <br>
                            Regards<br>
                            Rafi KC<br>
                            <br>
                            <blockquote type="cite">
                              <div
                                class="m_4766802258719003127moz-cite-prefix">
                                <pre style="color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px">- Micha
</pre>
                                <br>
                                On 30.11.2016 at 06:57, Mohammed
                                Rafi K C wrote:<br>
                              </div>
                              <blockquote type="cite">
                                <p>Hi Micha,</p>
                                <p>I have changed the thread and subject
                                  so that your original thread stays
                                  focused on your original query. Let's try to fix
                                  the problem that you observed with
                                  3.8.4, so I have started a new thread
                                  to discuss the frequent disconnect
                                  problem.</p>
                                <p><b>If anyone else has experienced
                                    the same problem, please respond to
                                    this mail.</b><br>
                                </p>
                                <p>It would be very helpful if you could
                                  give us some more logs from clients
                                  and bricks. Also, any steps to reproduce
                                  the issue will surely help us chase the
                                  problem further.</p>
                                <p>Regards</p>
                                <p>Rafi KC<br>
                                </p>
                                <div
                                  class="m_4766802258719003127moz-cite-prefix">On
                                  11/30/2016 04:44 AM, Micha Ober wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div>
                                      <div><font face="monospace,
                                          monospace">I had opened
                                          another thread on this mailing
                                          list (Subject: "After upgrade
                                          from 3.4.2 to 3.8.5 - High CPU
                                          usage resulting in disconnects
                                          and split-brain").</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">The title may be a
                                          bit misleading now, as I am no
                                          longer observing high CPU
                                          usage after upgrading to
                                          3.8.6, but the disconnects are
                                          still happening and the number
                                          of files in split-brain is
                                          growing.</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">Setup: 6 compute
                                          nodes, each serving as a
                                          glusterfs server and client,
                                          Ubuntu 14.04, two bricks per
                                          node, distribute-replicate</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">I have two gluster
                                          volumes set up (one for
                                          scratch data, one for the
                                          slurm scheduler). Only the
                                          scratch data volume shows
                                          critical errors "[...] has not
                                          responded in the last 42
                                          seconds, disconnecting.". So I
                                          can rule out network problems,
                                          the gigabit link between the
                                          nodes is not saturated at all.
                                          The disks are almost idle
                                          (&lt;10%).</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">I have glusterfs
                                          3.4.2 on Ubuntu 12.04 on
                                          another compute cluster,
                                          running fine since it was
                                          deployed.</font></div>
                                      <div><font face="monospace,
                                          monospace">I had glusterfs
                                          3.4.2 on Ubuntu 14.04 on this
                                          cluster, running fine for
                                          almost a year.</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">After upgrading to
                                          3.8.5, the problems (as
                                          described) started. I would
                                          like to use some of the new
                                          features of the newer versions
                                          (like bitrot), but the users
                                          can't run their compute jobs
                                          right now because the result
                                          files are garbled.</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">There also seems to
                                          be a bug report with a similar
                                          problem (but no progress):</font></div>
                                      <div><font face="monospace,
                                          monospace"><a
                                            moz-do-not-send="true"
                                            class="moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683">https://bugzilla.redhat.com/show_bug.cgi?id=1370683</a></font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">For me, ALL servers
                                          are affected (not isolated to
                                          one or two servers)</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">I also see messages
                                          like "INFO: task
                                          gpu_graphene_bv:4476
                                          blocked for more than 120
                                          seconds." in the syslog.</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">For completeness
                                          (gv0 is the scratch volume,
                                          gv2 the slurm volume):</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">[root@giant2: ~]#
                                          gluster v info</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">Volume Name: gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Type:
                                          Distributed-Replicate</font></div>
                                      <div><font face="monospace,
                                          monospace">Volume ID:
                                          993ec7c9-e4bc-44d0-b7c4-<wbr>2d977e622e86</font></div>
                                      <div><font face="monospace,
                                          monospace">Status: Started</font></div>
                                      <div><font face="monospace,
                                          monospace">Snapshot Count: 0</font></div>
                                      <div><font face="monospace,
                                          monospace">Number of Bricks: 6
                                          x 2 = 12</font></div>
                                      <div><font face="monospace,
                                          monospace">Transport-type: tcp</font></div>
                                      <div><font face="monospace,
                                          monospace">Bricks:</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick1:
                                          giant1:/gluster/sdc/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick2:
                                          giant2:/gluster/sdc/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick3:
                                          giant3:/gluster/sdc/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick4:
                                          giant4:/gluster/sdc/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick5:
                                          giant5:/gluster/sdc/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick6:
                                          giant6:/gluster/sdc/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick7:
                                          giant1:/gluster/sdd/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick8:
                                          giant2:/gluster/sdd/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick9:
                                          giant3:/gluster/sdd/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick10:
                                          giant4:/gluster/sdd/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick11:
                                          giant5:/gluster/sdd/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick12:
                                          giant6:/gluster/sdd/gv0</font></div>
                                      <div><font face="monospace,
                                          monospace">Options
                                          Reconfigured:</font></div>
                                      <div><font face="monospace,
                                          monospace">auth.allow:
                                          X.X.X.*,127.0.0.1</font></div>
                                      <div><font face="monospace,
                                          monospace">nfs.disable: on</font></div>
                                      <div><font face="monospace,
                                          monospace"><br>
                                        </font></div>
                                      <div><font face="monospace,
                                          monospace">Volume Name: gv2</font></div>
                                      <div><font face="monospace,
                                          monospace">Type: Replicate</font></div>
                                      <div><font face="monospace,
                                          monospace">Volume ID:
                                          30c78928-5f2c-4671-becc-<wbr>8deaee1a7a8d</font></div>
                                      <div><font face="monospace,
                                          monospace">Status: Started</font></div>
                                      <div><font face="monospace,
                                          monospace">Snapshot Count: 0</font></div>
                                      <div><font face="monospace,
                                          monospace">Number of Bricks: 1
                                          x 2 = 2</font></div>
                                      <div><font face="monospace,
                                          monospace">Transport-type: tcp</font></div>
                                      <div><font face="monospace,
                                          monospace">Bricks:</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick1:
                                          giant1:/gluster/sdd/gv2</font></div>
                                      <div><font face="monospace,
                                          monospace">Brick2:
                                          giant2:/gluster/sdd/gv2</font></div>
                                      <div><font face="monospace,
                                          monospace">Options
                                          Reconfigured:</font></div>
                                      <div><font face="monospace,
                                          monospace">auth.allow:
                                          X.X.X.*,127.0.0.1</font></div>
                                      <div><font face="monospace,
                                          monospace">cluster.granular-entry-heal:
                                          on</font></div>
                                      <div><font face="monospace,
                                          monospace">cluster.locking-scheme:
                                          granular</font></div>
                                      <div><font face="monospace,
                                          monospace">nfs.disable: on</font></div>
                                      <div
                                        style="font-family:monospace,monospace"><br>
                                      </div>
                                    </div>
                                  </div>
                                  <div class="gmail_extra"><br>
                                    <div class="gmail_quote">2016-11-30
                                      0:10 GMT+01:00 Micha Ober <span
                                        dir="ltr">&lt;<a
                                          moz-do-not-send="true"
                                          class="moz-txt-link-abbreviated"
href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>&gt;</span>:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">
                                        <div dir="ltr">
                                          <div
                                            style="font-family:monospace,monospace">There
                                            also seems to be a bug
                                            report with a similar
                                            problem (but no progress):</div>
                                          <div><font face="monospace,
                                              monospace"><a
                                                moz-do-not-send="true"
                                                class="moz-txt-link-freetext"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1370683">https://bugzilla.redhat.com/show_bug.cgi?id=1370683</a></font><br>
                                          </div>
                                          <div><font face="monospace,
                                              monospace"><br>
                                            </font></div>
                                          <div><font face="monospace,
                                              monospace">For me, ALL
                                              servers are affected (not
                                              isolated to one or two
                                              servers)</font></div>
                                          <div><font face="monospace,
                                              monospace"><br>
                                            </font></div>
                                          <div><font face="monospace,
                                              monospace">I also see
                                              messages like "INFO: task
                                              gpu_graphene_bv:4476
                                              blocked for more than
                                              120 seconds." in the
                                              syslog.</font></div>
                                          <div><font face="monospace,
                                              monospace"><br>
                                            </font></div>
                                          <div><font face="monospace,
                                              monospace">For
                                              completeness (gv0 is the
                                              scratch volume, gv2 the
                                              slurm volume):</font></div>
                                          <div><font face="monospace,
                                              monospace"><br>
                                            </font></div>
                                          <div><font face="monospace,
                                              monospace">
                                              <div>[root@giant2: ~]#
                                                gluster v info</div>
                                              <div><br>
                                              </div>
                                              <div>Volume Name: gv0</div>
                                              <div>Type:
                                                Distributed-Replicate</div>
                                              <div>Volume ID:
                                                993ec7c9-e4bc-44d0-b7c4-2d977e<wbr>622e86</div>
                                              <div>Status: Started</div>
                                              <div>Snapshot Count: 0</div>
                                              <div>Number of Bricks: 6 x
                                                2 = 12</div>
                                              <div>Transport-type: tcp</div>
                                              <div>Bricks:</div>
                                              <div>Brick1:
                                                giant1:/gluster/sdc/gv0</div>
                                              <div>Brick2:
                                                giant2:/gluster/sdc/gv0</div>
                                              <div>Brick3:
                                                giant3:/gluster/sdc/gv0</div>
                                              <div>Brick4:
                                                giant4:/gluster/sdc/gv0</div>
                                              <div>Brick5:
                                                giant5:/gluster/sdc/gv0</div>
                                              <div>Brick6:
                                                giant6:/gluster/sdc/gv0</div>
                                              <div>Brick7:
                                                giant1:/gluster/sdd/gv0</div>
                                              <div>Brick8:
                                                giant2:/gluster/sdd/gv0</div>
                                              <div>Brick9:
                                                giant3:/gluster/sdd/gv0</div>
                                              <div>Brick10:
                                                giant4:/gluster/sdd/gv0</div>
                                              <div>Brick11:
                                                giant5:/gluster/sdd/gv0</div>
                                              <div>Brick12:
                                                giant6:/gluster/sdd/gv0</div>
                                              <div>Options Reconfigured:</div>
                                              <div>auth.allow:
                                                X.X.X.*,127.0.0.1</div>
                                              <div>nfs.disable: on</div>
                                              <div><br>
                                              </div>
                                              <div>Volume Name: gv2</div>
                                              <div>Type: Replicate</div>
                                              <div>Volume ID:
                                                30c78928-5f2c-4671-becc-8deaee<wbr>1a7a8d</div>
                                              <div>Status: Started</div>
                                              <div>Snapshot Count: 0</div>
                                              <div>Number of Bricks: 1 x
                                                2 = 2</div>
                                              <div>Transport-type: tcp</div>
                                              <div>Bricks:</div>
                                              <div>Brick1:
                                                giant1:/gluster/sdd/gv2</div>
                                              <div>Brick2:
                                                giant2:/gluster/sdd/gv2</div>
                                              <div>Options Reconfigured:</div>
                                              <div>auth.allow:
                                                X.X.X.*,127.0.0.1</div>
                                              <div>cluster.granular-entry-heal:
                                                on</div>
                                              <div>cluster.locking-scheme:
                                                granular</div>
                                              <div>nfs.disable: on</div>
                                              <div><br>
                                              </div>
                                            </font></div>
                                        </div>
                                        <div
                                          class="m_4766802258719003127HOEnZb">
                                          <div
                                            class="m_4766802258719003127h5">
                                            <div class="gmail_extra"><br>
                                              <div class="gmail_quote">2016-11-29
                                                19:21 GMT+01:00 Micha
                                                Ober <span dir="ltr">&lt;<a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
                                                    href="mailto:micha2k@gmail.com"><a class="moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com">micha2k@gmail.com</a></a>&gt;</span>:<br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  <div dir="ltr">
                                                    <div
                                                      style="font-family:monospace,monospace">I
                                                      had opened another
                                                      thread on this
                                                      mailing list
                                                      (Subject: "After
                                                      upgrade from 3.4.2
                                                      to 3.8.5 - High
                                                      CPU usage
                                                      resulting in
                                                      disconnects and
                                                      split-brain").</div>
                                                    <div
                                                      style="font-family:monospace,monospace"><br>
                                                    </div>
                                                    <div
                                                      style="font-family:monospace,monospace">The
                                                      title may be a bit
                                                      misleading now, as
                                                      I am no longer
                                                      observing high CPU
                                                      usage after
                                                      upgrading to
                                                      3.8.6, but the
                                                      disconnects are
                                                      still happening
                                                      and the number of
                                                      files in
                                                      split-brain is
                                                      growing.<br>
                                                    </div>
                                                    <div
                                                      style="font-family:monospace,monospace"><br>
                                                    </div>
                                                    <div
                                                      style="font-family:monospace,monospace">Setup:
                                                      6 compute nodes,
                                                      each serving as a
                                                      glusterfs server
                                                      and client, Ubuntu
                                                      14.04, two bricks
                                                      per node,
                                                      distribute-replicate</div>
                                                    <div
                                                      style="font-family:monospace,monospace"><br>
                                                    </div>
                                                    <div
                                                      style="font-family:monospace,monospace">I
                                                      have two gluster
                                                      volumes set up
                                                      (one for scratch
                                                      data, one for the
                                                      slurm scheduler).
                                                      Only the scratch
                                                      data volume shows
                                                      critical errors
                                                      "[...] has not
                                                      responded in the
                                                      last 42 seconds,
                                                      disconnecting.".
                                                      So I can rule out
                                                      network problems;
                                                      the gigabit link
                                                      between the nodes
                                                      is not saturated
                                                      at all. The disks
                                                      are almost idle
                                                      (&lt;10%).</div>
                                                    <div
                                                      style="font-family:monospace,monospace"><br>
                                                    </div>
                                                    <div
                                                      style="font-family:monospace,monospace">I
                                                      have glusterfs
                                                      3.4.2 on Ubuntu
                                                      12.04 on another
                                                      compute cluster,
                                                      running fine since
                                                      it was deployed.</div>
                                                    <div
                                                      style="font-family:monospace,monospace">I
                                                      had glusterfs
                                                      3.4.2 on Ubuntu
                                                      14.04 on this
                                                      cluster, running
                                                      fine for almost a
                                                      year.</div>
                                                    <div
                                                      style="font-family:monospace,monospace"><br>
                                                    </div>
                                                    <div
                                                      style="font-family:monospace,monospace">After
                                                      upgrading to
                                                      3.8.5, the
                                                      problems (as
                                                      described)
                                                      started. I would
                                                      like to use some
                                                      of the new
                                                      features of the
                                                      newer versions
                                                      (like bitrot), but
                                                      the users can't
                                                      run their compute
                                                      jobs right now
                                                      because the result
                                                      files are garbled.</div>
                                                  </div>
                                                  <div
                                                    class="m_4766802258719003127m_-1578094958703753071HOEnZb">
                                                    <div
                                                      class="m_4766802258719003127m_-1578094958703753071h5">
                                                      <div
                                                        class="gmail_extra"><br>
                                                        <div
                                                          class="gmail_quote">2016-11-29
                                                          18:53
                                                          GMT+01:00 Atin
                                                          Mukherjee <span
                                                          dir="ltr">&lt;<a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
                                                          href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>&gt;</span>:<br>
                                                          <blockquote
                                                          class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          <div style="white-space:pre-wrap">Would you be able to share what is not working for you in 3.8.x (please mention the exact version)? 3.4 is quite old, and falling back to an unsupported version doesn't look like a feasible option.</div>
                                                          <br>
                                                          <div
                                                          class="gmail_quote">
                                                          <div>
                                                          <div
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
                                                          <div dir="ltr">On
                                                          Tue, 29 Nov
                                                          2016 at 17:01,
                                                          Micha Ober
                                                          &lt;<a
                                                          moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:micha2k@gmail.com">micha2k@gmail.com</a>&gt;
                                                          wrote:<br>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          <blockquote
                                                          class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          <div>
                                                          <div
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209h5">
                                                          <div dir="ltr"
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Hi,</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">I was using gluster 3.4 and
                                                          upgraded to
                                                          3.8, but that
                                                          version proved
                                                          to be unusable
                                                          for me. I now
                                                          need to
                                                          downgrade.</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">I'm running Ubuntu 14.04. As
                                                          upgrades of
                                                          the op version
are irreversible, I guess I have to delete all gluster volumes and
                                                          re-create them
                                                          with the
                                                          downgraded
                                                          version. </div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">0. Backup data</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">1. Unmount all gluster volumes</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">2. apt-get purge
                                                          glusterfs-server
glusterfs-client</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">3. Remove PPA for 3.8</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">4. Add PPA for older version</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">5. apt-get install
                                                          glusterfs-server
glusterfs-client</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">6. Create volumes</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Is "purge" enough to delete all
                                                          configuration
                                                          files of the
                                                          currently
                                                          installed
                                                          version or do
                                                          I need to
                                                           manually
                                                          clear some
                                                          residues
                                                          before
                                                          installing an
                                                          older version?</div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace"><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          </div>
                                                          <div
                                                          class="m_4766802258719003127gmail_default
m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg"
style="font-family:monospace,monospace">Thanks.</div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          <span>
                                                          ______________________________<wbr>_________________<br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          Gluster-users
                                                          mailing list<br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          <a
                                                          moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209m_-2705140003504720857gmail_msg">
                                                          <a
                                                          moz-do-not-send="true"
class="moz-txt-link-freetext" href="http://www.gluster.org/mailman">http://www.gluster.org/mailman</a><wbr>/listinfo/gluster-users</span></blockquote>
                                                          </div>
                                                          <span
class="m_4766802258719003127m_-1578094958703753071m_-2811647508981727209HOEnZb"><font
color="#888888">
                                                          <div dir="ltr">--
                                                          <br>
                                                          </div>
                                                          <div
                                                          data-smartmail="gmail_signature">-
                                                          Atin (atinm)</div>
                                                          </font></span></blockquote>
                                                        </div>
                                                        <br>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                              </div>
                                              <br>
                                            </div>
                                          </div>
                                        </div>
                                      </blockquote>
                                    </div>
                                    <br>
                                  </div>
                                  <br>
                                  <fieldset
                                    class="m_4766802258719003127mimeAttachmentHeader"></fieldset>
                                  <br>
              </blockquote>
              

            </blockquote>
            <p>

            </p>
          </blockquote>
          

        </blockquote>
        <p>

        </p>
      </blockquote>
      

    </blockquote>
    <p>

    </p>
  </div></div></div>

</blockquote></div>


-- 
<div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">
</div><div>~ Atin (atinm)
</div></div></div></div>
</div></div>



</blockquote>



</blockquote>
</body></html>