<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <br>
    <div class="moz-cite-prefix">On 07/22/2015 01:36 PM, Geoffrey
      Letessier wrote:<br>
    </div>
    <blockquote cite="mid:86AE20A1-2A4D-4C81-B916-BEF664E710E3@cnrs.fr"
      type="cite">
      Oops, I forgot to CC everyone.
      <div><br>
      </div>
      <div>Yes, I guessed. 
        <div><br>
        </div>
        <div>With the TCP protocol, all my volumes seem OK and, for the
          moment, I have not noticed any hang. <br>
        </div>
      </div>
    </blockquote>
    <br>
    So if I understand correctly, everything is fine with TCP (no hang,
    no "transport endpoint is not connected" error), and both happen
    with RDMA. Please correct me if that is not the case.<br>
    <br>
    <br>
    <blockquote cite="mid:86AE20A1-2A4D-4C81-B916-BEF664E710E3@cnrs.fr"
      type="cite">
      <div>
        <div><br>
        </div>
        <div>mount command:</div>
        <div><span class="Apple-tab-span" style="white-space: pre;"> </span>-
          with RDMA: <span style="background-color: rgb(0, 0, 0); color:
            rgb(255, 255, 255); font-family: Menlo; font-size: 11px;">mount
            -t glusterfs -o
            transport=rdma,direct-io-mode=disable,enable-ino32
            ib-storage1:vol_home /mnt</span></div>
        <div>
          <div apple-content-edited="true"><span class="Apple-tab-span"
              style="white-space: pre;"> </span>- with TCP:    <span
              style="background-color: rgb(0, 0, 0); color: rgb(255,
              255, 255); font-family: Menlo; font-size: 11px;">mount -t
              glusterfs -o
              transport=tcp,direct-io-mode=disable,enable-ino32
              ib-storage1:vol_home /mnt</span></div>
          <div apple-content-edited="true"><br>
          </div>
          <div apple-content-edited="true">volume status:</div>
          <div apple-content-edited="true">
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;"># gluster volume status all</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Status of volume: vol_home</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Gluster process                
                            TCP Port  RDMA Port  Online  Pid</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage1:/export/brick_home/brick1</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49159     49165      Y       6547 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage2:/export/brick_home/brick1</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49161     49173      Y       24348</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage3:/export/brick_home/brick1</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49152     49156      Y       5616 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage4:/export/brick_home/brick1</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49152     49162      Y       5424 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage1:/export/brick_home/brick2</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49160     49166      Y       6548 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage2:/export/brick_home/brick2</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49162     49174      Y       24355</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage3:/export/brick_home/brick2</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49153     49157      Y       5635 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage4:/export/brick_home/brick2</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">/data                          
                            49153     49163      Y       5443 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on localhost  
                            N/A       N/A        Y       6534 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on ib-storage3
                            N/A       N/A        Y       7656 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on ib-storage2
                            N/A       N/A        Y       24519</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on ib-storage4
                            N/A       N/A        Y       7288 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0); min-height:
              13px;"><span style="font-size: 9px;"> <br
                  class="webkit-block-placeholder">
              </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Task Status of Volume vol_home</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">There are no active volume tasks</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0); min-height:
              13px;"><span style="font-size: 9px;"> <br
                  class="webkit-block-placeholder">
              </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Status of volume: vol_shared</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Gluster process                
                            TCP Port  RDMA Port  Online  Pid</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage1:/export/brick_shared/data 49152     49164   
                  Y       6554 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage2:/export/brick_shared/data 49152     49172   
                  Y       24362</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on localhost  
                            N/A       N/A        Y       6534 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on ib-storage3
                            N/A       N/A        Y       7656 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on ib-storage2
                            N/A       N/A        Y       24519</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Self-heal Daemon on ib-storage4
                            N/A       N/A        Y       7288 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0); min-height:
              13px;"><span style="font-size: 9px;"> <br
                  class="webkit-block-placeholder">
              </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Task Status of Volume vol_shared</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">There are no active volume tasks</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0); min-height:
              13px;"><span style="font-size: 9px;"> <br
                  class="webkit-block-placeholder">
              </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Status of volume:
                vol_workdir_amd</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Gluster process                
                            TCP Port  RDMA Port  Online  Pid</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage1:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck1/data                       
                            49191     49192      Y       6555 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage3:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck1/data                       
                            49164     49165      Y       6368 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage1:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck2/data                       
                            49193     49194      Y       6576 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage3:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck2/data                       
                            49166     49167      Y       6387 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0); min-height:
              13px;"><span style="font-size: 9px;"> <br
                  class="webkit-block-placeholder">
              </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Task Status of Volume
                vol_workdir_amd</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">There are no active volume tasks</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0); min-height:
              13px;"><span style="font-size: 9px;"> <br
                  class="webkit-block-placeholder">
              </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Status of volume:
                vol_workdir_intel</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Gluster process                
                            TCP Port  RDMA Port  Online  Pid</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage2:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck1/data                       
                            49175     49176      Y       24371</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage2:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck2/data                       
                            49177     49178      Y       24372</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage4:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck1/data                       
                            49164     49165      Y       5571 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Brick
                ib-storage4:/export/brick_workdir/bri</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">ck2/data                       
                            49166     49167      Y       5590 </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0); min-height:
              13px;"><span style="font-size: 9px;"> <br
                  class="webkit-block-placeholder">
              </span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">Task Status of Volume
                vol_workdir_intel</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
            <div style="margin: 0px; font-family: Menlo; color: rgb(255,
              255, 255); background-color: rgb(0, 0, 0);"><span
                style="font-size: 9px;">There are no active volume tasks</span></div>
            <div><br>
            </div>
          </div>
          <div apple-content-edited="true">Concerning the brick logs, do
            you want the logs of all bricks on every server?</div>
        </div>
      </div>
    </blockquote>
    Any errors from the client log and the brick logs, plus any entries
    in those same logs whose message ID is between 102000 and 104000.<br>
    <br>
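    A minimal sketch of how such entries could be picked out of a log
    file, assuming the log lines carry a tag of the form "[MSGID:
    103005]". The helper name lines_in_msgid_range and the sample lines
    are hypothetical and only illustrate the filtering step; they are not
    taken from the logs discussed in this thread.<br>
    <pre>
```python
import re

# Assumption: a log line that has a message ID carries it as "[MSGID: 103005]".
MSGID_TAG = re.compile(r"\[MSGID: (\d+)\]")


def lines_in_msgid_range(lines, low=102000, high=104000):
    """Return the lines whose message ID lies between low and high (inclusive)."""
    selected = []
    for line in lines:
        match = MSGID_TAG.search(line)
        if match and int(match.group(1)) in range(low, high + 1):
            selected.append(line)
    return selected


# Made-up sample lines, for illustration only.
sample = [
    "[2015-07-22 08:00:00.000000] E [MSGID: 103005] [example.c:1:example_fn] 0-example: sample message",
    "[2015-07-22 08:00:01.000000] I [MSGID: 114057] [example.c:2:example_fn] 0-example: other message",
    "[2015-07-22 08:00:02.000000] W [example.c:3:example_fn] 0-example: line without a message ID",
]
matching = lines_in_msgid_range(sample)
for line in matching:
    print(line)
```
</pre>
    The same function could be applied to the lines of a client or brick
    log read from disk; lines without a message ID tag are skipped.<br>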
    Rafi KC<br>
    <br>
    <blockquote cite="mid:86AE20A1-2A4D-4C81-B916-BEF664E710E3@cnrs.fr"
      type="cite">
      <div>
        <div>
          <div apple-content-edited="true"><br>
          </div>
          <div apple-content-edited="true">Geoffrey</div>
        </div>
        <div><br>
        </div>
        <div>
          <div apple-content-edited="true">
            ------------------------------------------------------<br>
            Geoffrey Letessier<br>
            IT manager &amp; systems engineer<br>
            UPR 9080 - CNRS - Laboratoire de Biochimie Théorique<br>
            Institut de Biologie Physico-Chimique<br>
            13, rue Pierre et Marie Curie - 75005 Paris<br>
            Tel: 01 58 41 50 93 - eMail: <a moz-do-not-send="true"
              href="mailto:geoffrey.letessier@ibpc.fr">geoffrey.letessier@ibpc.fr</a>
          </div>
          <br>
          <div>
            <div>On 22 Jul 2015, at 10:00, Mohammed Rafi K C &lt;<a
                moz-do-not-send="true" href="mailto:rkavunga@redhat.com">rkavunga@redhat.com</a>&gt;
              wrote:</div>
            <br class="Apple-interchange-newline">
            <blockquote type="cite">
              <div bgcolor="#FFFFFF" text="#000000"> <br>
                <br>
                <div class="moz-cite-prefix">On 07/22/2015 12:55 PM,
                  Geoffrey Letessier wrote:<br>
                </div>
                <blockquote
                  cite="mid:2AB9D908-4584-4A49-AC01-92EB04FE1CF3@cnrs.fr"
                  type="cite">
                  Concerning the hang, I have seen it only once with the
                  TCP protocol, but RDMA actually seems to be the cause.</blockquote>
                <br>
                If you mount a tcp,rdma volume using the TCP protocol,
                all communication goes through the TCP connection, and
                RDMA is not involved between client and server.<br>
                <br>
                <blockquote
                  cite="mid:2AB9D908-4584-4A49-AC01-92EB04FE1CF3@cnrs.fr"
                  type="cite">… And, after a while (a few minutes after
                  I restarted my back-transfer of around 40 TB), my
                  volume went down (and all my rsync processes too):
                  <div>
                    <div style="margin: 0px; font-size: 11px;
                      font-family: Menlo; color: rgb(255, 255, 255);
                      background-color: rgb(0, 0, 0);">[root@atlas ~]#
                      df -h /mnt</div>
                    <div style="margin: 0px; font-size: 11px;
                      font-family: Menlo; color: rgb(255, 255, 255);
                      background-color: rgb(0, 0, 0);">df: « /mnt »:
                      Noeud final de transport n'est pas connecté</div>
                    <div style="margin: 0px; font-size: 11px;
                      font-family: Menlo; color: rgb(255, 255, 255);
                      background-color: rgb(0, 0, 0);">df: aucun système
                      de fichiers traité</div>
                    <div>aka "transport endpoint is not connected" (followed by "no file systems processed")</div>
                  </div>
                </blockquote>
                <br>
                Could you send me the following details, if possible?<br>
                1) the mount command used, 2) the volume status, 3) the
                client and brick logs<br>
                <br>
                Regards<br>
                Rafi KC<br>
                <br>
                <blockquote
                  cite="mid:2AB9D908-4584-4A49-AC01-92EB04FE1CF3@cnrs.fr"
                  type="cite">
                  <div>
                    <div><br>
                    </div>
                    <div>Geoffrey</div>
                    <div><br>
                    </div>
                    <div><br>
                    </div>
                    <br>
                    <div>
                      <div>On 22 Jul 2015, at 09:17, Geoffrey Letessier
                        &lt;<a moz-do-not-send="true"
                          href="mailto:geoffrey.letessier@cnrs.fr">geoffrey.letessier@cnrs.fr</a>&gt;
                        wrote:</div>
                      <br class="Apple-interchange-newline">
                      <blockquote type="cite">
                        <div style="word-wrap: break-word;
                          -webkit-nbsp-mode: space; -webkit-line-break:
                          after-white-space;">Hi Rafi,
                          <div><br>
                          </div>
                          <div>That is what I do. But I notice this kind
                            of trouble particularly when I mount my
                            volumes manually.</div>
                          <div><br>
                          </div>
                          <div>In addition, when I changed my
                            transport-type from tcp or rdma to tcp,rdma,
                            I had to restart my volumes for the change
                            to take effect. </div>
                          <div><br>
                          </div>
                          <div>I wonder whether these troubles are due
                            to the RDMA protocol… because things look
                            more stable with TCP.</div>
                          <div><br>
                          </div>
                          <div>Any other idea?</div>
                          <div>Thanks for replying, and thanks in advance,</div>
                          <div>Geoffrey</div>
                          <br>
                          <div>
                            <div>On 22 Jul 2015, at 07:33, Mohammed Rafi
                              K C &lt;<a moz-do-not-send="true"
                                href="mailto:rkavunga@redhat.com">rkavunga@redhat.com</a>&gt;
                              wrote:</div>
                            <br class="Apple-interchange-newline">
                            <blockquote type="cite">
                              <div bgcolor="#FFFFFF" text="#000000"> <br>
                                <br>
                                <div class="moz-cite-prefix">On
                                  07/22/2015 04:51 AM, Geoffrey
                                  Letessier wrote:<br>
                                </div>
                                <blockquote
                                  cite="mid:AE01B7C4-7319-4F71-8912-AD5C775F956F@cnrs.fr"
                                  type="cite">
                                  Hi Niels,
                                  <div><br>
                                  </div>
                                  <div>Thanks for replying. </div>
                                  <div><br>
                                  </div>
                                  <div>In fact, after checking the log, I
                                    discovered that GlusterFS tried to
                                    connect to a brick using a TCP (or
                                    RDMA) port allocated to another
                                    volume… (bug?)</div>
                                  <div>For example, here is an extract of
                                    my workdir.log file:</div>
                                  <div>
                                    <div style="margin: 0px; font-size:
                                      11px; font-family: Menlo; color:
                                      rgb(255, 255, 255);
                                      background-color: rgb(0, 0, 0);
                                      position: static; z-index: auto;">[2015-07-21

                                      21:34:01.820188] E
                                      [socket.c:2332:socket_connect_finish]
                                      0-vol_workdir_amd-client-0:
                                      connection to 10.0.4.1:49161
                                      failed (Connexion refusée)</div>
                                    <div style="margin: 0px; font-size:
                                      11px; font-family: Menlo; color:
                                      rgb(255, 255, 255);
                                      background-color: rgb(0, 0, 0);
                                      position: static; z-index: auto;">[2015-07-21

                                      21:34:01.822563] E
                                      [socket.c:2332:socket_connect_finish]
                                      0-vol_workdir_amd-client-2:
                                      connection to 10.0.4.1:49162
                                      failed (Connexion refusée)</div>
                                    <div><br>
                                    </div>
                                    <div>But these 2 ports (49161 and
                                      49162) belong only to my vol_home
                                      volume, not to the vol_workdir_amd
                                      one.</div>
                                    <div><br>
                                    </div>
                                    <div>Now, after restarting glusterd
                                      on all nodes at the same time (pdsh -w
                                      cl-storage[1-4] service glusterd
                                      restart), everything seems to be back
                                      to normal (size,
                                      write permission, etc.)</div>
                                    <div><br>
                                    </div>
                                    <div>But, a few minutes later, I
                                      noticed something strange that has
                                      been happening since I upgraded my
                                      storage cluster from 3.5.3 to 3.7.2-3:
                                      when I try to mount some volumes
                                      (particularly my vol_shared volume,
                                      a replicated volume), my system can
                                      hang… And, because I use it in my
                                      bashrc file for my environment
                                      modules, I need to restart my
                                      node. The same happens if I run df on
                                      my mounted volume (if it doesn’t
                                      hang during the mount).</div>
                                    <div><br>
                                    </div>
                                    <div>With the TCP transport type, the
                                      situation seems to be more
                                      stable.</div>
                                    <div><br>
                                    </div>
                                    <div>In addition: if I restart a
                                      storage node, I can’t use the Gluster
                                      CLI (it also hangs).</div>
                                    <div><br>
                                    </div>
                                    <div>Do you have an idea?</div>
                                  </div>
                                </blockquote>
                                <br>
                                Are you using a bash script to start/mount
                                the volume? If so, add a sleep after the
                                volume start and after the mount, to allow all the
                                processes to start properly, because the RDMA
                                protocol takes some time to initialize its
                                resources.<br>
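                                <br>
                                A minimal sketch of that suggestion, assuming
                                a POSIX shell. The helper name wait_for, the
                                10-second pause and the retry count are
                                illustrative assumptions, not values taken
                                from GlusterFS; the volume name and mount
                                point are the ones quoted in this thread.<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times, one second apart,
# and stop as soon as it succeeds.
wait_for() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Example use (pause length and retry count are assumptions):
#   gluster volume start vol_workdir_amd
#   sleep 10    # give the RDMA resources time to initialize
#   wait_for 30 mount -t glusterfs -o transport=rdma localhost:vol_workdir_amd /workdir
```
</pre>
                                Retrying the mount rather than relying on a
                                single fixed sleep keeps the script from
                                failing when initialization takes longer
                                than expected.<br>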
                                <br>
                                Regards<br>
                                Rafi KC<br>
                                <br>
                                <br>
                                <br>
                                <blockquote
                                  cite="mid:AE01B7C4-7319-4F71-8912-AD5C775F956F@cnrs.fr"
                                  type="cite">
                                  <div>
                                    <div><br>
                                    </div>
                                    <div>Once again, thanks a lot for
                                      your help,</div>
                                    <div>Geoffrey</div>
                                  </div>
                                  <div><br>
                                  </div>
                                  <div apple-content-edited="true">
                                    ------------------------------------------------------<br>
                                    Geoffrey Letessier<br>
                                    Responsable informatique &amp;
                                    ingénieur système<br>
                                    UPR 9080 - CNRS - Laboratoire
                                    de Biochimie Théorique<br>
                                    Institut de Biologie
                                    Physico-Chimique<br>
                                    13, rue Pierre et Marie Curie -
                                    75005 Paris<br>
                                    Tel: 01 58 41 50 93 - eMail: <a
                                      moz-do-not-send="true"
                                      href="mailto:geoffrey.letessier@ibpc.fr">geoffrey.letessier@ibpc.fr</a>
                                  </div>
                                  <br>
                                  <div style="">
                                    <div>On 21 Jul 2015, at 23:49, Niels
                                      de Vos &lt;<a
                                        moz-do-not-send="true"
                                        href="mailto:ndevos@redhat.com">ndevos@redhat.com</a>&gt;
                                      wrote:</div>
                                    <br
                                      class="Apple-interchange-newline">
                                    <blockquote type="cite">On Tue, Jul
                                      21, 2015 at 11:20:20PM +0200,
                                      Geoffrey Letessier wrote:<br>
                                      <blockquote type="cite">Hello
                                        Soumya, Hello everybody,<br>
                                        <br>
                                        network.ping-timeout was set to
                                        42 seconds. I set it to 0, but it made no<br>
                                        difference. The problem was that,
                                        after setting the
                                        transport-type back to<br>
                                        rdma,tcp, some bricks went down after a
                                        few minutes. Despite
                                        restarting<br>
                                        the volumes, after a few minutes
                                        some [other/different] bricks
                                        went down<br>
                                        again.<br>
                                      </blockquote>
                                      <br>
                                      I'm not sure whether the
                                      ping-timeout is handled
                                      differently when RDMA is<br>
                                      used. Adding two of the guys who
                                      know RDMA well on CC.<br>
                                      <br>
                                      <blockquote type="cite">Now, after
                                        re-creating my volume, the bricks
                                        stay alive but, oddly, I’m<br>
                                        not able to write to my volume.
                                        In addition, I defined a
                                        distributed<br>
                                        volume with 2 servers and 4 bricks
                                        of 250GB each, yet my final
                                        volume<br>
                                        seems to be sized at only 500GB…
                                        It’s surprising.<br>
                                      </blockquote>
                                      <br>
                                      As seen further below, the 500GB
                                      volume is caused by two
                                      unreachable<br>
                                      bricks. When the bricks are not
                                      reachable, the size of the bricks
                                      can<br>
                                      not be detected by the client and
                                      therefore 2x 250 GB is missing.<br>
                                      <br>
                                      It is unclear to me why writing to
                                      a pure distributed volume fails.
                                      When<br>
                                      a brick is not reachable, and the
                                      file should be created there, it<br>
                                      would normally get created on
                                      another brick. When the brick that
                                      should<br>
                                      have the file comes online, and a
                                      new lookup for the file is done, a
                                      so<br>
                                      called "link file" is created,
                                      which points to the file on the
                                      other<br>
                                      brick. I guess the failure has to
                                      do with the connection issues, and
                                      I<br>
                                      would suggest getting that solved
                                      first.<br>
                                      <br>
                                      HTH,<br>
                                      Niels<br>
                                      <br>
                                      <br>
                                      <blockquote type="cite">Here you
                                        can find some information:<br>
                                        # gluster volume status
                                        vol_workdir_amd<br>
                                        Status of volume:
                                        vol_workdir_amd<br>
                                        Gluster process
                                                                    TCP
                                        Port  RDMA Port  Online  Pid<br>
------------------------------------------------------------------------------<br>
                                        Brick
                                        ib-storage1:/export/brick_workdir/bri<br>
                                        ck1/data
                                                                           49185
                                            49186      Y       23098<br>
                                        Brick
                                        ib-storage3:/export/brick_workdir/bri<br>
                                        ck1/data
                                                                           49158
                                            49159      Y       3886 <br>
                                        Brick
                                        ib-storage1:/export/brick_workdir/bri<br>
                                        ck2/data
                                                                           49187
                                            49188      Y       23117<br>
                                        Brick
                                        ib-storage3:/export/brick_workdir/bri<br>
                                        ck2/data
                                                                           49160
                                            49161      Y       3905 <br>
                                        <br>
                                        # gluster volume info
                                        vol_workdir_amd<br>
                                        <br>
                                        Volume Name: vol_workdir_amd<br>
                                        Type: Distribute<br>
                                        Volume ID:
                                        087d26ea-c6df-4cbe-94af-ecd87b59aedb<br>
                                        Status: Started<br>
                                        Number of Bricks: 4<br>
                                        Transport-type: tcp,rdma<br>
                                        Bricks:<br>
                                        Brick1:
                                        ib-storage1:/export/brick_workdir/brick1/data<br>
                                        Brick2:
                                        ib-storage3:/export/brick_workdir/brick1/data<br>
                                        Brick3:
                                        ib-storage1:/export/brick_workdir/brick2/data<br>
                                        Brick4:
                                        ib-storage3:/export/brick_workdir/brick2/data<br>
                                        Options Reconfigured:<br>
                                        performance.readdir-ahead: on<br>
                                        <br>
                                        # pdsh -w storage[1,3] df -h
                                        /export/brick_workdir/brick{1,2}<br>
                                        storage3: Filesystem
                                                   Size  Used Avail Use%
                                        Mounted on<br>
                                        storage3:
                                        /dev/mapper/st--block1-blk1--workdir<br>
                                        storage3:
                                                              250G   34M
                                         250G   1%
                                        /export/brick_workdir/brick1<br>
                                        storage3:
                                        /dev/mapper/st--block2-blk2--workdir<br>
                                        storage3:
                                                              250G   34M
                                         250G   1%
                                        /export/brick_workdir/brick2<br>
                                        storage1: Filesystem
                                                   Size  Used Avail Use%
                                        Mounted on<br>
                                        storage1:
                                        /dev/mapper/st--block1-blk1--workdir<br>
                                        storage1:
                                                              250G   33M
                                         250G   1%
                                        /export/brick_workdir/brick1<br>
                                        storage1:
                                        /dev/mapper/st--block2-blk2--workdir<br>
                                        storage1:
                                                              250G   33M
                                         250G   1%
                                        /export/brick_workdir/brick2<br>
                                        <br>
                                        # df -h /workdir/<br>
                                        Filesystem            Size  Used
                                        Avail Use% Mounted on<br>
                                        localhost:vol_workdir_amd.rdma<br>
                                                             500G   67M
                                         500G   1% /workdir<br>
                                        <br>
                                        # touch /workdir/test<br>
                                        touch: impossible de faire un
                                        touch « /workdir/test »: Aucun
                                        fichier ou dossier de ce type<br>
                                        <br>
                                        # tail -30l
                                        /var/log/glusterfs/workdir.log <br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        [2015-07-21 21:10:33.927673] W
                                        [rdma.c:1263:gf_rdma_cm_event_handler]
                                        0-vol_workdir_amd-client-2: cma
                                        event RDMA_CM_EVENT_REJECTED,
                                        error 8 (me:10.0.4.1:1020
                                        peer:10.0.4.1:49174)<br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        [2015-07-21 21:10:37.877231] I
                                        [rpc-clnt.c:1819:rpc_clnt_reconfig]
                                        0-vol_workdir_amd-client-0:
                                        changing port to 49173 (from 0)<br>
                                        [2015-07-21 21:10:37.880556] I
                                        [rpc-clnt.c:1819:rpc_clnt_reconfig]
                                        0-vol_workdir_amd-client-2:
                                        changing port to 49174 (from 0)<br>
                                        [2015-07-21 21:10:37.914661] W
                                        [rdma.c:1263:gf_rdma_cm_event_handler]
                                        0-vol_workdir_amd-client-0: cma
                                        event RDMA_CM_EVENT_REJECTED,
                                        error 8 (me:10.0.4.1:1021
                                        peer:10.0.4.1:49173)<br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        [2015-07-21 21:10:37.923535] W
                                        [rdma.c:1263:gf_rdma_cm_event_handler]
                                        0-vol_workdir_amd-client-2: cma
                                        event RDMA_CM_EVENT_REJECTED,
                                        error 8 (me:10.0.4.1:1020
                                        peer:10.0.4.1:49174)<br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        [2015-07-21 21:10:41.883925] I
                                        [rpc-clnt.c:1819:rpc_clnt_reconfig]
                                        0-vol_workdir_amd-client-0:
                                        changing port to 49173 (from 0)<br>
                                        [2015-07-21 21:10:41.887085] I
                                        [rpc-clnt.c:1819:rpc_clnt_reconfig]
                                        0-vol_workdir_amd-client-2:
                                        changing port to 49174 (from 0)<br>
                                        [2015-07-21 21:10:41.919394] W
                                        [rdma.c:1263:gf_rdma_cm_event_handler]
                                        0-vol_workdir_amd-client-0: cma
                                        event RDMA_CM_EVENT_REJECTED,
                                        error 8 (me:10.0.4.1:1021
                                        peer:10.0.4.1:49173)<br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        [2015-07-21 21:10:41.932622] W
                                        [rdma.c:1263:gf_rdma_cm_event_handler]
                                        0-vol_workdir_amd-client-2: cma
                                        event RDMA_CM_EVENT_REJECTED,
                                        error 8 (me:10.0.4.1:1020
                                        peer:10.0.4.1:49174)<br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        [2015-07-21 21:10:44.682636] W
                                        [dht-layout.c:189:dht_layout_search]
                                        0-vol_workdir_amd-dht: no
                                        subvolume for hash (value) =
                                        1072520554<br>
                                        [2015-07-21 21:10:44.682947] W
                                        [dht-layout.c:189:dht_layout_search]
                                        0-vol_workdir_amd-dht: no
                                        subvolume for hash (value) =
                                        1072520554<br>
                                        [2015-07-21 21:10:44.683240] W
                                        [dht-layout.c:189:dht_layout_search]
                                        0-vol_workdir_amd-dht: no
                                        subvolume for hash (value) =
                                        1072520554<br>
                                        [2015-07-21 21:10:44.683472] W
                                        [dht-diskusage.c:48:dht_du_info_cbk]
                                        0-vol_workdir_amd-dht: failed to
                                        get disk info from
                                        vol_workdir_amd-client-0<br>
                                        [2015-07-21 21:10:44.683506] W
                                        [dht-diskusage.c:48:dht_du_info_cbk]
                                        0-vol_workdir_amd-dht: failed to
                                        get disk info from
                                        vol_workdir_amd-client-2<br>
                                        [2015-07-21 21:10:44.683532] W
                                        [dht-layout.c:189:dht_layout_search]
                                        0-vol_workdir_amd-dht: no
                                        subvolume for hash (value) =
                                        1072520554<br>
                                        [2015-07-21 21:10:44.683551] W
                                        [fuse-bridge.c:1970:fuse_create_cbk]
                                        0-glusterfs-fuse: 18: /test
                                        =&gt; -1 (Aucun fichier ou
                                        dossier de ce type)<br>
                                        [2015-07-21 21:10:44.683619] W
                                        [dht-layout.c:189:dht_layout_search]
                                        0-vol_workdir_amd-dht: no
                                        subvolume for hash (value) =
                                        1072520554<br>
                                        [2015-07-21 21:10:44.683846] W
                                        [dht-layout.c:189:dht_layout_search]
                                        0-vol_workdir_amd-dht: no
                                        subvolume for hash (value) =
                                        1072520554<br>
                                        [2015-07-21 21:10:45.886807] I
                                        [rpc-clnt.c:1819:rpc_clnt_reconfig]
                                        0-vol_workdir_amd-client-0:
                                        changing port to 49173 (from 0)<br>
                                        [2015-07-21 21:10:45.893059] I
                                        [rpc-clnt.c:1819:rpc_clnt_reconfig]
                                        0-vol_workdir_amd-client-2:
                                        changing port to 49174 (from 0)<br>
                                        [2015-07-21 21:10:45.920434] W
                                        [rdma.c:1263:gf_rdma_cm_event_handler]
                                        0-vol_workdir_amd-client-0: cma
                                        event RDMA_CM_EVENT_REJECTED,
                                        error 8 (me:10.0.4.1:1021
                                        peer:10.0.4.1:49173)<br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        [2015-07-21 21:10:45.925292] W
                                        [rdma.c:1263:gf_rdma_cm_event_handler]
                                        0-vol_workdir_amd-client-2: cma
                                        event RDMA_CM_EVENT_REJECTED,
                                        error 8 (me:10.0.4.1:1020
                                        peer:10.0.4.1:49174)<br>
                                        Host Unreachable, Check your
                                        connection with IPoIB<br>
                                        <br>
                                        I have used GlusterFS in production
                                        for around 3 years without any
                                        blocking<br>
                                        problem, but the situation has been
                                        awful for more than 3 weeks now…<br>
                                        Indeed, our production has been down
                                        for roughly 3.5 weeks (with
                                        many<br>
                                        different problems, first with
                                        GlusterFS v3.5.3 and now with
                                        3.7.2-3) and<br>
                                        I need to restart it… <br>
                                        <br>
                                        Thanks in advance,<br>
                                        Geoffrey<br>
------------------------------------------------------<br>
                                        Geoffrey Letessier<br>
                                        Responsable informatique &amp;
                                        ingénieur système<br>
                                        UPR 9080 - CNRS - Laboratoire de
                                        Biochimie Théorique<br>
                                        Institut de Biologie
                                        Physico-Chimique<br>
                                        13, rue Pierre et Marie Curie -
                                        75005 Paris<br>
                                        Tel: 01 58 41 50 93 - eMail: <a
                                          moz-do-not-send="true"
                                          href="mailto:geoffrey.letessier@ibpc.fr">geoffrey.letessier@ibpc.fr</a><br>
                                        <br>
                                        On 21 Jul 2015, at 19:36, Soumya
                                        Koduri &lt;<a
                                          moz-do-not-send="true"
                                          href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>&gt;
                                        wrote:<br>
                                        <br>
                                        <blockquote type="cite">From the
                                          following errors,<br>
                                          <br>
                                          [2015-07-21 14:36:30.495321] I
                                          [MSGID: 114020]
                                          [client.c:2118:notify]
                                          0-vol_shared-client-0: parent
                                          translators are ready,
                                          attempting connect on
                                          transport<br>
                                          [2015-07-21 14:36:30.498989] W
                                          [socket.c:923:__socket_keepalive]

                                          0-socket: failed to set
                                          TCP_USER_TIMEOUT 0 on socket
                                          12, Protocole non disponible<br>
                                          [2015-07-21 14:36:30.499004] E
                                          [socket.c:3015:socket_connect]
                                          0-vol_shared-client-0: Failed
                                          to set keep-alive: Protocole
                                          non disponible<br>
                                          <br>
                                          it looks like setting the
                                          TCP_USER_TIMEOUT value to 0 on
                                          the socket failed with the error
                                          (IIUC) "Protocol not
                                          available".<br>
                                          Could you check whether
                                          'network.ping-timeout' is set
                                          to zero for that volume using
                                          'gluster volume info'? Anyway,
                                          from the code it looks like
                                          'TCP_USER_TIMEOUT' can take
                                          the value zero. Not sure why it
                                          failed.<br>
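                                          <br>
                                          One possible way to do that check,
                                          sketched under the assumption that
                                          reconfigured options appear in the
                                          'gluster volume info' output as
                                          "name: value" lines (as
                                          performance.readdir-ahead does
                                          elsewhere in this thread). The
                                          helper name get_option is
                                          hypothetical.<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: read `gluster volume info` output on stdin and
# print the value of the option named in the first argument.
# Prints nothing if the option is not listed (i.e. not reconfigured).
get_option() {
    awk -v key="$1" -F': ' '$1 == key { print $2 }'
}

# Example use (volume name taken from the log above):
#   gluster volume info vol_shared | get_option network.ping-timeout
```
</pre>
                                          An empty result would suggest the
                                          option was never changed from its
                                          default for that volume.<br>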
                                          <br>
                                          Niels, any thoughts?<br>
                                          <br>
                                          Thanks,<br>
                                          Soumya<br>
                                          <br>
                                          On 07/21/2015 08:15 PM,
                                          Geoffrey Letessier wrote:<br>
                                          <blockquote type="cite">[2015-07-21

                                            14:36:30.495321] I [MSGID:
                                            114020]
                                            [client.c:2118:notify]<br>
                                            0-vol_shared-client-0:
                                            parent translators are
                                            ready, attempting connect<br>
                                            on transport<br>
                                            [2015-07-21 14:36:30.498989]
                                            W
                                            [socket.c:923:__socket_keepalive]<br>
                                            0-socket: failed to set
                                            TCP_USER_TIMEOUT 0 on socket
                                            12, Protocole non<br>
                                            disponible<br>
                                            [2015-07-21 14:36:30.499004]
                                            E
                                            [socket.c:3015:socket_connect]<br>
                                            0-vol_shared-client-0:
                                            Failed to set keep-alive:
                                            Protocole non disponible<br>
                                          </blockquote>
                                        </blockquote>
                                        <br>
                                      </blockquote>
                                    </blockquote>
                                  </div>
                                  <br>
                                </blockquote>
                                <br>
                              </div>
                            </blockquote>
                          </div>
                          <br>
                        </div>
                      </blockquote>
                    </div>
                    <br>
                  </div>
                </blockquote>
                <br>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>