<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 07/22/2015 01:36 PM, Geoffrey
Letessier wrote:<br>
</div>
<blockquote cite="mid:86AE20A1-2A4D-4C81-B916-BEF664E710E3@cnrs.fr"
type="cite">
Oops, I forgot to CC everyone.
<div><br>
</div>
<div>Yes, I guessed.
<div><br>
</div>
<div>With the TCP protocol, all my volumes seem OK and, for the
moment, I have not noticed any hangs. <br>
</div>
</div>
</blockquote>
<br>
So, if I understand correctly, everything is fine with TCP (no hangs
and no "transport endpoint is not connected" errors), and both problems
occur with RDMA. Please correct me if that is wrong.<br>
<br>
<br>
<blockquote cite="mid:86AE20A1-2A4D-4C81-B916-BEF664E710E3@cnrs.fr"
type="cite">
<div>
<div><br>
</div>
<div>mount command:</div>
<div><span class="Apple-tab-span" style="white-space: pre;"> </span>-
with RDMA: <span style="background-color: rgb(0, 0, 0); color:
rgb(255, 255, 255); font-family: Menlo; font-size: 11px;">mount
-t glusterfs -o
transport=rdma,direct-io-mode=disable,enable-ino32
ib-storage1:vol_home /mnt</span></div>
<div>
<div apple-content-edited="true"><span class="Apple-tab-span"
style="white-space: pre;"> </span>- with TCP: <span
style="background-color: rgb(0, 0, 0); color: rgb(255,
255, 255); font-family: Menlo; font-size: 11px;">mount -t
glusterfs -o
transport=tcp,direct-io-mode=disable,enable-ino32
ib-storage1:vol_home /mnt</span></div>
<div apple-content-edited="true"><br>
</div>
<div apple-content-edited="true">volume status:</div>
<div apple-content-edited="true">
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;"># gluster volume status all</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Status of volume: vol_home</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Gluster process
TCP Port RDMA Port Online Pid</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage1:/export/brick_home/brick1</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49159 49165 Y 6547 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage2:/export/brick_home/brick1</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49161 49173 Y 24348</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage3:/export/brick_home/brick1</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49152 49156 Y 5616 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage4:/export/brick_home/brick1</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49152 49162 Y 5424 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage1:/export/brick_home/brick2</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49160 49166 Y 6548 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage2:/export/brick_home/brick2</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49162 49174 Y 24355</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage3:/export/brick_home/brick2</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49153 49157 Y 5635 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage4:/export/brick_home/brick2</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/data
49153 49163 Y 5443 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on localhost
N/A N/A Y 6534 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on ib-storage3
N/A N/A Y 7656 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on ib-storage2
N/A N/A Y 24519</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on ib-storage4
N/A N/A Y 7288 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0); min-height:
13px;"><span style="font-size: 9px;"> <br
class="webkit-block-placeholder">
</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Task Status of Volume vol_home</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">There are no active volume tasks</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0); min-height:
13px;"><span style="font-size: 9px;"> <br
class="webkit-block-placeholder">
</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Status of volume: vol_shared</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Gluster process
TCP Port RDMA Port Online Pid</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage1:/export/brick_shared/data 49152 49164
Y 6554 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage2:/export/brick_shared/data 49152 49172
Y 24362</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on localhost
N/A N/A Y 6534 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on ib-storage3
N/A N/A Y 7656 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on ib-storage2
N/A N/A Y 24519</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Self-heal Daemon on ib-storage4
N/A N/A Y 7288 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0); min-height:
13px;"><span style="font-size: 9px;"> <br
class="webkit-block-placeholder">
</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Task Status of Volume vol_shared</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">There are no active volume tasks</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0); min-height:
13px;"><span style="font-size: 9px;"> <br
class="webkit-block-placeholder">
</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Status of volume:
vol_workdir_amd</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Gluster process
TCP Port RDMA Port Online Pid</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage1:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck1/data
49191 49192 Y 6555 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage3:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck1/data
49164 49165 Y 6368 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage1:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck2/data
49193 49194 Y 6576 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage3:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck2/data
49166 49167 Y 6387 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0); min-height:
13px;"><span style="font-size: 9px;"> <br
class="webkit-block-placeholder">
</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Task Status of Volume
vol_workdir_amd</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">There are no active volume tasks</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0); min-height:
13px;"><span style="font-size: 9px;"> <br
class="webkit-block-placeholder">
</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Status of volume:
vol_workdir_intel</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Gluster process
TCP Port RDMA Port Online Pid</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage2:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck1/data
49175 49176 Y 24371</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage2:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck2/data
49177 49178 Y 24372</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage4:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck1/data
49164 49165 Y 5571 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Brick
ib-storage4:/export/brick_workdir/bri</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ck2/data
49166 49167 Y 5590 </span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0); min-height:
13px;"><span style="font-size: 9px;"> <br
class="webkit-block-placeholder">
</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Task Status of Volume
vol_workdir_intel</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">There are no active volume tasks</span></div>
<div><br>
</div>
</div>
<div apple-content-edited="true">Concerning the brick logs, do
you want the logs of all bricks on every server?</div>
</div>
</div>
</blockquote>
Any errors from the client log and the brick logs, plus any log entries
from those same files with a message ID between 102000 and 104000.<br>
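<br>
As a rough illustration of that selection, here is a minimal Python sketch. It assumes the log line layout quoted elsewhere in this thread (a bracketed timestamp, a severity letter, then an optional [MSGID: N] field) and treats the 102000 to 104000 range as inclusive. The function name select_log_lines is made up for this example and is not part of GlusterFS.<br>
<pre>
```python
import re

# Matches the "[MSGID: 114020]" field seen in GlusterFS log lines.
MSGID_RE = re.compile(r"\[MSGID: (\d+)\]")


def select_log_lines(lines, low=102000, high=104000):
    """Return lines logged at level E, or whose MSGID is within [low, high]."""
    selected = []
    for line in lines:
        # The severity letter follows the bracketed timestamp: "[...] E [...]".
        parts = line.split("] ", 1)
        is_error = len(parts) == 2 and parts[1].startswith("E ")
        match = MSGID_RE.search(line)
        in_range = match is not None and high >= int(match.group(1)) >= low
        if is_error or in_range:
            selected.append(line)
    return selected
```
</pre>
Passing an open log file as lines (for example a client log under /var/log/glusterfs/) would return only the entries worth attaching to a reply.<br>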
<br>
Rafi KC<br>
<br>
<blockquote cite="mid:86AE20A1-2A4D-4C81-B916-BEF664E710E3@cnrs.fr"
type="cite">
<div>
<div>
<div apple-content-edited="true"><br>
</div>
<div apple-content-edited="true">Geoffrey</div>
</div>
<div><br>
</div>
<div>
<div apple-content-edited="true">
------------------------------------------------------<br>
Geoffrey Letessier<br>
IT manager &amp; systems engineer<br>
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique<br>
Institut de Biologie Physico-Chimique<br>
13, rue Pierre et Marie Curie - 75005 Paris<br>
Tel: 01 58 41 50 93 - eMail: <a moz-do-not-send="true"
href="mailto:geoffrey.letessier@ibpc.fr">geoffrey.letessier@ibpc.fr</a>
</div>
<br>
<div>
<div>On 22 Jul 2015, at 10:00, Mohammed Rafi K C &lt;<a
moz-do-not-send="true" href="mailto:rkavunga@redhat.com">rkavunga@redhat.com</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000"> <br>
<br>
<div class="moz-cite-prefix">On 07/22/2015 12:55 PM,
Geoffrey Letessier wrote:<br>
</div>
<blockquote
cite="mid:2AB9D908-4584-4A49-AC01-92EB04FE1CF3@cnrs.fr"
type="cite">
Concerning the hang, I have seen it only once with the
TCP protocol; RDMA actually seems to be the cause.</blockquote>
<br>
If you mount a tcp,rdma volume using the TCP
protocol, all communication goes through the TCP
connection, and RDMA is not involved between client and
server.<br>
<br>
<blockquote
cite="mid:2AB9D908-4584-4A49-AC01-92EB04FE1CF3@cnrs.fr"
type="cite">… And, after a while (a few minutes after
restarting my back-transfer of around 40TB), my
volume went down (and all my rsync processes too):
<div>
<div style="margin: 0px; font-size: 11px;
font-family: Menlo; color: rgb(255, 255, 255);
background-color: rgb(0, 0, 0);">[root@atlas ~]#
df -h /mnt</div>
<div style="margin: 0px; font-size: 11px;
font-family: Menlo; color: rgb(255, 255, 255);
background-color: rgb(0, 0, 0);">df: « /mnt »:
Noeud final de transport n'est pas connecté</div>
<div style="margin: 0px; font-size: 11px;
font-family: Menlo; color: rgb(255, 255, 255);
background-color: rgb(0, 0, 0);">df: aucun système
de fichiers traité</div>
<div>aka "transport endpoint is not connected"</div>
</div>
</blockquote>
<br>
Can you send me the following details, if possible?<br>
1) the mount command used, 2) the volume status, 3) the client and
brick logs <br>
<br>
Regards<br>
Rafi KC<br>
<br>
<blockquote
cite="mid:2AB9D908-4584-4A49-AC01-92EB04FE1CF3@cnrs.fr"
type="cite">
<div>
<div><br>
</div>
<div>Geoffrey</div>
<div><br>
</div>
<div><br>
</div>
<br>
<div>
<div>On 22 Jul 2015, at 09:17, Geoffrey Letessier
&lt;<a moz-do-not-send="true"
href="mailto:geoffrey.letessier@cnrs.fr">geoffrey.letessier@cnrs.fr</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div style="word-wrap: break-word;
-webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;">Hi Rafi,
<div><br>
</div>
<div>That is what I do. But I notice this kind of
trouble particularly when I mount my volumes
manually.</div>
<div><br>
</div>
<div>In addition, when I changed my
transport-type from tcp or rdma to tcp,rdma,
I had to restart my volume for the
change to take effect. </div>
<div><br>
</div>
<div>I wonder whether these problems are due to
the RDMA protocol… because things look more
stable with TCP.</div>
<div><br>
</div>
<div>Any other ideas?</div>
<div>Thanks for replying, and thanks in advance,</div>
<div>Geoffrey</div>
<br>
<div>
<div>On 22 Jul 2015, at 07:33, Mohammed Rafi
K C &lt;<a moz-do-not-send="true"
href="mailto:rkavunga@redhat.com">rkavunga@redhat.com</a>&gt;
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div bgcolor="#FFFFFF" text="#000000"> <br>
<br>
<div class="moz-cite-prefix">On
07/22/2015 04:51 AM, Geoffrey
Letessier wrote:<br>
</div>
<blockquote
cite="mid:AE01B7C4-7319-4F71-8912-AD5C775F956F@cnrs.fr"
type="cite">
Hi Niels,
<div><br>
</div>
<div>Thanks for replying. </div>
<div><br>
</div>
<div>In fact, after checking the
log, I discovered that GlusterFS tried
to connect to a brick on a TCP (or
RDMA) port allocated to another
volume… (bug?)</div>
<div>For example, here is an extract of
my workdir.log file:</div>
<div>
<div style="margin: 0px; font-size:
11px; font-family: Menlo; color:
rgb(255, 255, 255);
background-color: rgb(0, 0, 0);
position: static; z-index: auto;">[2015-07-21
21:34:01.820188] E
[socket.c:2332:socket_connect_finish]
0-vol_workdir_amd-client-0:
connection to 10.0.4.1:49161
failed (Connexion refusée)</div>
<div style="margin: 0px; font-size:
11px; font-family: Menlo; color:
rgb(255, 255, 255);
background-color: rgb(0, 0, 0);
position: static; z-index: auto;">[2015-07-21
21:34:01.822563] E
[socket.c:2332:socket_connect_finish]
0-vol_workdir_amd-client-2:
connection to 10.0.4.1:49162
failed (Connexion refusée)</div>
<div><br>
</div>
<div>But these 2 ports (49161 and
49162) belong only to my vol_home
volume, not to vol_workdir_amd.</div>
<div><br>
</div>
<div>Now, after restarting all
glusterd daemons simultaneously (pdsh -w
cl-storage[1-4] service glusterd
restart), everything seems to be back
to normal (size,
write permission, etc.).</div>
<div><br>
</div>
<div>But, a few minutes later, I
ran into a strange thing that I have
noticed since I upgraded my storage
cluster from 3.5.3 to 3.7.2-3:
when I try to mount some volumes
(particularly vol_shared, a
replicated volume), my system can
hang… And, because I use that volume
in my bashrc file for my environment
modules, I need to restart my
node. The same happens if I run df on
my mounted volume (if it doesn't
hang during the mount).</div>
<div><br>
</div>
<div>With the TCP transport-type, the
situation seems to be more
stable.</div>
<div><br>
</div>
<div>In addition: if I restart a
storage node, I can't use the Gluster
CLI (it also hangs).</div>
<div><br>
</div>
<div>Do you have any idea?</div>
</div>
</blockquote>
<br>
Are you using a bash script to start/mount
the volume? If so, add a sleep after the
volume start and the mount to allow all the
processes to start properly, because the RDMA
protocol takes some time to initialize its
resources.<br>
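<br>
The same idea can be sketched in Python rather than bash. This is only an illustration under stated assumptions: the 10-second pause is an arbitrary placeholder, the function name start_and_mount and its run/sleep parameters are hypothetical hooks added so the sequence can be exercised without touching a real cluster, and the mount options are reduced to the transport option used in the mount commands in this thread.<br>
<pre>
```python
import subprocess
import time


def start_and_mount(volume, server, mount_point, transport="rdma",
                    settle_seconds=10, run=subprocess.check_call,
                    sleep=time.sleep):
    """Start a volume, wait, mount it, then wait again before using it."""
    run(["gluster", "volume", "start", volume])
    sleep(settle_seconds)  # let the brick processes finish initializing
    run(["mount", "-t", "glusterfs", "-o", "transport=" + transport,
         server + ":" + volume, mount_point])
    sleep(settle_seconds)  # give the client time to connect to the bricks
```
</pre>
A fixed pause is the simplest form of the suggestion; how long is long enough depends on the hardware, so the value should be tuned rather than taken from this sketch.<br>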
<br>
Regards<br>
Rafi KC<br>
<br>
<br>
<br>
<blockquote
cite="mid:AE01B7C4-7319-4F71-8912-AD5C775F956F@cnrs.fr"
type="cite">
<div>
<div><br>
</div>
<div>Once again, thanks a lot for
your help,</div>
<div>Geoffrey</div>
</div>
<div><br>
</div>
<br>
<div style="">
<div>On 21 Jul 2015, at 23:49, Niels
de Vos &lt;<a
moz-do-not-send="true"
href="mailto:ndevos@redhat.com">ndevos@redhat.com</a>&gt;
wrote:</div>
<br
class="Apple-interchange-newline">
<blockquote type="cite">On Tue, Jul
21, 2015 at 11:20:20PM +0200,
Geoffrey Letessier wrote:<br>
<blockquote type="cite">Hello
Soumya, Hello everybody,<br>
<br>
network.ping-timeout was set to
42 seconds. I set it to 0, but it made no<br>
difference. The problem was that,
after resetting the
transport-type to<br>
rdma,tcp, some bricks went down after a
few minutes. Despite
restarting the<br>
volumes, after a few minutes
some [other/different] bricks
went down<br>
again.<br>
</blockquote>
<br>
I'm not sure whether the
ping-timeout is handled
differently when RDMA is<br>
used. Adding two of the guys who
know RDMA well on CC.<br>
<br>
<blockquote type="cite">Now, after
re-creating my volume, the bricks
stay alive but, oddly, I'm<br>
not able to write to my volume.
In addition, I defined a
distributed<br>
volume with 2 servers and 4 bricks
of 250GB each, and my final
volume<br>
seems to be sized at only 500GB…
It's astonishing. <br>
</blockquote>
<br>
As seen further below, the 500GB
volume size is caused by two
unreachable<br>
bricks. When bricks are not
reachable, their size
cannot<br>
be detected by the client, and
therefore 2x 250 GB is missing.<br>
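<br>
The arithmetic can be sketched in a few lines of Python. This is an illustration only: the function name visible_capacity_gb is made up, the brick sizes come from the df output quoted below, and the choice of which two bricks are unreachable assumes that client-0 and client-2 in the log correspond to the first and third bricks in the volume info.<br>
<pre>
```python
def visible_capacity_gb(brick_sizes_gb, reachable):
    """Sum the sizes of the bricks the client can actually reach."""
    return sum(size for brick, size in brick_sizes_gb.items()
               if brick in reachable)


# Brick names and sizes follow the volume info and df output in this thread.
bricks = {
    "ib-storage1:/export/brick_workdir/brick1/data": 250,
    "ib-storage3:/export/brick_workdir/brick1/data": 250,
    "ib-storage1:/export/brick_workdir/brick2/data": 250,
    "ib-storage3:/export/brick_workdir/brick2/data": 250,
}

# Assumption: the two ib-storage1 bricks are the unreachable ones.
reachable = {name for name in bricks if name.startswith("ib-storage3:")}

print(visible_capacity_gb(bricks, reachable))  # prints 500
```
</pre>
With all four bricks reachable the same sum gives 1000, which is the size the volume was expected to report.<br>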
<br>
It is unclear to me why writing to
a pure distributed volume fails.
When<br>
a brick is not reachable, and the
file should be created there, it<br>
would normally get created on
another brick. When the brick that
should<br>
have the file comes online, and a
new lookup for the file is done, a<br>
so-called "link file" is created,
which points to the file on the
other<br>
brick. I guess the failure has to
do with the connection issues, and
I<br>
would suggest getting that solved
first.<br>
<br>
HTH,<br>
Niels<br>
<br>
<br>
<blockquote type="cite">Here you
can find some information:<br>
# gluster volume status
vol_workdir_amd<br>
Status of volume:
vol_workdir_amd<br>
Gluster process
TCP
Port RDMA Port Online Pid<br>
------------------------------------------------------------------------------<br>
Brick
ib-storage1:/export/brick_workdir/bri<br>
ck1/data
49185
49186 Y 23098<br>
Brick
ib-storage3:/export/brick_workdir/bri<br>
ck1/data
49158
49159 Y 3886 <br>
Brick
ib-storage1:/export/brick_workdir/bri<br>
ck2/data
49187
49188 Y 23117<br>
Brick
ib-storage3:/export/brick_workdir/bri<br>
ck2/data
49160
49161 Y 3905 <br>
<br>
# gluster volume info
vol_workdir_amd<br>
<br>
Volume Name: vol_workdir_amd<br>
Type: Distribute<br>
Volume ID:
087d26ea-c6df-4cbe-94af-ecd87b59aedb<br>
Status: Started<br>
Number of Bricks: 4<br>
Transport-type: tcp,rdma<br>
Bricks:<br>
Brick1:
ib-storage1:/export/brick_workdir/brick1/data<br>
Brick2:
ib-storage3:/export/brick_workdir/brick1/data<br>
Brick3:
ib-storage1:/export/brick_workdir/brick2/data<br>
Brick4:
ib-storage3:/export/brick_workdir/brick2/data<br>
Options Reconfigured:<br>
performance.readdir-ahead: on<br>
<br>
# pdsh -w storage[1,3] df -h
/export/brick_workdir/brick{1,2}<br>
storage3: Filesystem
Size Used Avail Use%
Mounted on<br>
storage3:
/dev/mapper/st--block1-blk1--workdir<br>
storage3:
250G 34M
250G 1%
/export/brick_workdir/brick1<br>
storage3:
/dev/mapper/st--block2-blk2--workdir<br>
storage3:
250G 34M
250G 1%
/export/brick_workdir/brick2<br>
storage1: Filesystem
Size Used Avail Use%
Mounted on<br>
storage1:
/dev/mapper/st--block1-blk1--workdir<br>
storage1:
250G 33M
250G 1%
/export/brick_workdir/brick1<br>
storage1:
/dev/mapper/st--block2-blk2--workdir<br>
storage1:
250G 33M
250G 1%
/export/brick_workdir/brick2<br>
<br>
# df -h /workdir/<br>
Filesystem Size Used
Avail Use% Mounted on<br>
localhost:vol_workdir_amd.rdma<br>
500G 67M
500G 1% /workdir<br>
<br>
# touch /workdir/test<br>
touch: impossible de faire un
touch « /workdir/test »: Aucun
fichier ou dossier de ce type<br>
<br>
# tail -30l
/var/log/glusterfs/workdir.log <br>
Host Unreachable, Check your
connection with IPoIB<br>
[2015-07-21 21:10:33.927673] W
[rdma.c:1263:gf_rdma_cm_event_handler]
0-vol_workdir_amd-client-2: cma
event RDMA_CM_EVENT_REJECTED,
error 8 (me:10.0.4.1:1020
peer:10.0.4.1:49174)<br>
Host Unreachable, Check your
connection with IPoIB<br>
[2015-07-21 21:10:37.877231] I
[rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_workdir_amd-client-0:
changing port to 49173 (from 0)<br>
[2015-07-21 21:10:37.880556] I
[rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_workdir_amd-client-2:
changing port to 49174 (from 0)<br>
[2015-07-21 21:10:37.914661] W
[rdma.c:1263:gf_rdma_cm_event_handler]
0-vol_workdir_amd-client-0: cma
event RDMA_CM_EVENT_REJECTED,
error 8 (me:10.0.4.1:1021
peer:10.0.4.1:49173)<br>
Host Unreachable, Check your
connection with IPoIB<br>
[2015-07-21 21:10:37.923535] W
[rdma.c:1263:gf_rdma_cm_event_handler]
0-vol_workdir_amd-client-2: cma
event RDMA_CM_EVENT_REJECTED,
error 8 (me:10.0.4.1:1020
peer:10.0.4.1:49174)<br>
Host Unreachable, Check your
connection with IPoIB<br>
[2015-07-21 21:10:41.883925] I
[rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_workdir_amd-client-0:
changing port to 49173 (from 0)<br>
[2015-07-21 21:10:41.887085] I
[rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_workdir_amd-client-2:
changing port to 49174 (from 0)<br>
[2015-07-21 21:10:41.919394] W
[rdma.c:1263:gf_rdma_cm_event_handler]
0-vol_workdir_amd-client-0: cma
event RDMA_CM_EVENT_REJECTED,
error 8 (me:10.0.4.1:1021
peer:10.0.4.1:49173)<br>
Host Unreachable, Check your
connection with IPoIB<br>
[2015-07-21 21:10:41.932622] W
[rdma.c:1263:gf_rdma_cm_event_handler]
0-vol_workdir_amd-client-2: cma
event RDMA_CM_EVENT_REJECTED,
error 8 (me:10.0.4.1:1020
peer:10.0.4.1:49174)<br>
Host Unreachable, Check your
connection with IPoIB<br>
[2015-07-21 21:10:44.682636] W
[dht-layout.c:189:dht_layout_search]
0-vol_workdir_amd-dht: no
subvolume for hash (value) =
1072520554<br>
[2015-07-21 21:10:44.682947] W
[dht-layout.c:189:dht_layout_search]
0-vol_workdir_amd-dht: no
subvolume for hash (value) =
1072520554<br>
[2015-07-21 21:10:44.683240] W
[dht-layout.c:189:dht_layout_search]
0-vol_workdir_amd-dht: no
subvolume for hash (value) =
1072520554<br>
[2015-07-21 21:10:44.683472] W
[dht-diskusage.c:48:dht_du_info_cbk]
0-vol_workdir_amd-dht: failed to
get disk info from
vol_workdir_amd-client-0<br>
[2015-07-21 21:10:44.683506] W
[dht-diskusage.c:48:dht_du_info_cbk]
0-vol_workdir_amd-dht: failed to
get disk info from
vol_workdir_amd-client-2<br>
[2015-07-21 21:10:44.683532] W
[dht-layout.c:189:dht_layout_search]
0-vol_workdir_amd-dht: no
subvolume for hash (value) =
1072520554<br>
[2015-07-21 21:10:44.683551] W
[fuse-bridge.c:1970:fuse_create_cbk]
0-glusterfs-fuse: 18: /test
=> -1 (Aucun fichier ou
dossier de ce type)<br>
[2015-07-21 21:10:44.683619] W
[dht-layout.c:189:dht_layout_search]
0-vol_workdir_amd-dht: no
subvolume for hash (value) =
1072520554<br>
[2015-07-21 21:10:44.683846] W
[dht-layout.c:189:dht_layout_search]
0-vol_workdir_amd-dht: no
subvolume for hash (value) =
1072520554<br>
[2015-07-21 21:10:45.886807] I
[rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_workdir_amd-client-0:
changing port to 49173 (from 0)<br>
[2015-07-21 21:10:45.893059] I
[rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_workdir_amd-client-2:
changing port to 49174 (from 0)<br>
[2015-07-21 21:10:45.920434] W
[rdma.c:1263:gf_rdma_cm_event_handler]
0-vol_workdir_amd-client-0: cma
event RDMA_CM_EVENT_REJECTED,
error 8 (me:10.0.4.1:1021
peer:10.0.4.1:49173)<br>
Host Unreachable, Check your
connection with IPoIB<br>
[2015-07-21 21:10:45.925292] W
[rdma.c:1263:gf_rdma_cm_event_handler]
0-vol_workdir_amd-client-2: cma
event RDMA_CM_EVENT_REJECTED,
error 8 (me:10.0.4.1:1020
peer:10.0.4.1:49174)<br>
Host Unreachable, Check your
connection with IPoIB<br>
<br>
I have used GlusterFS in production
for around 3 years without any
blocking<br>
problem, but the situation has been
awful for more than 3 weeks now…<br>
Indeed, our production has been down
for roughly 3.5 weeks (with many<br>
different problems with
GlusterFS v3.5.3 and now with
3.7.2-3) and<br>
I need to restart it… <br>
<br>
Thanks in advance,<br>
Geoffrey<br>
<br>
On 21 Jul 2015, at 19:36, Soumya
Koduri &lt;<a
moz-do-not-send="true"
href="mailto:skoduri@redhat.com">skoduri@redhat.com</a>&gt;
wrote:<br>
<br>
<blockquote type="cite">From the
following errors,<br>
<br>
[2015-07-21 14:36:30.495321] I
[MSGID: 114020]
[client.c:2118:notify]
0-vol_shared-client-0: parent
translators are ready,
attempting connect on
transport<br>
[2015-07-21 14:36:30.498989] W
[socket.c:923:__socket_keepalive]
0-socket: failed to set
TCP_USER_TIMEOUT 0 on socket
12, Protocole non disponible<br>
[2015-07-21 14:36:30.499004] E
[socket.c:3015:socket_connect]
0-vol_shared-client-0: Failed
to set keep-alive: Protocole
non disponible<br>
<br>
it looks like setting the
TCP_USER_TIMEOUT value to 0 on
the socket failed with the error
(IIUC) "Protocol not
available".<br>
Could you check whether
'network.ping-timeout' is set
to zero for that volume using
'gluster volume info'? Anyway,
from the code it looks like
'TCP_USER_TIMEOUT' can take the
value zero. Not sure why it
failed.<br>
<br>
Niels, any thoughts?<br>
<br>
Thanks,<br>
Soumya<br>
<br>
On 07/21/2015 08:15 PM,
Geoffrey Letessier wrote:<br>
<blockquote type="cite">[2015-07-21
14:36:30.495321] I [MSGID:
114020]
[client.c:2118:notify]<br>
0-vol_shared-client-0:
parent translators are
ready, attempting connect<br>
on transport<br>
[2015-07-21 14:36:30.498989]
W
[socket.c:923:__socket_keepalive]<br>
0-socket: failed to set
TCP_USER_TIMEOUT 0 on socket
12, Protocole non<br>
disponible<br>
[2015-07-21 14:36:30.499004]
E
[socket.c:3015:socket_connect]<br>
0-vol_shared-client-0:
Failed to set keep-alive:
Protocole non disponible<br>
</blockquote>
</blockquote>
<br>
</blockquote>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>