<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 01/12/2016 at 13:12, Yannick Perret
wrote:<br>
</div>
<blockquote
cite="mid:2a937324-816f-9ccc-e708-ec9d368b9189@liris.cnrs.fr"
type="cite">Hello,
<br>
I have a client machine that mounts a 2-way replicated volume over
NFS. In practice this is configured with automount as follows:
<br>
DIR-NAME -rw,soft,intr server1,server2:/VOLUME
<br>
<br>
Gluster servers are using 3.6.7.
<br>
Sometimes the NFS blocks on client with
<br>
server server2 not responding, timed out (in this case the mount
was using server2)
<br>
but network communication is fine between the two machines (they
are connected to the same switch, I can ssh into each, they ping
each other…).
<br>
<br>
I can also see few "xs_tcp_setup_socket: connect returned
unhandled error -107" on the client.
<br>
On 'server2' side I can see in the gluster nfs logs:
<br>
<br>
[2016-12-01 10:50:15.887927] W [rpcsvc.c:261:rpcsvc_program_actor]
0-rpc-service: RPC program version not available (req 100003 2)
<br>
[2016-12-01 10:50:15.887965] E
[rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor
failed to complete successfully
<br>
[2016-12-01 10:50:15.901880] W [rpcsvc.c:261:rpcsvc_program_actor]
0-rpc-service: RPC program version not available (req 100003 4)
<br>
[2016-12-01 10:50:15.901900] E
[rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor
failed to complete successfully
<br>
[2016-12-01 10:51:03.777145] W [rpcsvc.c:261:rpcsvc_program_actor]
0-rpc-service: RPC program version not available (req 100003 2)
<br>
[2016-12-01 10:51:03.777191] E
[rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor
failed to complete successfully
<br>
[2016-12-01 10:51:03.790561] W [rpcsvc.c:261:rpcsvc_program_actor]
0-rpc-service: RPC program version not available (req 100003 4)
<br>
[2016-12-01 10:51:03.790580] E
[rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor
failed to complete successfully
<br>
<br>
</blockquote>
It looks like these correspond to the NFS re-connection (the
client probing NFSv2 and NFSv4, I think: req "100003 2" and
"100003 4" are the NFS RPC program 100003, versions 2 and 4).<br>
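If the warnings really come from the client probing NFS versions that
the Gluster NFS server does not register, it may help to check what
server2 actually advertises and to pin the automount entry to NFSv3
(Gluster's built-in NFS server only speaks v3). A minimal sketch,
assuming the host name server2 and the map entry quoted above:<br>

```shell
# List which versions of the NFS RPC program (100003) server2 registers
# with its portmapper; only v3 should appear for Gluster's NFS server.
rpcinfo -p server2 | grep 100003

# Pin the automount entry to NFSv3 so the client never probes v2/v4,
# e.g. (in the automount map, adapting the entry from above):
# DIR-NAME -rw,soft,intr,vers=3 server1,server2:/VOLUME
```

Depending on the client's mount implementation the option may be
spelled vers=3 or nfsvers=3.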
<br>
Just before that here are the logs:<br>
l_layout_new_directory] 0-HOME-LIRIS-dht: assigning range size
0xffe76e40 to HOME-LIRIS-replicate-0<br>
[2016-12-01 10:48:36.990028] W
[client-rpc-fops.c:2145:client3_3_setattr_cbk]
0-HOME-LIRIS-client-1: remote operation failed: Operation not
permitted<br>
[2016-12-01 10:48:36.990303] W
[client-rpc-fops.c:2145:client3_3_setattr_cbk]
0-HOME-LIRIS-client-0: remote operation failed: Operation not
permitted<br>
The message "I [MSGID: 109036]
[dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
0-HOME-LIRIS-dht: Setting layout of
<gfid:6f8bb427-eea5-4dd5-b004-9db8582bdda2>/_indexer.lock with
[Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop:
4294967295 ], " repeated 2 times between [2016-12-01
10:48:36.404738] and [2016-12-01 10:48:36.949907]<br>
[2016-12-01 10:48:36.990728] I [MSGID: 109036]
[dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
0-HOME-LIRIS-dht: Setting layout of
<gfid:6f8bb427-eea5-4dd5-b004-9db8582bdda2>/39132555496bb098708af2d5e7b56d67
with [Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 ,
Stop: 4294967295 ], <br>
[2016-12-01 10:50:10.360020] I [dht-rename.c:1344:dht_rename]
0-HOME-LIRIS-dht: renaming
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_km1NUe
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) =>
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/general.php
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)<br>
[2016-12-01 10:50:10.423561] I [dht-rename.c:1344:dht_rename]
0-HOME-LIRIS-dht: renaming
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_2pOZ5T
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) =>
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/1.php
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)<br>
[2016-12-01 10:50:10.485882] I [dht-rename.c:1344:dht_rename]
0-HOME-LIRIS-dht: renaming
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_86Lmpz
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) =>
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/general.php
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)<br>
<br>
<br>
I also tried to set "nfs.mount-rmtab /dev/shm/glusterfs.rmtab" as I
read in an old thread. I will check whether it changes anything.<br>
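For reference, a minimal sketch of how that option would be applied,
assuming a volume named VOLUME (the logs above suggest the real volume
is named HOME-LIRIS):<br>

```shell
# Move the NFS rmtab to tmpfs so mount/unmount bookkeeping does not
# touch the on-disk rmtab; the option is set per volume:
gluster volume set VOLUME nfs.mount-rmtab /dev/shm/glusterfs.rmtab
```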
<br>
Regards,<br>
--<br>
Y.<br>
<br>
<blockquote
cite="mid:2a937324-816f-9ccc-e708-ec9d368b9189@liris.cnrs.fr"
type="cite">at times that correspond to the NFS timeouts.
<br>
<br>
This problem occurs "often" (at least once every day or two),
and neither the client nor the servers are under heavy load
(memory and CPU are far from full).
<br>
<br>
Any idea what the cause might be and how to prevent it from
occurring?
<br>
I reduced the autofs timeout in order to limit the impact, but that
is not a very nice solution… Note: I can't use the glusterfs client
instead of NFS because of the memory leaks that still exist in it.
<br>
<br>
Thanks.
<br>
<br>
Regards,
<br>
--
<br>
Y.
<br>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<p><br>
</p>
</body>
</html>