<div dir="ltr">The versions were:<div>gluster client: 3.6.2</div><div>gluster server: 3.6.0</div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-03-08 18:17 GMT+01:00 Vijay Bellur <span dir="ltr"><<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I don't have the volfiles. They are not on our machines; as I said previously,<br>
we have no control over the gluster servers.<br>
<br>
I saw a graph in the logs that looks similar to a volume file. I will<br>
paste it here, but we don't really have any control over that. We are just<br>
using the client to connect to gluster servers that we do not control.<br>
<br>
</blockquote>
<br></span>
I would recommend not altering the default frame timeout.<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Btw, do you think that different versions of gluster client and gluster<br>
server could be an issue here?<br>
<br>
</blockquote>
<br></span>
It can potentially be. What versions are you using on the servers and the client?<br>
<br>
-Vijay<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
2015-03-08 1:29 GMT+01:00 Vijay Bellur <<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a><br></span>
<mailto:<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>>>:<div><div class="h5"><br>
<br>
On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:<br>
<br>
Hi guys,<br>
<br>
We have a Rails app that uses gluster as our distributed file<br>
system. The gluster servers are hosted independently as part of a<br>
deal with another company. We have no control over them; we connect<br>
to them using the gluster native client.<br>
<br>
We tried to resolve this issue with help from the admins of the<br>
company that hosts our gluster servers, but they say it is a client<br>
issue. We have run out of ideas about how that is possible, since we are<br>
not doing anything special here.<br>
<br>
Information about the independent gluster servers:<br>
- Version: 3.6.0.42.1<br>
- They are running Red Hat<br>
- They are an enterprise, so they always use older versions<br>
<br>
Our servers:<br>
System version: Ubuntu 14.04<br>
Our gluster client version: 3.6.2<br>
<br>
The exact problem is that often (a couple of times a week)<br>
errors in gluster cause processes to become zombies. It happens<br>
with our application server (unicorn), nginx, and our crawling script that<br>
runs as a daemon.<br>
<br>
Our fstab file:<br>
<br>
10.10.11.17:/drslk-prod /mnt/storage glusterfs<br></div></div>
defaults,_netdev,nobootwait,fetch-attempts=10 0 0<br>
10.10.11.17:/drslk-backup /mnt/backup glusterfs<br>
defaults,_netdev,nobootwait,fetch-attempts=10 0 0<span class=""><br>
<br>
Logs from gluster:<br>
<br>
[2015-02-18 12:36:12.375695] E<br></span>
[rpc-clnt.c:362:saved_frames_unwind] (--><br>
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))<span class=""><br>
0-drslk-prod-client-10: forced<br>
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at<br>
2015-02-18<br>
12:36:12.361489 (xid=0x5d475da)<br>
[2015-02-18 12:36:12.375765] W<br></span>
[client-rpc-fops.c:2766:client3_3_lookup_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br></span>
/system/posts/00/00/71/77/59.jpg<br>
(2ad81c2b-a141-478d-9dd4-253345edbceb)<span class=""><br>
[2015-02-18 12:36:12.376288] E<br></span>
[rpc-clnt.c:362:saved_frames_unwind] (--><br>
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]<br>
(--><br>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))<span class=""><br>
0-drslk-prod-client-10: forced<br>
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at<br>
2015-02-18<br>
12:36:12.361858 (xid=0x5d475db)<br>
[2015-02-18 12:36:12.376355] W<br></span>
[client-rpc-fops.c:2766:client3_3_lookup_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br></span>
/system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)<span class=""><br>
[2015-02-18 12:36:12.376711] I<br></span>
[socket.c:3292:socket_submit_request]<span class=""><br>
0-drslk-prod-client-10: not connected (priv->connected = 0)<br></span>
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]<span class=""><br>
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc<br>
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport<br>
(drslk-prod-client-10)<br>
[2015-02-18 12:36:12.376814] W<br></span>
[client-rpc-fops.c:2766:client3_3_lookup_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br></span>
(null) (00000000-0000-0000-0000-000000000000)<br>
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]<span class=""><br>
0-drslk-prod-client-10: disconnected from drslk-prod-client-10.<br>
Client<br>
process will keep trying to connect to glusterd until brick's<br>
port is<br>
available<br></span>
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]<span class=""><br>
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd<br>
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport<br>
(drslk-prod-client-10)<br>
[2015-02-18 12:36:12.376906] W<br></span>
[client-rpc-fops.c:2766:client3_3_lookup_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br></span>
(null) (00000000-0000-0000-0000-000000000000)<span class=""><br>
[2015-02-18 12:36:12.376931] E<br></span>
[socket.c:2267:socket_connect_finish]<span class=""><br>
0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection refused)<br>
<br>
[2015-02-18 12:36:12.379296] W<br></span>
[client-rpc-fops.c:2766:client3_3_lookup_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br></span>
(null) (00000000-0000-0000-0000-000000000000)<span class=""><br>
[2015-02-18 12:36:12.379700] W<br></span>
[client-rpc-fops.c:2766:client3_3_lookup_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
remote operation failed: Transport endpoint is not connected. Path:<br></span>
(null) (00000000-0000-0000-0000-000000000000)<span class=""><br>
[2015-02-18 13:10:52.759736] E<br></span>
[client-handshake.c:1496:client_query_portmap_cbk]<span class=""><br>
0-drslk-prod-client-10: failed to get the port number for remote<br>
subvolume. Please run 'gluster volume status' on server to see<br>
if brick<br>
process is running.<br></span>
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]<span class=""><br>
0-drslk-prod-client-10: disconnected from drslk-prod-client-10.<br>
Client<br>
process will keep trying to connect to glusterd until brick's<br>
port is<br>
available<br></span>
[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]<span class=""><br>
0-drslk-prod-client-10: changing port to 49349 (from 0)<br>
[2015-02-18 13:11:02.898097] I<br></span>
[client-handshake.c:1413:select_server_supported_programs]<span class=""><br>
0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437),<br>
Version (330)<br>
[2015-02-18 13:11:02.898446] I<br></span>
[client-handshake.c:1200:client_setvolume_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
Connected to drslk-prod-client-10, attached to remote volume<br>
'/GLUSTERFS/drslk-prod'.<br>
[2015-02-18 13:11:02.898460] I<br></span>
[client-handshake.c:1210:client_setvolume_cbk]<span class=""><br>
0-drslk-prod-client-10:<br>
Server and Client lk-version numbers are not same, reopening the fds<br>
<br>
<br>
Can you provide the gluster volume configuration details?<br>
<br>
It does look like frame-timeout for the volume has been set to 60.<br>
Is there any specific reason? Normally altering the frame-timeout is<br>
not recommended.<br>
<br>
-Vijay<br>
<br>
<br>
</span></blockquote>
<br>
</blockquote></div><br></div>
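<div>To measure how long a brick was unreachable in a client log like the one quoted in this thread, a minimal Python sketch might look like the following. It assumes every log entry sits on a single line, as it would in the original client log file (the quoted copy was wrapped by the mail client). The helper names <code>parse_entries</code> and <code>outages</code> are hypothetical and are not part of GlusterFS. The sketch pairs the first "disconnected from" message for a subvolume with the next "Connected to" message for the same subvolume.</div>

```python
import re
from datetime import datetime

# Header of a GlusterFS client log entry, for example:
# [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: message
# Assumption: each entry is on one line (the mail client wrapped the quoted copy).
ENTRY = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) "
    r"\[([^\]]+)\] (\S+): (.*)"
)


def parse_entries(text):
    """Yield (timestamp, level, source, subvolume, message) per matching line."""
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m.group(2), m.group(3), m.group(4), m.group(5)


def outages(text):
    """Return (subvolume, down_at, up_at) for each disconnect/reconnect pair."""
    down = {}
    result = []
    for ts, _level, _source, subvol, msg in parse_entries(text):
        if msg.startswith("disconnected from"):
            # Keep only the first disconnect; later repeats belong to the same outage.
            down.setdefault(subvol, ts)
        elif msg.startswith("Connected to") and subvol in down:
            result.append((subvol, down.pop(subvol), ts))
    return result


# Three entries taken from the log in this thread, rejoined onto single lines.
SAMPLE = """\
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to remote volume '/GLUSTERFS/drslk-prod'.
"""

if __name__ == "__main__":
    for subvol, down_at, up_at in outages(SAMPLE):
        print(subvol, (up_at - down_at).total_seconds(), "seconds")
```

<div>On the three sample entries, the sketch finds a single outage for 0-drslk-prod-client-10 lasting about 2090 seconds (roughly 35 minutes), from the first disconnect at 12:36:12 to the reconnect at 13:11:02. Correlating such windows with the times the application processes hung may help narrow down whether the hangs follow brick disconnects.</div>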