<div dir="ltr"><div>Hi,</div><div><br></div><div>I have experienced what looks like a very similar crash, running Gluster 3.7.6 on CentOS 7. There were no errors on the bricks or on the other clients that had the volume mounted at the time. Load was relatively high when the crash happened.</div><div><br></div><div>Remounting the filesystem brought it back online.</div><div>


<p class=""><span class="">pending frames:<br></span>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(STAT)<br>frame : type(1) op(STAT)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(1) op(READ)<br>frame : type(0) op(0)<br>patchset: git://<a href="http://git.gluster.com/glusterfs.git">git.gluster.com/glusterfs.git</a><br>signal received: 6<br>time of crash: <br>2016-02-22 10:28:45<br>configuration details:<br>argp 1<br>backtrace 1<br>dlfcn 1<br>libpthread 1<br>llistxattr 1<br>setfsid 1<br>spinlock 1<br>epoll.h 1<br>xattr.h 1<br>st_atim.tv_nsec 1<br>package-string: glusterfs 
3.7.6<br>/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f83387f7012]<br>/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f83388134dd]<br>/lib64/libc.so.6(+0x35670)[0x7f8336ee5670]<br>/lib64/libc.so.6(gsignal+0x37)[0x7f8336ee55f7]<br>/lib64/libc.so.6(abort+0x148)[0x7f8336ee6ce8]<br>/lib64/libc.so.6(+0x75317)[0x7f8336f25317]<br>/lib64/libc.so.6(+0x7cfe1)[0x7f8336f2cfe1]<br>/lib64/libglusterfs.so.0(loc_wipe+0x27)[0x7f83387f4d47]<br>/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_local_wipe+0x11)[0x7f8329c8e5f1]<br>/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_stat_cbk+0x10c)[0x7f8329c8f4fc]<br>/lib64/libglusterfs.so.0(default_stat_cbk+0xac)[0x7f83387fcc5c]<br>/usr/lib64/glusterfs/3.7.6/xlator/cluster/distribute.so(dht_file_attr_cbk+0x149)[0x7f832ab2a409]<br>/usr/lib64/glusterfs/3.7.6/xlator/protocol/client.so(client3_3_stat_cbk+0x3c6)[0x7f832ad6d266]<br>/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f83385c5b80]<br>/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f83385c5e3f]<br>/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f83385c1983]<br>/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7f832d261506]<br>/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7f832d2643f4]<br>/lib64/libglusterfs.so.0(+0x878ea)[0x7f83388588ea]<br>/lib64/libpthread.so.0(+0x7dc5)[0x7f833765fdc5]<br>/lib64/libc.so.6(clone+0x6d)[0x7f8336fa621d]<br><br></p><p class="">Kind regards,<br>Fredrik Widlund</p></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 23, 2016 at 1:00 PM,  <span dir="ltr">&lt;<a href="mailto:gluster-users-request@gluster.org" target="_blank">gluster-users-request@gluster.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Date: Mon, 22 Feb 2016 15:08:47 -0500<br>

From: Dj Merrill &lt;<a href="mailto:gluster@deej.net">gluster@deej.net</a>&gt;<br>

To: Gaurav Garg &lt;<a href="mailto:ggarg@redhat.com">ggarg@redhat.com</a>&gt;<br>

Cc: <a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>

Subject: Re: [Gluster-users] glusterfs client crashes<br>

Message-ID: &lt;<a href="mailto:56CB6ACF.5080408@deej.net">56CB6ACF.5080408@deej.net</a>&gt;<br>

<br>

On 2/21/2016 2:23 PM, Dj Merrill wrote:<br>

 &gt; Very interesting.  They were reporting both bricks offline, but the<br>

 &gt; processes on both servers were still running.  Restarting glusterfsd on<br>

 &gt; one of the servers brought them both back online.<br>

<br>

I realize I wasn&#39;t clear in my comments yesterday and would like to<br>

elaborate on this a bit further. The &quot;very interesting&quot; comment was<br>

sparked because when we were running 3.7.6, the bricks were not<br>

reporting as offline when a client was having an issue, so this is new<br>

behaviour now that we are running 3.7.8 (or a different issue entirely).<br>

<br>

The other point that I was not clear on is that we may have one client<br>

reporting the &quot;Transport endpoint is not connected&quot; error, but the other<br>

40+ clients all continue to work properly. This is the case with both<br>

3.7.6 and 3.7.8.<br>

<br>

Curious, how can the other clients continue to work fine if both Gluster<br>

3.7.8 servers are reporting the bricks as offline?<br>

<br>

What does &quot;offline&quot; mean in this context?<br>

<br>

<br>

Re: the server logs, here is what I&#39;ve found so far listed on both<br>

gluster servers (glusterfs1 and glusterfs2):<br>

<br>

[2016-02-21 08:06:02.785788] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]<br>

0-glusterfs: No change in volfile, continuing<br>

[2016-02-21 18:48:20.677010] W [socket.c:588:__socket_rwv]<br>

0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No<br>

data available)<br>

[2016-02-21 18:48:20.677096] I [MSGID: 114018]<br>

[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from<br>

gv0-client-1. Client process will keep trying to connect to glusterd<br>

until brick&#39;s port is available<br>

[2016-02-21 18:48:31.148564] E [MSGID: 114058]<br>

[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1:<br>

failed to get the port number for remote subvolume. Please run &#39;gluster<br>

volume status&#39; on server to see if brick process is running.<br>

[2016-02-21 18:48:40.941715] W [socket.c:588:__socket_rwv] 0-glusterfs:<br>

readv on (sanitized IP of glusterfs2):24007 failed (No data available)<br>

[2016-02-21 18:48:51.184424] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]<br>

0-glusterfs: No change in volfile, continuing<br>

[2016-02-21 18:48:51.972068] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]<br>

0-mgmt: Volume file changed<br>

[2016-02-21 18:48:51.980210] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]<br>

0-mgmt: Volume file changed<br>

[2016-02-21 18:48:51.985211] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]<br>

0-mgmt: Volume file changed<br>

[2016-02-21 18:48:51.995002] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]<br>

0-mgmt: Volume file changed<br>

[2016-02-21 18:48:53.006079] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]<br>

0-glusterfs: No change in volfile, continuing<br>

[2016-02-21 18:48:53.018104] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]<br>

0-glusterfs: No change in volfile, continuing<br>

[2016-02-21 18:48:53.024060] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]<br>

0-glusterfs: No change in volfile, continuing<br>

[2016-02-21 18:48:53.035170] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]<br>

0-glusterfs: No change in volfile, continuing<br>

[2016-02-21 18:48:53.045637] I [rpc-clnt.c:1847:rpc_clnt_reconfig]<br>

0-gv0-client-1: changing port to 49152 (from 0)<br>

[2016-02-21 18:48:53.051991] I [MSGID: 114057]<br>

[client-handshake.c:1437:select_server_supported_programs]<br>

0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>

[2016-02-21 18:48:53.052439] I [MSGID: 114046]<br>

[client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected<br>

to gv0-client-1, attached to remote volume &#39;/export/brick1/sdb1&#39;.<br>

[2016-02-21 18:48:53.052486] I [MSGID: 114047]<br>

[client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server<br>

and Client lk-version numbers are not same, reopening the fds<br>

[2016-02-21 18:48:53.052668] I [MSGID: 114035]<br>

[client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1:<br>

Server lk version = 1<br>

[2016-02-21 18:48:31.148706] I [MSGID: 114018]<br>

[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from<br>

gv0-client-1. Client process will keep trying to connect to glusterd<br>

until brick&#39;s port is available<br>

[2016-02-21 18:49:12.271865] W [socket.c:588:__socket_rwv] 0-glusterfs:<br>

readv on (sanitized IP of glusterfs2):24007 failed (No data available)<br>

[2016-02-21 18:49:15.637745] W [socket.c:588:__socket_rwv]<br>

0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No<br>

data available)<br>

[2016-02-21 18:49:15.637824] I [MSGID: 114018]<br>

[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from<br>

gv0-client-1. Client process will keep trying to connect to glusterd<br>

until brick&#39;s port is available<br>

[2016-02-21 18:49:24.198431] E [socket.c:2278:socket_connect_finish]<br>

0-glusterfs: connection to (sanitized IP of glusterfs2):24007 failed<br>

(Connection refused)<br>

[2016-02-21 18:49:26.204811] E [socket.c:2278:socket_connect_finish]<br>

0-gv0-client-1: connection to (sanitized IP of glusterfs2):24007 failed<br>

(Connection refused)<br>

[2016-02-21 18:49:38.366559] I [MSGID: 108031]<br>

[afr-common.c:1883:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting<br>

local read_child gv0-client-0<br>

[2016-02-21 18:50:54.605535] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]<br>

0-glusterfs: No change in volfile, continuing<br>

[2016-02-21 18:50:54.605639] E [MSGID: 114058]<br>

[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1:<br>

failed to get the port number for remote subvolume. Please run &#39;gluster<br>

volume status&#39; on server to see if brick process is running.<br>

</blockquote></div><br></div></div>