<div dir="ltr">Hi Atin,<br><br>Thanks for the reply. I&#39;m not sure which logs are relevant, so I&#39;ll just attach them all in a .gz file. <div><br></div><div>I ran &quot;sudo gluster volume start gfsvolume force&quot; at 2015-03-19 05:49.</div><div>I hope this helps.</div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><div>Thank You Kindly,</div><div>Kaamesh</div><div><br></div></div></div></div><div class="gmail_quote">On Sun, Mar 15, 2015 at 11:41 PM, Atin Mukherjee <span dir="ltr">&lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Could you attach the logs for the analysis?<br>
<br>
~Atin<br>
<div><div class="h5"><br>
On 03/13/2015 03:29 PM, Kaamesh Kamalaaharan wrote:<br>
&gt; Hi guys. I&#39;ve been using Gluster for a while now and, despite a few hiccups,<br>
&gt; I find it&#39;s a great system to use. One of my more persistent hiccups is an<br>
&gt; issue with one brick going offline.<br>
&gt;<br>
&gt; My setup is a 2-brick, 2-node replicated volume. My main brick is gfs1, which has not<br>
&gt; given me any problems. gfs2, however, keeps going offline. Following<br>
&gt; <a href="http://www.gluster.org/pipermail/gluster-users/2014-June/017583.html" target="_blank">http://www.gluster.org/pipermail/gluster-users/2014-June/017583.html</a><br>
&gt; temporarily fixed the error, but the brick goes offline again within the hour.<br>
&gt;<br>
&gt; This is what I get from the volume status command:<br>
&gt;<br>
&gt; sudo gluster volume status<br>
&gt;&gt;<br>
&gt;&gt; Status of volume: gfsvolume<br>
&gt;&gt; Gluster process Port Online Pid<br>
&gt;&gt;<br>
&gt;&gt; ------------------------------------------------------------------------------<br>
&gt;&gt; Brick gfs1:/export/sda/brick 49153 Y 9760<br>
&gt;&gt; Brick gfs2:/export/sda/brick N/A N 13461<br>
&gt;&gt; NFS Server on localhost 2049 Y 13473<br>
&gt;&gt; Self-heal Daemon on localhost N/A Y 13480<br>
&gt;&gt; NFS Server on gfs1 2049 Y 16166<br>
&gt;&gt; Self-heal Daemon on gfs1 N/A Y 16173<br>
&gt;&gt;<br>
&gt;&gt; Task Status of Volume gfsvolume<br>
&gt;&gt;<br>
&gt;&gt; ------------------------------------------------------------------------------<br>
&gt;&gt; There are no active volume tasks<br>
&gt;&gt;<br>
&gt;&gt;<br>
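To spot the offline brick without reading the table by eye, the status output can be parsed. This Python sketch is only an illustration: it assumes the plain-text layout shown above (one row per brick, whitespace-separated columns Gluster process, Port, Online, Pid), and offline_bricks is a hypothetical helper, not part of Gluster. Other Gluster releases may print extra columns or wrap long brick names, so the five-field check would need adjusting there.<br>

```python
# Hypothetical helper: list bricks that `gluster volume status` reports as offline.
# Assumes the layout shown in this thread: "Brick HOST:/PATH PORT ONLINE PID".

def offline_bricks(status_text):
    """Return the brick names whose Online column is N."""
    offline = []
    for line in status_text.splitlines():
        fields = line.split()
        # Brick rows have exactly five fields: "Brick", HOST:/PATH, Port, Online, Pid.
        if len(fields) == 5 and fields[0] == "Brick":
            _label, brick, _port, online, _pid = fields
            if online == "N":
                offline.append(brick)
    return offline


# Rows copied from the status output above.
SAMPLE = """\
Brick gfs1:/export/sda/brick 49153 Y 9760
Brick gfs2:/export/sda/brick N/A N 13461
NFS Server on localhost 2049 Y 13473
Self-heal Daemon on localhost N/A Y 13480
"""

print(offline_bricks(SAMPLE))  # ['gfs2:/export/sda/brick']
```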
&gt; After running sudo gluster volume start gfsvolume force, the status shows the brick back online:<br>
&gt;<br>
&gt; sudo gluster volume status<br>
&gt;&gt;<br>
&gt;&gt; Status of volume: gfsvolume<br>
&gt;&gt; Gluster process Port Online Pid<br>
&gt;&gt;<br>
&gt;&gt; ------------------------------------------------------------------------------<br>
&gt;&gt; Brick gfs1:/export/sda/brick 49153 Y 9760<br>
&gt;&gt; Brick gfs2:/export/sda/brick 49153 Y 13461<br>
&gt;&gt; NFS Server on localhost 2049 Y 13473<br>
&gt;&gt; Self-heal Daemon on localhost N/A Y 13480<br>
&gt;&gt; NFS Server on gfs1 2049 Y 16166<br>
&gt;&gt; Self-heal Daemon on gfs1 N/A Y 16173<br>
&gt;&gt;<br>
&gt;&gt; Task Status of Volume gfsvolume<br>
&gt;&gt;<br>
&gt;&gt; ------------------------------------------------------------------------------<br>
&gt;&gt; There are no active volume tasks<br>
&gt;&gt;<br>
&gt; Half an hour later, the brick goes down again.<br>
&gt;<br>
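Because the brick stays up for roughly half an hour after a forced start, it helps to record exactly when it drops so that moment can be matched against the brick log on gfs2. This is a minimal sketch, assuming Python 3.7+ and that it runs as root on one of the nodes so that gluster volume status gfsvolume succeeds; gluster_status, offline_bricks and watch_bricks are hypothetical helper names. The status fetcher and the sleep function are parameters, so the polling logic can be exercised without a live cluster.<br>

```python
import subprocess
import time
from datetime import datetime, timezone


def gluster_status(volume="gfsvolume"):
    """Return the text printed by `gluster volume status VOLUME` (needs root)."""
    result = subprocess.run(
        ["gluster", "volume", "status", volume],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def offline_bricks(status_text):
    """Brick rows look like 'Brick HOST:/PATH PORT ONLINE PID'; keep those with ONLINE == 'N'."""
    down = set()
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "Brick" and fields[3] == "N":
            down.add(fields[1])
    return down


def watch_bricks(get_status=gluster_status, interval=60, max_checks=60, sleep=time.sleep):
    """Poll the volume status and return (UTC timestamp, brick) for each brick that goes offline."""
    events = []
    previously_down = set()
    for _ in range(max_checks):
        down = offline_bricks(get_status())
        # Only report bricks that were online at the previous check.
        for brick in sorted(down - previously_down):
            stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
            events.append((stamp, brick))
        previously_down = down
        sleep(interval)
    return events
```

For example, watch_bricks(interval=30, max_checks=120) would poll every 30 seconds for about an hour. Timestamps are recorded in UTC; check which timezone your glusterfs logs use before lining the two up.<br>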
&gt;&gt;<br>
&gt;&gt;<br>
&gt; This is my glustershd.log. I snipped it because the rest of the log is a<br>
&gt; repeat of the same errors.<br>
&gt;<br>
&gt;<br>
&gt;&gt;<br>
&gt;&gt; [2015-03-13 02:09:41.951556] I [glusterfsd.c:1959:main]<br>
&gt;&gt; 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.0<br>
&gt;&gt; (/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p<br>
&gt;&gt; /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S<br>
&gt;&gt; /var/run/deac2f873d0ac5b6c3e84b23c4790172.socket --xlator-option<br>
&gt;&gt; *replicate*.node-uuid=adbb7505-3342-4c6d-be3d-75938633612c)<br>
&gt;&gt; [2015-03-13 02:09:41.954173] I [socket.c:3561:socket_init]<br>
&gt;&gt; 0-socket.glusterfsd: SSL support is NOT enabled<br>
&gt;&gt; [2015-03-13 02:09:41.954236] I [socket.c:3576:socket_init]<br>
&gt;&gt; 0-socket.glusterfsd: using system polling thread<br>
&gt;&gt; [2015-03-13 02:09:41.954421] I [socket.c:3561:socket_init] 0-glusterfs:<br>
&gt;&gt; SSL support is NOT enabled<br>
&gt;&gt; [2015-03-13 02:09:41.954443] I [socket.c:3576:socket_init] 0-glusterfs:<br>
&gt;&gt; using system polling thread<br>
&gt;&gt; [2015-03-13 02:09:41.956731] I [graph.c:254:gf_add_cmdline_options]<br>
&gt;&gt; 0-gfsvolume-replicate-0: adding option &#39;node-uuid&#39; for volume<br>
&gt;&gt; &#39;gfsvolume-replicate-0&#39; with value &#39;adbb7505-3342-4c6d-be3d-75938633612c&#39;<br>
&gt;&gt; [2015-03-13 02:09:41.960210] I [rpc-clnt.c:972:rpc_clnt_connection_init]<br>
&gt;&gt; 0-gfsvolume-client-1: setting frame-timeout to 90<br>
&gt;&gt; [2015-03-13 02:09:41.960288] I [socket.c:3561:socket_init]<br>
&gt;&gt; 0-gfsvolume-client-1: SSL support is NOT enabled<br>
&gt;&gt; [2015-03-13 02:09:41.960301] I [socket.c:3576:socket_init]<br>
&gt;&gt; 0-gfsvolume-client-1: using system polling thread<br>
&gt;&gt; [2015-03-13 02:09:41.961095] I [rpc-clnt.c:972:rpc_clnt_connection_init]<br>
&gt;&gt; 0-gfsvolume-client-0: setting frame-timeout to 90<br>
&gt;&gt; [2015-03-13 02:09:41.961134] I [socket.c:3561:socket_init]<br>
&gt;&gt; 0-gfsvolume-client-0: SSL support is NOT enabled<br>
&gt;&gt; [2015-03-13 02:09:41.961145] I [socket.c:3576:socket_init]<br>
&gt;&gt; 0-gfsvolume-client-0: using system polling thread<br>
&gt;&gt; [2015-03-13 02:09:41.961173] I [client.c:2273:notify]<br>
&gt;&gt; 0-gfsvolume-client-0: parent translators are ready, attempting connect on<br>
&gt;&gt; transport<br>
&gt;&gt; [2015-03-13 02:09:41.961412] I [client.c:2273:notify]<br>
&gt;&gt; 0-gfsvolume-client-1: parent translators are ready, attempting connect on<br>
&gt;&gt; transport<br>
&gt;&gt; Final graph:<br>
&gt;&gt;<br>
&gt;&gt; +------------------------------------------------------------------------------+<br>
&gt;&gt;   1: volume gfsvolume-client-0<br>
&gt;&gt;   2:     type protocol/client<br>
&gt;&gt;   3:     option remote-host gfs1<br>
&gt;&gt;   4:     option remote-subvolume /export/sda/brick<br>
&gt;&gt;   5:     option transport-type socket<br>
&gt;&gt;   6:     option frame-timeout 90<br>
&gt;&gt;   7:     option ping-timeout 30<br>
&gt;&gt;   8: end-volume<br>
&gt;&gt;   9:<br>
&gt;&gt;  10: volume gfsvolume-client-1<br>
&gt;&gt;  11:     type protocol/client<br>
&gt;&gt;  12:     option remote-host gfs2<br>
&gt;&gt;  13:     option remote-subvolume /export/sda/brick<br>
&gt;&gt;  14:     option transport-type socket<br>
&gt;&gt;  15:     option frame-timeout 90<br>
&gt;&gt;  16:     option ping-timeout 30<br>
&gt;&gt;  17: end-volume<br>
&gt;&gt;  18:<br>
&gt;&gt;  19: volume gfsvolume-replicate-0<br>
&gt;&gt;  20:     type cluster/replicate<br>
&gt;&gt;  21:     option node-uuid adbb7505-3342-4c6d-be3d-75938633612c<br>
&gt;&gt;  22:     option background-self-heal-count 0<br>
&gt;&gt;  23:     option metadata-self-heal on<br>
&gt;&gt;  24:     option data-self-heal on<br>
&gt;&gt;  25:     option entry-self-heal on<br>
&gt;&gt;  26:     option self-heal-daemon on<br>
&gt;&gt;  27:     option data-self-heal-algorithm diff<br>
&gt;&gt;  28:     option quorum-type fixed<br>
&gt;&gt;  29:     option quorum-count 1<br>
&gt;&gt;  30:     option iam-self-heal-daemon yes<br>
&gt;&gt;  31:     subvolumes gfsvolume-client-0 gfsvolume-client-1<br>
&gt;&gt;  32: end-volume<br>
&gt;&gt;  33:<br>
&gt;&gt;  34: volume glustershd<br>
&gt;&gt;  35:     type debug/io-stats<br>
&gt;&gt;  36:     subvolumes gfsvolume-replicate-0<br>
&gt;&gt;  37: end-volume<br>
&gt;&gt;<br>
&gt;&gt; +------------------------------------------------------------------------------+<br>
&gt;&gt; [2015-03-13 02:09:41.961871] I [rpc-clnt.c:1685:rpc_clnt_reconfig]<br>
&gt;&gt; 0-gfsvolume-client-1: changing port to 49153 (from 0)<br>
&gt;&gt; [2015-03-13 02:09:41.962129] I<br>
&gt;&gt; [client-handshake.c:1659:select_server_supported_programs]<br>
&gt;&gt; 0-gfsvolume-client-1: Using Program GlusterFS 3.3, Num (1298437), Version<br>
&gt;&gt; (330)<br>
&gt;&gt; [2015-03-13 02:09:41.962344] I<br>
&gt;&gt; [client-handshake.c:1456:client_setvolume_cbk] 0-gfsvolume-client-1:<br>
&gt;&gt; Connected to <a href="http://172.20.20.22:49153" target="_blank">172.20.20.22:49153</a>, attached to remote volume<br>
&gt;&gt; &#39;/export/sda/brick&#39;.<br>
&gt;&gt; [2015-03-13 02:09:41.962363] I<br>
&gt;&gt; [client-handshake.c:1468:client_setvolume_cbk] 0-gfsvolume-client-1: Server<br>
&gt;&gt; and Client lk-version numbers are not same, reopening the fds<br>
&gt;&gt; [2015-03-13 02:09:41.962416] I [afr-common.c:3922:afr_notify]<br>
&gt;&gt; 0-gfsvolume-replicate-0: Subvolume &#39;gfsvolume-client-1&#39; came back up; going<br>
&gt;&gt; online.<br>
&gt;&gt; [2015-03-13 02:09:41.962487] I<br>
&gt;&gt; [client-handshake.c:450:client_set_lk_version_cbk] 0-gfsvolume-client-1:<br>
&gt;&gt; Server lk version = 1<br>
&gt;&gt; [2015-03-13 02:09:41.963109] E<br>
&gt;&gt; [afr-self-heald.c:1479:afr_find_child_position] 0-gfsvolume-replicate-0:<br>
&gt;&gt; getxattr failed on gfsvolume-client-0 - (Transport endpoint is not<br>
&gt;&gt; connected)<br>
&gt;&gt; [2015-03-13 02:09:41.963502] I<br>
&gt;&gt; [afr-self-heald.c:1687:afr_dir_exclusive_crawl] 0-gfsvolume-replicate-0:<br>
&gt;&gt; Another crawl is in progress for gfsvolume-client-1<br>
&gt;&gt; [2015-03-13 02:09:41.967478] E<br>
&gt;&gt; [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: Non Blocking entrylks failed for<br>
&gt;&gt; &lt;gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9&gt;.<br>
&gt;&gt; [2015-03-13 02:09:41.968550] E<br>
&gt;&gt; [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: Non Blocking entrylks failed for<br>
&gt;&gt; &lt;gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e&gt;.<br>
&gt;&gt; [2015-03-13 02:09:41.969663] E<br>
&gt;&gt; [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: Non Blocking entrylks failed for<br>
&gt;&gt; &lt;gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84&gt;.<br>
&gt;&gt; [2015-03-13 02:09:41.974345] E<br>
&gt;&gt; [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: Non Blocking entrylks failed for<br>
&gt;&gt; &lt;gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9&gt;.<br>
&gt;&gt; [2015-03-13 02:09:41.975657] E<br>
&gt;&gt; [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: Non Blocking entrylks failed for<br>
&gt;&gt; &lt;gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e&gt;.<br>
&gt;&gt; [2015-03-13 02:09:41.977020] E<br>
&gt;&gt; [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: Non Blocking entrylks failed for<br>
&gt;&gt; &lt;gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84&gt;.<br>
&gt;&gt; [2015-03-13 02:09:44.307219] I [rpc-clnt.c:1685:rpc_clnt_reconfig]<br>
&gt;&gt; 0-gfsvolume-client-0: changing port to 49153 (from 0)<br>
&gt;&gt; [2015-03-13 02:09:44.307748] I<br>
&gt;&gt; [client-handshake.c:1659:select_server_supported_programs]<br>
&gt;&gt; 0-gfsvolume-client-0: Using Program GlusterFS 3.3, Num (1298437), Version<br>
&gt;&gt; (330)<br>
&gt;&gt; [2015-03-13 02:09:44.448377] I<br>
&gt;&gt; [client-handshake.c:1456:client_setvolume_cbk] 0-gfsvolume-client-0:<br>
&gt;&gt; Connected to <a href="http://172.20.20.21:49153" target="_blank">172.20.20.21:49153</a>, attached to remote volume<br>
&gt;&gt; &#39;/export/sda/brick&#39;.<br>
&gt;&gt; [2015-03-13 02:09:44.448418] I<br>
&gt;&gt; [client-handshake.c:1468:client_setvolume_cbk] 0-gfsvolume-client-0: Server<br>
&gt;&gt; and Client lk-version numbers are not same, reopening the fds<br>
&gt;&gt; [2015-03-13 02:09:44.448713] I<br>
&gt;&gt; [client-handshake.c:450:client_set_lk_version_cbk] 0-gfsvolume-client-0:<br>
&gt;&gt; Server lk version = 1<br>
&gt;&gt; [2015-03-13 02:09:44.515112] I<br>
&gt;&gt; [afr-self-heal-common.c:2859:afr_log_self_heal_completion_status]<br>
&gt;&gt; 0-gfsvolume-replicate-0:  foreground data self heal  is successfully<br>
&gt;&gt; completed,  data self heal from gfsvolume-client-0  to sinks<br>
&gt;&gt;  gfsvolume-client-1, with 892928 bytes on gfsvolume-client-0, 892928 bytes<br>
&gt;&gt; on gfsvolume-client-1,  data - Pending matrix:  [ [ 0 155762 ] [ 0 0 ] ]<br>
&gt;&gt;  on &lt;gfid:123536cc-c34b-43d7-b0c6-cf80eefa8322&gt;<br>
&gt;&gt; [2015-03-13 02:09:44.809988] I<br>
&gt;&gt; [afr-self-heal-common.c:2859:afr_log_self_heal_completion_status]<br>
&gt;&gt; 0-gfsvolume-replicate-0:  foreground data self heal  is successfully<br>
&gt;&gt; completed,  data self heal from gfsvolume-client-0  to sinks<br>
&gt;&gt;  gfsvolume-client-1, with 15998976 bytes on gfsvolume-client-0, 15998976<br>
&gt;&gt; bytes on gfsvolume-client-1,  data - Pending matrix:  [ [ 0 36506 ] [ 0 0 ]<br>
&gt;&gt; ]  on &lt;gfid:b6dc0e74-31bf-469a-b629-ee51ab4cf729&gt;<br>
&gt;&gt; [2015-03-13 02:09:44.946050] W<br>
&gt;&gt; [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote<br>
&gt;&gt; operation failed: Stale NFS file handle<br>
&gt;&gt; [2015-03-13 02:09:44.946097] I<br>
&gt;&gt; [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: readlink of<br>
&gt;&gt; &lt;gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9&gt;/PB2_corrected.fastq on<br>
&gt;&gt; gfsvolume-client-1 failed (Stale NFS file handle)<br>
&gt;&gt; [2015-03-13 02:09:44.951370] I<br>
&gt;&gt; [afr-self-heal-entry.c:2321:afr_sh_entry_fix] 0-gfsvolume-replicate-0:<br>
&gt;&gt; &lt;gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e&gt;: Performing conservative merge<br>
&gt;&gt; [2015-03-13 02:09:45.149995] W<br>
&gt;&gt; [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote<br>
&gt;&gt; operation failed: Stale NFS file handle<br>
&gt;&gt; [2015-03-13 02:09:45.150036] I<br>
&gt;&gt; [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: readlink of<br>
&gt;&gt; &lt;gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e&gt;/Rscript on gfsvolume-client-1<br>
&gt;&gt; failed (Stale NFS file handle)<br>
&gt;&gt; [2015-03-13 02:09:45.214253] W<br>
&gt;&gt; [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote<br>
&gt;&gt; operation failed: Stale NFS file handle<br>
&gt;&gt; [2015-03-13 02:09:45.214295] I<br>
&gt;&gt; [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk]<br>
&gt;&gt; 0-gfsvolume-replicate-0: readlink of<br>
&gt;&gt; &lt;gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84&gt;/ananas_d_tmp on<br>
&gt;&gt; gfsvolume-client-1 failed (Stale NFS file handle)<br>
&gt;&gt; [2015-03-13 02:13:27.324856] W [socket.c:522:__socket_rwv]<br>
&gt;&gt; 0-gfsvolume-client-1: readv on <a href="http://172.20.20.22:49153" target="_blank">172.20.20.22:49153</a> failed (No data<br>
&gt;&gt; available)<br>
&gt;&gt; [2015-03-13 02:13:27.324961] I [client.c:2208:client_rpc_notify]<br>
&gt;&gt; 0-gfsvolume-client-1: disconnected from <a href="http://172.20.20.22:49153" target="_blank">172.20.20.22:49153</a>. Client<br>
&gt;&gt; process will keep trying to connect to glusterd until brick&#39;s port is<br>
&gt;&gt; available<br>
&gt;&gt; [2015-03-13 02:13:37.981531] I [rpc-clnt.c:1685:rpc_clnt_reconfig]<br>
&gt;&gt; 0-gfsvolume-client-1: changing port to 49153 (from 0)<br>
&gt;&gt; [2015-03-13 02:13:37.981781] E [socket.c:2161:socket_connect_finish]<br>
&gt;&gt; 0-gfsvolume-client-1: connection to <a href="http://172.20.20.22:49153" target="_blank">172.20.20.22:49153</a> failed (Connection<br>
&gt;&gt; refused)<br>
&gt;&gt; [2015-03-13 02:13:41.982125] I [rpc-clnt.c:1685:rpc_clnt_reconfig]<br>
&gt;&gt; 0-gfsvolume-client-1: changing port to 49153 (from 0)<br>
&gt;&gt; [2015-03-13 02:13:41.982353] E [socket.c:2161:socket_connect_finish]<br>
&gt;&gt; 0-gfsvolume-client-1: connection to <a href="http://172.20.20.22:49153" target="_blank">172.20.20.22:49153</a> failed (Connection<br>
&gt;&gt; refused)<br>
&gt;&gt; [2015-03-13 02:13:45.982693] I [rpc-clnt.c:1685:rpc_clnt_reconfig]<br>
&gt;&gt; 0-gfsvolume-client-1: changing port to 49153 (from 0)<br>
&gt;&gt; [2015-03-13 02:13:45.982926] E [socket.c:2161:socket_connect_finish]<br>
&gt;&gt; 0-gfsvolume-client-1: connection to <a href="http://172.20.20.22:49153" target="_blank">172.20.20.22:49153</a> failed (Connection<br>
&gt;&gt; refused)<br>
&gt;&gt; [2015-03-13 02:13:49.983309] I [rpc-clnt.c:1685:rpc_clnt_reconfig]<br>
&gt;&gt; 0-gfsvolume-client-1: changing port to 49153 (from 0)<br>
&gt;&gt;<br>
&gt;&gt;<br>
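In the excerpt above, gfsvolume-client-1 loses 172.20.20.22:49153 at 02:13:27 and every reconnect afterwards is refused, which suggests nothing is listening on that port any more; those timestamps are where to look in the brick log on gfs2 (typically /var/log/glusterfs/bricks/export-sda-brick.log). To pull disconnect times out of a full, unwrapped log file, a small parser helps. This sketch assumes the one-entry-per-line layout [DATE TIME] LEVEL [FILE:LINE:FUNCTION] XLATOR: MESSAGE seen here; the regexes and the name find_disconnects are hypothetical, not a Gluster tool.<br>

```python
import re

# One glusterfs log entry as it appears in the file (not hard-wrapped as in this email):
# [DATE TIME] LEVEL [FILE:LINE:FUNCTION] XLATOR: MESSAGE
LOG_LINE = re.compile(r"^\[([\d-]+ [\d:.]+)\] ([A-Z]) \[([^\]]+)\] (\S+): (.*)$")
# Peer address in messages like "disconnected from 172.20.20.22:49153. Client ..."
DISCONNECT = re.compile(r"disconnected from ([\d.]+:\d+)")


def find_disconnects(log_lines):
    """Return (timestamp, xlator, peer) for every 'disconnected from' entry."""
    events = []
    for line in log_lines:
        entry = LOG_LINE.match(line)
        if not entry:
            continue  # skip graph dumps and other non-entry lines
        timestamp, _level, _source, xlator, message = entry.groups()
        peer = DISCONNECT.search(message)
        if peer:
            events.append((timestamp, xlator, peer.group(1)))
    return events


# Entries from the glustershd.log excerpt above, unwrapped onto single lines.
SAMPLE_LOG = [
    "[2015-03-13 02:13:27.324856] W [socket.c:522:__socket_rwv] 0-gfsvolume-client-1: readv on 172.20.20.22:49153 failed (No data available)",
    "[2015-03-13 02:13:27.324961] I [client.c:2208:client_rpc_notify] 0-gfsvolume-client-1: disconnected from 172.20.20.22:49153. Client process will keep trying to connect to glusterd until brick's port is available",
    "[2015-03-13 02:13:37.981781] E [socket.c:2161:socket_connect_finish] 0-gfsvolume-client-1: connection to 172.20.20.22:49153 failed (Connection refused)",
]

print(find_disconnects(SAMPLE_LOG))
# [('2015-03-13 02:13:27.324961', '0-gfsvolume-client-1', '172.20.20.22:49153')]
```

Lines read from a file keep their trailing newline, which the pattern tolerates, so the function can be given an open file object for /var/log/glusterfs/glustershd.log directly.<br>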
&gt;<br>
&gt; Any help would be greatly appreciated.<br>
&gt; Thank You Kindly,<br>
&gt; Kaamesh<br>
&gt;<br>
&gt;<br>
&gt;<br>
</div></div>&gt; _______________________________________________<br>
&gt; Gluster-users mailing list<br>
&gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
&gt; <a href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
&gt;<br>
<br>
<br>
</blockquote></div><br></div></div>