<div dir="ltr">Hi Anuradha!<div><br></div><div>Thanks for your reply! The dump files are attached. Since I'm not sure whether attachments make it through to the list, here are links to them as well:</div><div><br></div><div><div>brick1 - <a href="http://pastebin.com/3ivkhuRH">http://pastebin.com/3ivkhuRH</a></div><div>brick2 - <a href="http://pastebin.com/77sT1mut">http://pastebin.com/77sT1mut</a></div></div><div><br></div><div>- Andreas</div><div><br></div><div class="gmail_extra"><br clear="all"><div class="gmail_quote">On Thu, Sep 24, 2015 at 3:18 PM, Anuradha Talur <span dir="ltr"><<a href="mailto:atalur@redhat.com" target="_blank">atalur@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5"><br>
<br>
----- Original Message -----<br>
> From: "Andreas Mather" <<a href="mailto:andreas@allaboutapps.at">andreas@allaboutapps.at</a>><br>
> To: "<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a> List" <<a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>><br>
> Sent: Thursday, September 24, 2015 1:24:12 PM<br>
> Subject: [Gluster-users] gluster 3.7.3 - volume heal info hangs - unknown heal status<br>
><br>
> Hi!<br>
><br>
> Our provider had network maintenance last night, so 2 of our 4 servers got<br>
> disconnected and reconnected. Since we knew this was coming, we shifted all<br>
> workload off the affected servers. This morning, most of the cluster seems<br>
> fine, but for one volume, no heal info can be retrieved, so we basically<br>
> don't know about the healing state of the volume. The volume is a replica 2<br>
> volume between vhost4-int/brick1 and vhost3-int/brick2.<br>
><br>
> The volume is accessible, but since I don't get any heal info, I don't know<br>
> if it is properly replicated. Any help resolving this situation is highly<br>
> appreciated.<br>
><br>
> hangs forever:<br>
> [root@vhost4 ~]# gluster volume heal vol4 info<br>
><br>
> glfsheal-vol4.log:<br>
> [2015-09-24 07:47:59.284723] I [MSGID: 101190]<br>
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with<br>
> index 1<br>
> [2015-09-24 07:47:59.293735] I [MSGID: 101190]<br>
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with<br>
> index 2<br>
> [2015-09-24 07:47:59.294061] I [MSGID: 104045] [glfs-master.c:95:notify]<br>
> 0-gfapi: New graph 76686f73-7434-2e61-6c6c-61626f757461 (0) coming up<br>
> [2015-09-24 07:47:59.294081] I [MSGID: 114020] [client.c:2118:notify]<br>
> 0-vol4-client-1: parent translators are ready, attempting connect on<br>
> transport<br>
> [2015-09-24 07:47:59.309470] I [MSGID: 114020] [client.c:2118:notify]<br>
> 0-vol4-client-2: parent translators are ready, attempting connect on<br>
> transport<br>
> [2015-09-24 07:47:59.310525] I [rpc-clnt.c:1819:rpc_clnt_reconfig]<br>
> 0-vol4-client-1: changing port to 49155 (from 0)<br>
> [2015-09-24 07:47:59.315958] I [MSGID: 114057]<br>
> [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-1:<br>
> Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
> [2015-09-24 07:47:59.316481] I [MSGID: 114046]<br>
> [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-1: Connected to<br>
> vol4-client-1, attached to remote volume '/storage/brick2/brick2'.<br>
> [2015-09-24 07:47:59.316495] I [MSGID: 114047]<br>
> [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-1: Server and<br>
> Client lk-version numbers are not same, reopening the fds<br>
> [2015-09-24 07:47:59.316538] I [MSGID: 108005] [afr-common.c:3960:afr_notify]<br>
> 0-vol4-replicate-0: Subvolume 'vol4-client-1' came back up; going online.<br>
> [2015-09-24 07:47:59.317150] I [MSGID: 114035]<br>
> [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-1: Server<br>
> lk version = 1<br>
> [2015-09-24 07:47:59.320898] I [rpc-clnt.c:1819:rpc_clnt_reconfig]<br>
> 0-vol4-client-2: changing port to 49154 (from 0)<br>
> [2015-09-24 07:47:59.325633] I [MSGID: 114057]<br>
> [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-2:<br>
> Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
> [2015-09-24 07:47:59.325780] I [MSGID: 114046]<br>
> [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-2: Connected to<br>
> vol4-client-2, attached to remote volume '/storage/brick1/brick1'.<br>
> [2015-09-24 07:47:59.325791] I [MSGID: 114047]<br>
> [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-2: Server and<br>
> Client lk-version numbers are not same, reopening the fds<br>
> [2015-09-24 07:47:59.333346] I [MSGID: 114035]<br>
> [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-2: Server<br>
> lk version = 1<br>
> [2015-09-24 07:47:59.334545] I [MSGID: 108031]<br>
> [afr-common.c:1745:afr_local_discovery_cbk] 0-vol4-replicate-0: selecting<br>
> local read_child vol4-client-2<br>
> [2015-09-24 07:47:59.335833] I [MSGID: 104041]<br>
> [glfs-resolve.c:862:__glfs_active_subvol] 0-vol4: switched to graph<br>
> 76686f73-7434-2e61-6c6c-61626f757461 (0)<br>
><br>
> Questions about this output:<br>
> -) Why does it report "Using Program GlusterFS 3.3, Num (1298437), Version<br>
> (330)"? We run 3.7.3.<br>
> -) gluster logs timestamps in UTC, not taking the server timezone into account. Is<br>
> there a way to change this?<br>
><br>
> etc-glusterfs-glusterd.vol.log:<br>
> no new log entries after the volume heal info command<br>
><br>
> storage-brick1-brick1.log:<br>
> [2015-09-24 07:47:59.325720] I [login.c:81:gf_auth] 0-auth/login: allowed<br>
> user names: 67ef1559-d3a1-403a-b8e7-fb145c3acf4e<br>
> [2015-09-24 07:47:59.325743] I [MSGID: 115029]<br>
> [server-handshake.c:610:server_setvolume] 0-vol4-server: accepted client<br>
> from<br>
> vhost4.allaboutapps.at-14900-2015/09/24-07:47:59:282313-vol4-client-2-0-0<br>
> (version: 3.7.3)<br>
><br>
> storage-brick2-brick2.log:<br>
> no new log entries after the volume heal info command<br>
><br>
><br>
</div></div>Hi Andreas,<br>
<br>
Could you please provide the following information so that we can understand why the command is hanging?<br>
While the command is hung, run the following command from one of the servers:<br>
`gluster volume statedump &lt;volname&gt;`<br>
This command generates statedumps of the glusterfsd processes on the servers. You can find them in /var/run/gluster. A typical statedump for a brick is named "&lt;brick-path&gt;.&lt;pid-of-brick&gt;.dump.&lt;timestamp&gt;". Could you please attach them to your reply?<br>
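<br>
The steps above could be scripted roughly as follows. This is only a sketch: the function names, the DUMP_DIR variable and the 2-second wait are assumptions made for illustration, and the dump directory is the /var/run/gluster location mentioned above. Since each server writes dumps for its own bricks, the listing would need to be run on every server that hosts a brick.<br>

```shell
#!/bin/sh
# Sketch: collect brick statedumps while "gluster volume heal VOLNAME info" hangs.
# DUMP_DIR and both function names are hypothetical helpers, not gluster features.

DUMP_DIR="${DUMP_DIR:-/var/run/gluster}"

# Print statedump files found in DUMP_DIR, newest first.
# Matches the "brick-path.pid.dump.timestamp" naming described above.
list_statedumps() {
    ls -1t "$DUMP_DIR"/*.dump.* 2>/dev/null
}

# Ask gluster to write statedumps for one volume, then list what was written.
collect_statedumps() {
    volname="$1"
    gluster volume statedump "$volname" || return 1
    sleep 2   # arbitrary pause so the dump files have time to appear
    list_statedumps
}
```

For example, with the heal info command still hanging in one terminal, running `collect_statedumps vol4` in a second terminal on the same server should print the dump files to attach.<br>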
<br>
> Thanks,<br>
><br>
> - Andreas<br>
><br>
><br>
><br>
> _______________________________________________<br>
> Gluster-users mailing list<br>
> <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
> <a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
<span class=""><font color="#888888"><br>
--<br>
Thanks,<br>
Anuradha.<br>
</font></span></blockquote></div><br></div></div>