<div dir="ltr">Hi!<div><br></div><div>It's VMs based on KVM/qemu managed by libvirtd. I figured I could see the heal status by comparing the bricks: nothing was replicated, but new files were (after a long delay of about 5 mins). So I wanted to see if existing files (VM images) will be healed if I would stop a VM (close any open handle on the file), which turned out not to be the case.</div><div><br></div><div>I ended up shutting down all VMs and restarting the server. Afterwards healing worked as expected....</div><div><br></div><div>- Andreas</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 5, 2015 at 1:01 PM, Anuradha Talur <span dir="ltr"><<a href="mailto:atalur@redhat.com" target="_blank">atalur@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
- Andreas

On Mon, Oct 5, 2015 at 1:01 PM, Anuradha Talur <atalur@redhat.com> wrote:
>
> ----- Original Message -----
> > From: "Andreas Mather" <andreas@allaboutapps.at>
> > To: "Anuradha Talur" <atalur@redhat.com>
> > Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
> > Sent: Thursday, September 24, 2015 6:59:38 PM
> > Subject: Re: [Gluster-users] gluster 3.7.3 - volume heal info hangs - unknown heal status
> >
> > Hi Anuradha!
> >
> > Thanks for your reply! Attached you can find the dump files. As I'm not sure they will make it through as attachments, here are links to them as well:
> >
> > brick1 - http://pastebin.com/3ivkhuRH
> > brick2 - http://pastebin.com/77sT1mut
> Hi,
>
> I see some blocked locks in the statedump.
> Could you let me know what kind of workload you had when you observed the hang?
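>
> (In case you want to dig in yourself in the meantime: blocked locks show up in the lock sections of those brick statedumps, so something along these lines should list them. The glob is just an assumption about where your dump files ended up:)
>
>     grep -B3 'BLOCKED' /var/run/gluster/*.dump.*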
<div class="HOEnZb"><div class="h5">><br>
> - Andreas<br>
><br>
><br>
><br>
><br>
> On Thu, Sep 24, 2015 at 3:18 PM, Anuradha Talur <<a href="mailto:atalur@redhat.com">atalur@redhat.com</a>> wrote:<br>
><br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > > From: "Andreas Mather" <<a href="mailto:andreas@allaboutapps.at">andreas@allaboutapps.at</a>><br>
> > > To: "<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a> List" <<a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>><br>
> > > Sent: Thursday, September 24, 2015 1:24:12 PM<br>
> > > Subject: [Gluster-users] gluster 3.7.3 - volume heal info hangs -<br>
> > unknown heal status<br>
> > > >
> > > > Hi!
> > > >
> > > > Our provider had network maintenance last night, so 2 of our 4 servers got disconnected and reconnected. Since we knew this was coming, we shifted all workload off the affected servers. This morning, most of the cluster seems fine, but for one volume no heal info can be retrieved, so we basically don't know the healing state of that volume. The volume is a replica 2 volume between vhost4-int/brick1 and vhost3-int/brick2.
> > > >
> > > > The volume is accessible, but since I don't get any heal info, I don't know if it is properly replicated. Any help to resolve this situation is highly appreciated.
> > > >
> > > > hangs forever:
> > > > [root@vhost4 ~]# gluster volume heal vol4 info
> > > >
> > > > glfsheal-vol4.log:
> > > > [2015-09-24 07:47:59.284723] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> > > > [2015-09-24 07:47:59.293735] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
> > > > [2015-09-24 07:47:59.294061] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 76686f73-7434-2e61-6c6c-61626f757461 (0) coming up
> > > > [2015-09-24 07:47:59.294081] I [MSGID: 114020] [client.c:2118:notify] 0-vol4-client-1: parent translators are ready, attempting connect on transport
> > > > [2015-09-24 07:47:59.309470] I [MSGID: 114020] [client.c:2118:notify] 0-vol4-client-2: parent translators are ready, attempting connect on transport
> > > > [2015-09-24 07:47:59.310525] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol4-client-1: changing port to 49155 (from 0)
> > > > [2015-09-24 07:47:59.315958] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> > > > [2015-09-24 07:47:59.316481] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-1: Connected to vol4-client-1, attached to remote volume '/storage/brick2/brick2'.
> > > > [2015-09-24 07:47:59.316495] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-1: Server and Client lk-version numbers are not same, reopening the fds
> > > > [2015-09-24 07:47:59.316538] I [MSGID: 108005] [afr-common.c:3960:afr_notify] 0-vol4-replicate-0: Subvolume 'vol4-client-1' came back up; going online.
> > > > [2015-09-24 07:47:59.317150] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-1: Server lk version = 1
> > > > [2015-09-24 07:47:59.320898] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol4-client-2: changing port to 49154 (from 0)
> > > > [2015-09-24 07:47:59.325633] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> > > > [2015-09-24 07:47:59.325780] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-2: Connected to vol4-client-2, attached to remote volume '/storage/brick1/brick1'.
> > > > [2015-09-24 07:47:59.325791] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-2: Server and Client lk-version numbers are not same, reopening the fds
> > > > [2015-09-24 07:47:59.333346] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-2: Server lk version = 1
> > > > [2015-09-24 07:47:59.334545] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-vol4-replicate-0: selecting local read_child vol4-client-2
> > > > [2015-09-24 07:47:59.335833] I [MSGID: 104041] [glfs-resolve.c:862:__glfs_active_subvol] 0-vol4: switched to graph 76686f73-7434-2e61-6c6c-61626f757461 (0)
> > > >
> > > > Questions about this output:
> > > > -) Why does it report "Using Program GlusterFS 3.3, Num (1298437), Version (330)"? We run 3.7.3?!
> > > > -) gluster logs timestamps in UTC, not taking the server timezone into account. Is there a way to fix this?
> > > >
> > > > etc-glusterfs-glusterd.vol.log:
> > > > no new log entries after the volume heal info command
> > > >
> > > > storage-brick1-brick1.log:
> > > > [2015-09-24 07:47:59.325720] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 67ef1559-d3a1-403a-b8e7-fb145c3acf4e
> > > > [2015-09-24 07:47:59.325743] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-vol4-server: accepted client from vhost4.allaboutapps.at-14900-2015/09/24-07:47:59:282313-vol4-client-2-0-0 (version: 3.7.3)
> > > >
> > > > storage-brick2-brick2.log:
> > > > no new log entries after the volume heal info command
> > > >
> > > Hi Andreas,
> > >
> > > Could you please provide the following information so that we can understand why the command is hanging?
> > > When the command is hung, run the following from one of the servers:
> > > `gluster volume statedump <volname>`
> > > This command generates statedumps of the glusterfsd processes on the servers. You can find them in /var/run/gluster. A typical statedump for a brick has "<brick-path>.<pid-of-brick>.dump.<timestamp>" as its name. Could you please attach them and respond?
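> > >
> > > For example, on vhost4 this would look roughly like the following (the pid and timestamp shown are made up, yours will differ):
> > >
> > >     # gluster volume statedump vol4
> > >     # ls /var/run/gluster
> > >     storage-brick1-brick1.14900.dump.1443080879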
> > >
> > > > Thanks,
> > > >
> > > > - Andreas
> > > >
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > Gluster-users@gluster.org
> > > > http://www.gluster.org/mailman/listinfo/gluster-users
> > >
> > > --
> > > Thanks,
> > > Anuradha.
> >
>
> --
> Thanks,
> Anuradha.