<div dir="ltr"><div><div>I have studied the documentation on this page:<br><a href="https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md">https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md</a><br>but I cannot resolve the split-brain by following these instructions.<br><br>I have tested the procedure on gluster 3.6, where it does not work; it works only on gluster 3.7.<br><br>I am using gluster 3.7.2.<br>I have a gluster volume in replicate mode:<br>root@dist-gl2:/# gluster volume info<br><br>Volume Name: repofiles<br>Type: Replicate<br>Volume ID: 1d5d5d7d-39f2-4011-9fc8-d73c29495e7c<br>Status: Started<br>Number of Bricks: 1 x 2 = 2<br>Transport-type: tcp<br>Bricks:<br>Brick1: dist-gl1:/brick1<br>Brick2: dist-gl2:/brick1<br>Options Reconfigured:<br>performance.readdir-ahead: on<br>server.allow-insecure: on<br>root@dist-gl2:/#<br><br>One file (&quot;test&quot;) is in split-brain:<br>root@dist-gl2:/# gluster volume heal repofiles info<br>Brick dist-gl1:/brick1/<br>/test <br>/ - Is in split-brain<br><br>Number of entries: 2<br><br>Brick dist-gl2:/brick1/<br>/ - Is in split-brain<br><br>/test <br>Number of entries: 2<br><br>root@dist-gl2:/# gluster volume heal repofiles info split-brain<br>Brick dist-gl1:/brick1/<br>/<br>Number of entries in split-brain: 1<br><br>Brick dist-gl2:/brick1/<br>/<br>Number of entries in split-brain: 1<br><br>root@dist-gl2:/# <br><br>I do not know why these commands report only the directory (&quot;/&quot;) as being in split-brain.<br><br>I tried to resolve the split-brain with the gluster CLI commands (on the directory from the output above, and on the file), but it did not help:<br>root@dist-gl2:/# gluster v heal repofiles split-brain bigger-file /<br>Healing / failed:Operation not permitted.<br>Volume heal failed.<br>root@dist-gl2:/# gluster v heal repofiles split-brain bigger-file /test<br>Lookup failed on /test:Input/output error<br>Volume heal failed.<br>root@dist-gl2:/# gluster v heal 
repofiles split-brain source-brick dist-gl1:/brick1 /<br>Healing / failed:Operation not permitted.<br>Volume heal failed.<br>root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl1:/brick1 /test<br>Lookup failed on /test:Input/output error<br>Volume heal failed.<br>root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl2:/brick1 /<br>Healing / failed:Operation not permitted.<br>Volume heal failed.<br>root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl2:/brick1 /test<br>Lookup failed on /test:Input/output error<br>Volume heal failed.<br>root@dist-gl2:/# <br><br>Relevant parts of glfsheal-repofiles.log.<br>When trying to resolve the split-brain on the directory (&quot;/&quot;):<br>[2015-07-15 19:45:30.508670] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1<br>[2015-07-15 19:45:30.516662] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2<br>[2015-07-15 19:45:30.517201] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 64697374-2d67-6c32-2d32-303634362d32 (0) coming up<br>[2015-07-15 19:45:30.517227] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-0: parent translators are ready, attempting connect on transport<br>[2015-07-15 19:45:30.525457] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-1: parent translators are ready, attempting connect on transport<br>[2015-07-15 19:45:30.526788] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0: changing port to 49152 (from 0)<br>[2015-07-15 19:45:30.534012] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1: changing port to 49152 (from 0)<br>[2015-07-15 19:45:30.536252] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>[2015-07-15 19:45:30.536606] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-0: Connected to 
repofiles-client-0, attached to remote volume &#39;/brick1&#39;.<br>[2015-07-15 19:45:30.536621] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-0: Server and Client lk-version numbers are not same, reopening the fds<br>[2015-07-15 19:45:30.536679] I [MSGID: 108005] [afr-common.c:3883:afr_notify] 0-repofiles-replicate-0: Subvolume &#39;repofiles-client-0&#39; came back up; going online.<br>[2015-07-15 19:45:30.536819] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-0: Server lk version = 1<br>[2015-07-15 19:45:30.543712] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>[2015-07-15 19:45:30.543919] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-1: Connected to repofiles-client-1, attached to remote volume &#39;/brick1&#39;.<br>[2015-07-15 19:45:30.543933] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-1: Server and Client lk-version numbers are not same, reopening the fds<br>[2015-07-15 19:45:30.554650] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-1: Server lk version = 1<br>[2015-07-15 19:45:30.557628] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001<br>[2015-07-15 19:45:30.560002] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for &lt;00000000-0000-0000-0000-000000000001/test&gt;, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. 
Skipping conservative merge on the file.<br>[2015-07-15 19:45:30.561582] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /<br>[2015-07-15 19:45:30.561604] I [afr-common.c:1673:afr_local_discovery_cbk] 0-repofiles-replicate-0: selecting local read_child repofiles-client-1<br>[2015-07-15 19:45:30.561900] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /<br>[2015-07-15 19:45:30.561962] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles: switched to graph 64697374-2d67-6c32-2d32-303634362d32 (0)<br>[2015-07-15 19:45:30.562259] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /<br>[2015-07-15 19:45:32.563285] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /<br>[2015-07-15 19:45:32.564898] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001<br>[2015-07-15 19:45:32.566693] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for &lt;00000000-0000-0000-0000-000000000001/test&gt;, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. 
Skipping conservative merge on the file.<br>When trying to resolve the split-brain on the file (&quot;/test&quot;):<br>[2015-07-15 19:48:45.910819] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1<br>[2015-07-15 19:48:45.919854] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2<br>[2015-07-15 19:48:45.920434] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 64697374-2d67-6c32-2d32-313133392d32 (0) coming up<br>[2015-07-15 19:48:45.920481] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-0: parent translators are ready, attempting connect on transport<br>[2015-07-15 19:48:45.996442] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-1: parent translators are ready, attempting connect on transport<br>[2015-07-15 19:48:45.997892] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0: changing port to 49152 (from 0)<br>[2015-07-15 19:48:46.005153] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1: changing port to 49152 (from 0)<br>[2015-07-15 19:48:46.007437] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>[2015-07-15 19:48:46.007928] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-0: Connected to repofiles-client-0, attached to remote volume &#39;/brick1&#39;.<br>[2015-07-15 19:48:46.007945] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-0: Server and Client lk-version numbers are not same, reopening the fds<br>[2015-07-15 19:48:46.008020] I [MSGID: 108005] [afr-common.c:3883:afr_notify] 0-repofiles-replicate-0: Subvolume &#39;repofiles-client-0&#39; came back up; going online.<br>[2015-07-15 19:48:46.008189] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-0: Server lk version = 1<br>[2015-07-15 19:48:46.014313] I [MSGID: 114057] 
[client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>[2015-07-15 19:48:46.014536] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-1: Connected to repofiles-client-1, attached to remote volume &#39;/brick1&#39;.<br>[2015-07-15 19:48:46.014550] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-1: Server and Client lk-version numbers are not same, reopening the fds<br>[2015-07-15 19:48:46.026828] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-1: Server lk version = 1<br>[2015-07-15 19:48:46.029357] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001<br>[2015-07-15 19:48:46.031719] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for &lt;00000000-0000-0000-0000-000000000001/test&gt;, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. 
Skipping conservative merge on the file.<br>[2015-07-15 19:48:46.033222] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /<br>[2015-07-15 19:48:46.033224] I [afr-common.c:1673:afr_local_discovery_cbk] 0-repofiles-replicate-0: selecting local read_child repofiles-client-1<br>[2015-07-15 19:48:46.033569] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /<br>[2015-07-15 19:48:46.033624] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles: switched to graph 64697374-2d67-6c32-2d32-313133392d32 (0)<br>[2015-07-15 19:48:46.033906] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /<br>[2015-07-15 19:48:48.036482] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-repofiles-replicate-0: GFID mismatch for &lt;gfid:00000000-0000-0000-0000-000000000001&gt;/test e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0<br><br>Where did I make a mistake when trying to resolve the split-brain?<br><br></div>Best regards,<br></div>Igor<br></div><div class="gmail_extra"><br><div class="gmail_quote">2015-07-14 22:11 GMT+03:00 Roman <span dir="ltr">&lt;<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Never mind. I do not have enough time to debug why basic gluster commands do not work on a production server. Tonight&#39;s system freeze, caused by undocumented XFS settings that are apparently required to run glusterfs on XFS, was enough. I&#39;ll stick with EXT4. 
Anyway, XFS for the bricks did not solve my previous problem.<div><br></div><div>To resolve the split-brain this time, I&#39;ve restored the VM from a backup.</div></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">2015-07-14 21:55 GMT+03:00 Roman <span dir="ltr">&lt;<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span><div dir="ltr" style="font-size:12.8000001907349px">Thanks for pointing that out... <div>but it doesn&#39;t seem to work... or I am too sleepy because of the glusterfs and debian8 problems in the other topic, which I have been fighting with for a month..</div><div><br></div><div><div>root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster split-brain source-brick stor1:HA-2TB-TT-Proxmox-cluster/2TB /images/124/vm-124-disk-1.qcow2</div><div>Usage: volume heal &lt;VOLNAME&gt; [{full | statistics {heal-count {replica &lt;hostname:brickname&gt;}} |info {healed | heal-failed | split-brain}}]</div></div><div><br></div><div>Seems like the wrong command...</div></div></span></div><div class="gmail_extra"><br><div class="gmail_quote"><span>2015-07-14 21:23 GMT+03:00 Joe Julian <span dir="ltr">&lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;</span>:<br></span><div><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 07/14/2015 11:19 AM, Roman wrote:<br>
</span><div><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br>
I played with glusterfs tonight and tried the recommended XFS for gluster. The first try went pretty badly and all of my VMs hung (XFS wants allocsize=64k to create qcow2 files, which I didn&#39;t know about; I tried to create a VM on XFS without this option in fstab, which led to a lot of I/O, and qemu reported a timeout while creating the file)..<br>
<br>
Now I&#39;ve got this:<br>
Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/<br>
/images/124/vm-124-disk-1.qcow2 - Is in split-brain<br>
<br>
Number of entries: 1<br>
<br>
Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/<br>
/images/124/vm-124-disk-1.qcow2 - Is in split-brain<br>
<br>
OK, what next?<br>
I&#39;ve deleted one of the files, but it didn&#39;t help. Even worse, self-heal restored the file on the node where I deleted it... and it is still in split-brain.<br>
<br>
How do I fix this?<br>
<br>
-- <br>
Best regards,<br>
Roman.<br>
<br>
</blockquote>
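The allocsize workaround Roman describes above can be made permanent in fstab. A sketch of such an entry, where the device path and mount point are placeholders (only the allocsize=64k option itself comes from the thread):

```
# /etc/fstab -- hypothetical device and mount point for the gluster brick
/dev/sdb1  /exports/HA-2TB-TT-Proxmox-cluster/2TB  xfs  defaults,allocsize=64k  0  2
```

The allocsize mount option sets XFS's preferred speculative preallocation size for buffered writes, which is what sparse qcow2 creation stresses.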
<br>
<br>
</div></div><span><a href="https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md" rel="noreferrer" target="_blank">https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md</a> <br>
<br>
or<br>
<br>
<a href="https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/" rel="noreferrer" target="_blank">https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/</a><br></span>
_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote></div></div></div><span><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div>Best regards,<br>Roman.</div>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br><div>Best regards,<br>Roman.</div>
</font></span></div>
</blockquote></div><br></div>
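The glfsheal logs in Igor's message show a GFID mismatch on "test", i.e. an entry split-brain: each brick holds a different file under the same name. In the 3.7 releases discussed here, the bigger-file and source-brick CLI policies handle data and metadata split-brain, which is consistent with the "Operation not permitted" and "Input/output error" failures above; a GFID mismatch is usually resolved manually, in the spirit of Joe Julian's linked post, by deleting the unwanted copy and its .glusterfs hardlink on one brick. A minimal sketch, assuming the copy on dist-gl2 (GFID e42d3f03-...) is the one to discard; the GFID, brick path, and file name are taken from the logs above, and which copy to keep is a judgment call you must make yourself:

```shell
#!/bin/sh
# Hedged sketch of a manual entry (GFID) split-brain fix.
# Assumption: dist-gl2's copy of "test" is the one we discard.
GFID=e42d3f03-0633-4954-95ce-5cd8710e595e   # GFID of the copy to discard (from the log)
BRICK=/brick1                               # brick path on the chosen node
FILE=test

# Every file on a brick has a hardlink under .glusterfs, laid out as
# .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>:
GFID_LINK="$BRICK/.glusterfs/$(echo "$GFID" | cut -c1-2)/$(echo "$GFID" | cut -c3-4)/$GFID"

# The commands are echoed rather than executed, since this is destructive;
# verify the GFID with getfattr before removing anything, then re-trigger a heal.
echo "getfattr -n trusted.gfid -e hex $BRICK/$FILE   # confirm it matches $GFID"
echo "rm $BRICK/$FILE"
echo "rm $GFID_LINK"
echo "gluster volume heal repofiles"
```

After the heal, the surviving copy (here dist-gl1's, GFID 16da3178-...) should be replicated back to the cleaned brick, and the "/" entry split-brain should clear once the child entry no longer mismatches.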