<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 07/16/2015 01:28 AM, Игорь Бирюлин
wrote:<br>
</div>
<blockquote
cite="mid:CAEtWxpyYxVdFxG=Hypih3_hRW6O_8=TkFSJmev=p9sCnmHwBFw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>I have studied the information on this page:<br>
<a moz-do-not-send="true"
href="https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md">https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md</a><br>
and cannot resolve the split-brain by following those instructions.<br>
<br>
I tested it on gluster 3.6, where it does not work; it works only on
gluster 3.7.<br>
<br>
</div>
</div>
</div>
</blockquote>
<br>
Right. We need to explicitly mention in the .md that it is supported
from 3.7 onwards. <br>
<br>
<blockquote
cite="mid:CAEtWxpyYxVdFxG=Hypih3_hRW6O_8=TkFSJmev=p9sCnmHwBFw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>I am trying this on gluster 3.7.2.<br>
I have a gluster volume in replicate mode:<br>
root@dist-gl2:/# gluster volume info<br>
<br>
Volume Name: repofiles<br>
Type: Replicate<br>
Volume ID: 1d5d5d7d-39f2-4011-9fc8-d73c29495e7c<br>
Status: Started<br>
Number of Bricks: 1 x 2 = 2<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: dist-gl1:/brick1<br>
Brick2: dist-gl2:/brick1<br>
Options Reconfigured:<br>
performance.readdir-ahead: on<br>
server.allow-insecure: on<br>
root@dist-gl2:/#<br>
<br>
And I have one file in split-brain (the file "test"):<br>
root@dist-gl2:/# gluster volume heal repofiles info<br>
Brick dist-gl1:/brick1/<br>
/test <br>
/ - Is in split-brain<br>
<br>
Number of entries: 2<br>
<br>
Brick dist-gl2:/brick1/<br>
/ - Is in split-brain<br>
<br>
/test <br>
Number of entries: 2<br>
<br>
root@dist-gl2:/# gluster volume heal repofiles info
split-brain<br>
Brick dist-gl1:/brick1/<br>
/<br>
Number of entries in split-brain: 1<br>
<br>
Brick dist-gl2:/brick1/<br>
/<br>
Number of entries in split-brain: 1<br>
<br>
root@dist-gl2:/# <br>
<br>
I don't know why these commands show only the directory ("/") as
being in split-brain.<br>
</div>
</div>
</div>
</blockquote>
<br>
That is because the file is in gfid split-brain. As the .md file
states, "for a gfid split-brain, the parent directory of the file
is shown to be in split-brain and the file itself is shown to be
needing heal". You cannot resolve gfid split-brains with the CLI
commands; you need to resolve them manually. See "Fixing Directory
entry split-brain" in
<a class="moz-txt-link-freetext" href="https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md">https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md</a><br>
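For reference, the manual procedure in that document boils down to: pick the brick whose copy of the file you want to discard, remove the file and its gfid hardlink under .glusterfs on that brick, then let self-heal replicate the good copy back. Below is a minimal sketch, assuming we keep dist-gl1's copy of /test and discard dist-gl2's (the gfid is dist-gl2's, taken from the log in this thread):<br>

```shell
# Hedged sketch of the manual gfid split-brain fix, assuming the copy
# on dist-gl2 is the one to discard.
# The gfid of /test on dist-gl2 (repofiles-client-1), from the log:
GFID="e42d3f03-0633-4954-95ce-5cd8710e595e"
BRICK="/brick1"

# GlusterFS keeps a hardlink to every file under .glusterfs/xx/yy/<gfid>,
# where xx and yy are the first two byte-pairs of the gfid.
GFID_LINK="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
echo "$GFID_LINK"

# On dist-gl2 (the brick whose copy we discard), remove both links,
# then trigger a heal so the surviving copy is replicated back:
#   rm "$BRICK/test" "$GFID_LINK"
#   gluster volume heal repofiles
```

After the heal completes, `gluster volume heal repofiles info` should no longer report /test or / as being in split-brain.<br>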
<br>
<blockquote
cite="mid:CAEtWxpyYxVdFxG=Hypih3_hRW6O_8=TkFSJmev=p9sCnmHwBFw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div><br>
I tried to resolve the split-brain with the gluster CLI commands
(on the directory reported by the previous commands, and on the
file), but it did not help:<br>
root@dist-gl2:/# gluster v heal repofiles split-brain
bigger-file /<br>
Healing / failed:Operation not permitted.<br>
Volume heal failed.<br>
root@dist-gl2:/# gluster v heal repofiles split-brain
bigger-file /test<br>
Lookup failed on /test:Input/output error<br>
Volume heal failed.<br>
root@dist-gl2:/# gluster v heal repofiles split-brain
source-brick dist-gl1:/brick1 /<br>
Healing / failed:Operation not permitted.<br>
Volume heal failed.<br>
root@dist-gl2:/# gluster v heal repofiles split-brain
source-brick dist-gl1:/brick1 /test<br>
Lookup failed on /test:Input/output error<br>
Volume heal failed.<br>
root@dist-gl2:/# gluster v heal repofiles split-brain
source-brick dist-gl2:/brick1 /<br>
Healing / failed:Operation not permitted.<br>
Volume heal failed.<br>
root@dist-gl2:/# gluster v heal repofiles split-brain
source-brick dist-gl2:/brick1 /test<br>
Lookup failed on /test:Input/output error<br>
Volume heal failed.<br>
root@dist-gl2:/# <br>
<br>
Relevant parts of glfsheal-repofiles.log.<br>
When trying to resolve the split-brain on the directory ("/"):<br>
[2015-07-15 19:45:30.508670] I
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll:
Started thread with index 1<br>
[2015-07-15 19:45:30.516662] I
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll:
Started thread with index 2<br>
[2015-07-15 19:45:30.517201] I [MSGID: 104045]
[glfs-master.c:95:notify] 0-gfapi: New graph
64697374-2d67-6c32-2d32-303634362d32 (0) coming up<br>
[2015-07-15 19:45:30.517227] I [MSGID: 114020]
[client.c:2118:notify] 0-repofiles-client-0: parent
translators are ready, attempting connect on transport<br>
[2015-07-15 19:45:30.525457] I [MSGID: 114020]
[client.c:2118:notify] 0-repofiles-client-1: parent
translators are ready, attempting connect on transport<br>
[2015-07-15 19:45:30.526788] I
[rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0:
changing port to 49152 (from 0)<br>
[2015-07-15 19:45:30.534012] I
[rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1:
changing port to 49152 (from 0)<br>
[2015-07-15 19:45:30.536252] I [MSGID: 114057]
[client-handshake.c:1438:select_server_supported_programs]
0-repofiles-client-0: Using Program GlusterFS 3.3, Num
(1298437), Version (330)<br>
[2015-07-15 19:45:30.536606] I [MSGID: 114046]
[client-handshake.c:1214:client_setvolume_cbk]
0-repofiles-client-0: Connected to repofiles-client-0,
attached to remote volume '/brick1'.<br>
[2015-07-15 19:45:30.536621] I [MSGID: 114047]
[client-handshake.c:1225:client_setvolume_cbk]
0-repofiles-client-0: Server and Client lk-version numbers
are not same, reopening the fds<br>
[2015-07-15 19:45:30.536679] I [MSGID: 108005]
[afr-common.c:3883:afr_notify] 0-repofiles-replicate-0:
Subvolume 'repofiles-client-0' came back up; going online.<br>
[2015-07-15 19:45:30.536819] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk]
0-repofiles-client-0: Server lk version = 1<br>
[2015-07-15 19:45:30.543712] I [MSGID: 114057]
[client-handshake.c:1438:select_server_supported_programs]
0-repofiles-client-1: Using Program GlusterFS 3.3, Num
(1298437), Version (330)<br>
[2015-07-15 19:45:30.543919] I [MSGID: 114046]
[client-handshake.c:1214:client_setvolume_cbk]
0-repofiles-client-1: Connected to repofiles-client-1,
attached to remote volume '/brick1'.<br>
[2015-07-15 19:45:30.543933] I [MSGID: 114047]
[client-handshake.c:1225:client_setvolume_cbk]
0-repofiles-client-1: Server and Client lk-version numbers
are not same, reopening the fds<br>
[2015-07-15 19:45:30.554650] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk]
0-repofiles-client-1: Server lk version = 1<br>
[2015-07-15 19:45:30.557628] I
[afr-self-heal-entry.c:565:afr_selfheal_entry_do]
0-repofiles-replicate-0: performing entry selfheal on
00000000-0000-0000-0000-000000000001<br>
[2015-07-15 19:45:30.560002] E
[afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch]
0-repofiles-replicate-0: Gfid mismatch detected for
<00000000-0000-0000-0000-000000000001/test>,
e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1
and 16da3178-8a6e-4010-b874-7f11449d1993 on
repofiles-client-0. Skipping conservative merge on the file.<br>
[2015-07-15 19:45:30.561582] W
[afr-common.c:1985:afr_discover_done]
0-repofiles-replicate-0: no read subvols for /<br>
[2015-07-15 19:45:30.561604] I
[afr-common.c:1673:afr_local_discovery_cbk]
0-repofiles-replicate-0: selecting local read_child
repofiles-client-1<br>
[2015-07-15 19:45:30.561900] W
[afr-common.c:1985:afr_discover_done]
0-repofiles-replicate-0: no read subvols for /<br>
[2015-07-15 19:45:30.561962] I [MSGID: 104041]
[glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles:
switched to graph 64697374-2d67-6c32-2d32-303634362d32 (0)<br>
[2015-07-15 19:45:30.562259] W
[afr-common.c:1985:afr_discover_done]
0-repofiles-replicate-0: no read subvols for /<br>
[2015-07-15 19:45:32.563285] W
[afr-common.c:1985:afr_discover_done]
0-repofiles-replicate-0: no read subvols for /<br>
[2015-07-15 19:45:32.564898] I
[afr-self-heal-entry.c:565:afr_selfheal_entry_do]
0-repofiles-replicate-0: performing entry selfheal on
00000000-0000-0000-0000-000000000001<br>
[2015-07-15 19:45:32.566693] E
[afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch]
0-repofiles-replicate-0: Gfid mismatch detected for
<00000000-0000-0000-0000-000000000001/test>,
e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1
and 16da3178-8a6e-4010-b874-7f11449d1993 on
repofiles-client-0. Skipping conservative merge on the file.<br>
When trying to resolve the split-brain on the file ("/test"):<br>
[2015-07-15 19:48:45.910819] I
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll:
Started thread with index 1<br>
[2015-07-15 19:48:45.919854] I
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll:
Started thread with index 2<br>
[2015-07-15 19:48:45.920434] I [MSGID: 104045]
[glfs-master.c:95:notify] 0-gfapi: New graph
64697374-2d67-6c32-2d32-313133392d32 (0) coming up<br>
[2015-07-15 19:48:45.920481] I [MSGID: 114020]
[client.c:2118:notify] 0-repofiles-client-0: parent
translators are ready, attempting connect on transport<br>
[2015-07-15 19:48:45.996442] I [MSGID: 114020]
[client.c:2118:notify] 0-repofiles-client-1: parent
translators are ready, attempting connect on transport<br>
[2015-07-15 19:48:45.997892] I
[rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0:
changing port to 49152 (from 0)<br>
[2015-07-15 19:48:46.005153] I
[rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1:
changing port to 49152 (from 0)<br>
[2015-07-15 19:48:46.007437] I [MSGID: 114057]
[client-handshake.c:1438:select_server_supported_programs]
0-repofiles-client-0: Using Program GlusterFS 3.3, Num
(1298437), Version (330)<br>
[2015-07-15 19:48:46.007928] I [MSGID: 114046]
[client-handshake.c:1214:client_setvolume_cbk]
0-repofiles-client-0: Connected to repofiles-client-0,
attached to remote volume '/brick1'.<br>
[2015-07-15 19:48:46.007945] I [MSGID: 114047]
[client-handshake.c:1225:client_setvolume_cbk]
0-repofiles-client-0: Server and Client lk-version numbers
are not same, reopening the fds<br>
[2015-07-15 19:48:46.008020] I [MSGID: 108005]
[afr-common.c:3883:afr_notify] 0-repofiles-replicate-0:
Subvolume 'repofiles-client-0' came back up; going online.<br>
[2015-07-15 19:48:46.008189] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk]
0-repofiles-client-0: Server lk version = 1<br>
[2015-07-15 19:48:46.014313] I [MSGID: 114057]
[client-handshake.c:1438:select_server_supported_programs]
0-repofiles-client-1: Using Program GlusterFS 3.3, Num
(1298437), Version (330)<br>
[2015-07-15 19:48:46.014536] I [MSGID: 114046]
[client-handshake.c:1214:client_setvolume_cbk]
0-repofiles-client-1: Connected to repofiles-client-1,
attached to remote volume '/brick1'.<br>
[2015-07-15 19:48:46.014550] I [MSGID: 114047]
[client-handshake.c:1225:client_setvolume_cbk]
0-repofiles-client-1: Server and Client lk-version numbers
are not same, reopening the fds<br>
[2015-07-15 19:48:46.026828] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk]
0-repofiles-client-1: Server lk version = 1<br>
[2015-07-15 19:48:46.029357] I
[afr-self-heal-entry.c:565:afr_selfheal_entry_do]
0-repofiles-replicate-0: performing entry selfheal on
00000000-0000-0000-0000-000000000001<br>
[2015-07-15 19:48:46.031719] E
[afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch]
0-repofiles-replicate-0: Gfid mismatch detected for
<00000000-0000-0000-0000-000000000001/test>,
e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1
and 16da3178-8a6e-4010-b874-7f11449d1993 on
repofiles-client-0. Skipping conservative merge on the file.<br>
[2015-07-15 19:48:46.033222] W
[afr-common.c:1985:afr_discover_done]
0-repofiles-replicate-0: no read subvols for /<br>
[2015-07-15 19:48:46.033224] I
[afr-common.c:1673:afr_local_discovery_cbk]
0-repofiles-replicate-0: selecting local read_child
repofiles-client-1<br>
[2015-07-15 19:48:46.033569] W
[afr-common.c:1985:afr_discover_done]
0-repofiles-replicate-0: no read subvols for /<br>
[2015-07-15 19:48:46.033624] I [MSGID: 104041]
[glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles:
switched to graph 64697374-2d67-6c32-2d32-313133392d32 (0)<br>
[2015-07-15 19:48:46.033906] W
[afr-common.c:1985:afr_discover_done]
0-repofiles-replicate-0: no read subvols for /<br>
[2015-07-15 19:48:48.036482] W [MSGID: 108008]
[afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check]
0-repofiles-replicate-0: GFID mismatch for
<gfid:00000000-0000-0000-0000-000000000001>/test
e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1
and 16da3178-8a6e-4010-b874-7f11449d1993 on
repofiles-client-0<br>
<br>
Where did I make a mistake when trying to resolve the split-brain?<br>
<br>
</div>
Best regards,<br>
</div>
Igor<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2015-07-14 22:11 GMT+03:00 Roman <span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Never mind. I do not have enough time to
debug why basic gluster commands do not work on a
production server. Tonight's system freeze, caused by
undocumented XFS settings that are required to run
glusterfs on XFS, was enough. I'll stick with EXT4.
In any case, XFS bricks did not solve my previous problem.
<div><br>
</div>
<div>To solve split-brain this time, I've restored VM from
backup.</div>
</div>
<div class="gmail_extra">
<div>
<div class="h5"><br>
<div class="gmail_quote">2015-07-14 21:55 GMT+03:00
Roman <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><span>
<div dir="ltr"
style="font-size:12.8000001907349px">Thanks
for pointing that out...
<div>but it doesn't seem to work... or I am too
sleepy from the glusterfs and debian8 problems
in the other topic that I have been fighting
for a month..</div>
<div><br>
</div>
<div>
<div>root@stor1:~# gluster volume heal
HA-2TB-TT-Proxmox-cluster split-brain
source-brick
stor1:HA-2TB-TT-Proxmox-cluster/2TB
/images/124/vm-124-disk-1.qcow2</div>
<div>Usage: volume heal <VOLNAME>
[{full | statistics {heal-count {replica
<hostname:brickname>}} |info
{healed | heal-failed | split-brain}}]</div>
</div>
<div><br>
</div>
<div>Looks like the wrong command syntax...</div>
</div>
</span>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote"><span>2015-07-14 21:23
GMT+03:00 Joe Julian <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:joe@julianfamily.org"
target="_blank">joe@julianfamily.org</a>></span>:<br>
</span>
<div>
<div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"><span>On
07/14/2015 11:19 AM, Roman wrote:<br>
</span>
<div>
<div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Hi,<br>
<br>
I played with glusterfs tonight and tried the
XFS setup recommended for gluster. The first
try was pretty bad and all of my VMs hung
(XFS wants allocsize=64k to create qcow2
files, which I didn't know about; I tried to
create a VM on XFS without this option in
fstab, which led to a lot of I/O, and qemu
said it timed out while creating the file)..<br>
<br>
Now I've got this:<br>
Brick
stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/<br>
/images/124/vm-124-disk-1.qcow2 -
Is in split-brain<br>
<br>
Number of entries: 1<br>
<br>
Brick
stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/<br>
/images/124/vm-124-disk-1.qcow2 -
Is in split-brain<br>
<br>
OK, what next?<br>
I deleted one of the files, but it didn't
help. Even worse, self-heal restored the
file on the node where I deleted it... and
it is still in split-brain.<br>
<br>
How do I fix this?<br>
<br>
-- <br>
Best regards,<br>
Roman.<br>
<br>
</blockquote>
<br>
<br>
</div>
</div>
<span><a moz-do-not-send="true"
href="https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md"
rel="noreferrer" target="_blank">https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md</a>
<br>
<br>
or<br>
<br>
<a moz-do-not-send="true"
href="https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/"
rel="noreferrer" target="_blank">https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/</a><br>
</span>
_______________________________________________<br>
Gluster-users mailing list<br>
<a moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org"
target="_blank">Gluster-users@gluster.org</a><br>
<a moz-do-not-send="true"
href="http://www.gluster.org/mailman/listinfo/gluster-users"
rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote>
</div>
</div>
</div>
<span><font color="#888888"><br>
<br clear="all">
<div><br>
</div>
-- <br>
<div>Best regards,<br>
Roman.</div>
</font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
<span class="HOEnZb"><font color="#888888">-- <br>
<div>Best regards,<br>
Roman.</div>
</font></span></div>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>