<div dir="ltr">We do need to treat this as a bug and fix full self-heal to handle the case where it must examine both bricks to find files missing from either one. We won't let this happen on the mounts, though, because it would slow down performance. Be very careful about deleting files directly from the brick. It is always recommended that you take a backup of the good file before attempting a heal.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 17, 2016 at 4:28 PM, Дмитрий Глушенок <span dir="ltr"><<a href="mailto:glush@jet.msk.su" target="_blank">glush@jet.msk.su</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>You are right, stat triggers self-heal. Thank you!</div><span class=""><br><div>
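The backup step recommended above can be sketched as follows. This is illustration only: the paths follow the thread's layout (`/R1/test01`, a `passwd` test file), and the backup destination is an arbitrary choice.

```shell
# Sketch only: keep a copy of the known-good replica before deleting
# anything from a brick. Run this on the node that holds the good copy;
# the destination path is an arbitrary example.
cp -a /R1/test01/passwd /root/passwd.good.bak

# Only once the backup exists, remove the bad copy (and its .glusterfs
# hard-link) on the other node, then trigger the heal with a lookup:
#   rm /R1/test01/passwd
#   stat /mnt/passwd
```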
<div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word"><div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word"><div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word"><div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word"><div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word"><div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word"><div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word"><div>--</div><div><div style="word-wrap:break-word"><div style="word-wrap:break-word"><div style="word-wrap:break-word"><div>Dmitry Glushenok</div><div>Jet Infosystems</div></div></div></div></div></div></div></div></div></div></div></div>
</div>
<br></span><div><blockquote type="cite"><div>17 авг. 2016 г., в 13:38, Ravishankar N <<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>> написал(а):</div><div><div class="h5"><br><div>
<div bgcolor="#FFFFFF" text="#000000">
<div>On 08/17/2016 03:48 PM, Дмитрий
Глушенок wrote:<br>
</div>
<blockquote type="cite">
<div>Unfortunately not:</div>
<div><br>
</div>
<div>Remount the FS, then access the test file from the second
client:</div>
<div><br>
</div>
<div>
<div>[root@srv02 ~]# umount /mnt</div>
<div>[root@srv02 ~]# mount -t glusterfs srv01:/test01
/mnt</div>
<div>[root@srv02 ~]# ls -l /mnt/passwd </div>
<div>-rw-r--r--. 1 root root 1505 Aug 16 19:59
/mnt/passwd</div>
<div>[root@srv02 ~]# ls -l /R1/test01/</div>
<div>total 4</div>
<div>-rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd</div>
<div>[root@srv02 ~]# </div>
<div><br>
</div>
<div>Then remount the FS and check whether accessing the file
from the second node triggered self-heal on the first node:</div>
<div><br>
</div>
<div>[root@srv01 ~]# umount /mnt</div>
<div>[root@srv01 ~]# mount -t glusterfs srv01:/test01
/mnt</div>
<div>[root@srv01 ~]# ls -l /mnt</div>
</div>
</blockquote>
<br>
Can you try `stat /mnt/passwd` from this node after remounting? You
need to explicitly look up the file; `ls -l /mnt` only triggers
a readdir on the parent directory.<br>
If that doesn't work, is this mount connected to both bricks? I.e.
if you create a new file from here, does it get replicated to both
bricks?<br>
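The two checks suggested here might look like this on srv01; a sketch only, using the volume name, paths, and hostnames from this thread:

```shell
# 1) Explicit lookup: stat sends a named lookup for the file itself,
#    which AFR fans out to both bricks (ls -l /mnt only readdirs /mnt).
stat /mnt/passwd

# 2) Connectivity check: a file created through the mount should land
#    on both bricks if this client is connected to both of them.
touch /mnt/conn-test
ls -l /R1/test01/conn-test              # local brick on srv01
ssh srv02 'ls -l /R1/test01/conn-test'  # remote brick on srv02
```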
<br>
-Ravi<br>
<br>
<blockquote type="cite">
<div>
<div>total 0</div>
<div>[root@srv01 ~]# ls -l /R1/test01/</div>
<div>total 0</div>
<div>[root@srv01 ~]#</div>
</div>
<div><br>
</div>
<div>Nothing appeared.</div>
<div><br>
</div>
<div>
<div>[root@srv01 ~]# gluster volume info test01</div>
<div> </div>
<div>Volume Name: test01</div>
<div>Type: Replicate</div>
<div>Volume ID: 2c227085-0b06-4804-805c-ea9c1bb11d8b</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: srv01:/R1/test01</div>
<div>Brick2: srv02:/R1/test01</div>
<div>Options Reconfigured:</div>
<div>features.scrub-freq: hourly</div>
<div>features.scrub: Active</div>
<div>features.bitrot: on</div>
<div>transport.address-family: inet</div>
<div>performance.readdir-ahead: on</div>
<div>nfs.disable: on</div>
<div>[root@srv01 ~]# </div>
</div>
<div><br>
</div>
<div>
<div>[root@srv01 ~]# gluster volume get test01 all |
grep heal</div>
<div>cluster.background-self-heal-count 8
</div>
<div>cluster.metadata-self-heal on
</div>
<div>cluster.data-self-heal on
</div>
<div>cluster.entry-self-heal on
</div>
<div>cluster.self-heal-daemon on
</div>
<div>cluster.heal-timeout 600
</div>
<div>cluster.self-heal-window-size 1
</div>
<div>cluster.data-self-heal-algorithm (null)
</div>
<div>cluster.self-heal-readdir-size 1KB
</div>
<div>cluster.heal-wait-queue-length 128
</div>
<div>features.lock-heal off
</div>
<div>features.lock-heal off
</div>
<div>storage.health-check-interval 30
</div>
<div>features.ctr_lookupheal_link_timeout 300
</div>
<div>features.ctr_lookupheal_inode_timeout 300
</div>
<div>cluster.disperse-self-heal-daemon enable
</div>
<div>disperse.background-heals 8
</div>
<div>disperse.heal-wait-qlength 128
</div>
<div>cluster.heal-timeout 600
</div>
<div>cluster.granular-entry-heal no
</div>
<div>[root@srv01 ~]#</div>
</div>
<br>
<div>
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div>--</div>
<div>
<div style="word-wrap:break-word">
<div style="word-wrap:break-word">
<div style="word-wrap:break-word">
<div>Dmitry Glushenok</div>
<div>Jet Infosystems</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<div>
<blockquote type="cite">
<div>17 Aug 2016, at 11:30, Ravishankar N <<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>> wrote:</div>
<br>
<div>
<div bgcolor="#FFFFFF" text="#000000">
<div>On 08/17/2016 01:48 PM,
Дмитрий Глушенок wrote:<br>
</div>
<blockquote type="cite">
<div>Hello Ravi,</div>
<div><br>
</div>
<div>Thank you for the reply. Found the bug number (for
those who will google this email): <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1112158" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1112158</a></div>
<div><br>
</div>
<div>Accessing the removed file from the
mount point does not always work, because we would have to
find the particular client for which DHT points to the
brick with the removed file. Otherwise the file is
served from the good brick and self-healing does not
happen (just verified). Or by accessing did you mean
something like touch?</div>
</blockquote>
<br>
Sorry, I should have been more explicit. I meant triggering a
lookup on that file with `stat filename`. I don't think
you need a special client: DHT sends the lookup to AFR,
which in turn sends it to all its children. When one of them
returns ENOENT (because you removed the file from that brick),
AFR automatically triggers a heal. I'm guessing it does not
always work in your case because of caching at various
levels, so the lookup never reaches AFR. If you do it
from a fresh mount, it should always work.<br>
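A fresh-mount lookup, as suggested, could look like this (a sketch using the hostnames and paths from this thread):

```shell
# Remounting drops the kernel/FUSE caches, so the stat below reaches
# AFR as a fresh named lookup instead of being answered from cache.
umount /mnt
mount -t glusterfs srv01:/test01 /mnt
stat /mnt/passwd   # ENOENT on the bad brick should trigger the heal
```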
-Ravi<br>
<br>
<blockquote type="cite">
<div>
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div style="letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div>
<div style="word-wrap:break-word">
<div style="word-wrap:break-word">
<div style="word-wrap:break-word">
<div>Dmitry Glushenok</div>
<div>Jet Infosystems</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<div>
<blockquote type="cite">
<div>17 Aug 2016, at 4:24, Ravishankar N
<<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>>
wrote:</div>
<br>
<div><span style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">On
08/16/2016 10:44 PM, Дмитрий Глушенок wrote:</span><br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<blockquote type="cite" style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">Hello,<br>
<br>
While testing healing after a bitrot error, it was
found that self-healing cannot heal files that
were manually deleted from a brick. Gluster 3.8.1:<br>
<br>
- Create volume, mount it locally and copy test
file to it<br>
[root@srv01 ~]# gluster volume create test01
replica 2 srv01:/R1/test01 srv02:/R1/test01<br>
volume create: test01: success: please start the
volume to access data<br>
[root@srv01 ~]# gluster volume start test01<br>
volume start: test01: success<br>
[root@srv01 ~]# mount -t glusterfs srv01:/test01
/mnt<br>
[root@srv01 ~]# cp /etc/passwd /mnt<br>
[root@srv01 ~]# ls -l /mnt<br>
total 2<br>
-rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd<br>
<br>
- Then remove the test file from the first brick, as we
would have to do in case of a bitrot error in the file<br>
</blockquote>
<br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">You also
need to remove all hard-links to the corrupted
file from the brick, including the one in the
.glusterfs folder.</span><br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">There is a
bug in heal-full that prevents it from crawling
all bricks of the replica. The right way to heal
the corrupted files as of now is to access them
from the mount-point like you did after removing
the hard-links. The list of files that are
corrupted can be obtained with the scrub status
command.</span><br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
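Finding the `.glusterfs` hard-link requires the file's GFID, which is stored in the `trusted.gfid` xattr on the brick. A sketch of the path derivation follows; the hex value below is a made-up example (on a real brick it would come from `getfattr`), and the `rm`/`stat`/scrub lines at the end are shown only as the shape of the procedure:

```shell
#!/usr/bin/env bash
# On a real brick the GFID would come from the xattr, e.g.:
#   getfattr -n trusted.gfid -e hex --only-values /R1/test01/passwd
hex="a1b2c3d4e5f60718293a4b5c6d7e8f90"   # example value only

# Format it as a canonical UUID, then build the .glusterfs hard-link
# path: .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
gfid="${hex:0:8}-${hex:8:4}-${hex:12:4}-${hex:16:4}-${hex:20:12}"
link=".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
echo "$link"   # .glusterfs/a1/b2/a1b2c3d4-e5f6-0718-293a-4b5c6d7e8f90

# On the bad brick you would then remove both hard-links, trigger a
# lookup from a mount, and list corrupted files via scrub status:
#   rm /R1/test01/passwd "/R1/test01/$link"
#   stat /mnt/passwd
#   gluster volume bitrot test01 scrub status
```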
<br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">Hope this
helps,</span><br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<span style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">Ravi</span><br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<blockquote type="cite" style="font-family:Menlo-Regular;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">[root@srv01
~]# rm /R1/test01/passwd<br>
[root@srv01 ~]# ls -l /mnt<br>
total 0<br>
[root@srv01 ~]#<br>
<br>
- Issue full self heal<br>
[root@srv01 ~]# gluster volume heal test01 full<br>
Launching heal operation to perform full self
heal on volume test01 has been successful<br>
Use heal info commands to check status<br>
[root@srv01 ~]# tail -2
/var/log/glusterfs/glustershd.log<br>
[2016-08-16 16:59:56.483767] I [MSGID: 108026]
[afr-self-heald.c:611:afr_shd_full_healer]
0-test01-replicate-0: starting full sweep on
subvol test01-client-0<br>
[2016-08-16 16:59:56.486560] I [MSGID: 108026]
[afr-self-heald.c:621:afr_shd_full_healer]
0-test01-replicate-0: finished full sweep on
subvol test01-client-0<br>
<br>
- Now we still see no files at the mount point (it
became empty right after removing the file from the
brick)<br>
[root@srv01 ~]# ls -l /mnt<br>
total 0<br>
[root@srv01 ~]#<br>
<br>
- Then try to access the file by its full name
(lookup-optimize and readdir-optimize are turned
off by default). Now glusterfs shows the file!<br>
[root@srv01 ~]# ls -l /mnt/passwd<br>
-rw-r--r--. 1 root root 1505 Aug 16 19:59
/mnt/passwd<br>
<br>
- And it reappeared in the brick<br>
[root@srv01 ~]# ls -l /R1/test01/<br>
total 4<br>
-rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd<br>
[root@srv01 ~]#<br>
<br>
Is this a bug, or can we tell self-heal to scan all
files on all bricks in the volume?<br>
<br>
--<br>
Dmitry Glushenok<br>
Jet Infosystems<br>
<br>
_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a></blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote><p><br>
</p>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote><p><br>
</p>
</div>
</div></div></div></blockquote></div><br></div><br></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Pranith<br></div></div>
</div>