<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 5, 2016 at 8:37 PM, Serkan Çoban <span dir="ltr"><<a href="mailto:cobanserkan@gmail.com" target="_blank">cobanserkan@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi again,<br>
<br>
I am now seeing the above situation in a production environment.<br>
One disk on one of my servers broke. I killed the brick process,<br>
replaced the disk, mounted it, and then ran gluster v start force.<br>
<br>
For the 24 hours after replacing the disk, the gluster v heal info<br>
count below kept increasing, up to about 200,000:<br>
<br>
gluster v heal v0 info | grep "Number of entries" | grep -v "Number of entries: 0"<br>
Number of entries: 205117<br>
Number of entries: 205231<br>
...<br>
...<br>
...<br>
<br>
Over about 72 hours it decreased to 40K, and it is going very slowly right now.<br>
What I am observing is a very slow heal speed. There are no errors<br>
in the brick logs.<br>
There was 900GB of data on the broken disk, and I see only 200GB healed<br>
96 hours after replacing the disk.<br>
There are the warnings below in glustershd.log, but I think they are harmless.<br>
<br>
W [ec_combine.c:866:ec_combine_check] 0-v0-disperse-56: Mismatching xdata in answers of LOOKUP<br>
W [ec_common.c:116:ec_check_status] 0-v0-disperse-56: Operation failed on some subvolumes (up=FFFFF, mask=FFFFF, remaining=0, good=FFFF7, bad=8)<br>
W [ec_common.c:71:ec_heal_report] 0-v0-disperse-56: Heal failed [invalid argument]<br>
<br>
I tried turning on performance.client-io-threads, but it did not<br>
change anything.<br>
At this rate, healing the 900GB of data will take nearly 8 days. What can I do?<br></blockquote><div><br></div><div>Sorry for the delay in response. Do you still have this problem?<br></div><div>You can trigger heals using the following command:<br><pre class="" id="comment_text_0">find &lt;dir-you-are-interested&gt; -d -exec getfattr -h -n trusted.ec.heal {} \;</pre>If you have 10 top-level directories, you could spawn 10 such processes, one per directory.<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class=""><font color="#888888"><br>
Serkan<br>
</font></span><div class=""><div class="h5"><br>
<br>
<br>
On Fri, Apr 15, 2016 at 1:28 PM, Serkan Çoban <<a href="mailto:cobanserkan@gmail.com">cobanserkan@gmail.com</a>> wrote:<br>
> The 100TB is newly created files written while the brick was down. I rethought the<br>
> situation and realized that I reformatted all the bricks in case 1, so<br>
> the write speed limit is 26*100MB/disk.<br>
> In case 2 I reformatted just one brick, so the write speed is limited to<br>
> 100MB/disk. I will repeat the tests using one brick in both cases,<br>
> once with a reformat and once with just killing the brick process.<br>
> Thanks for the reply.<br>
><br>
> On Fri, Apr 15, 2016 at 9:27 AM, Xavier Hernandez <<a href="mailto:xhernandez@datalab.es">xhernandez@datalab.es</a>> wrote:<br>
>> Hi Serkan,<br>
>><br>
>> sorry for the delay, I'm a bit busy lately.<br>
>><br>
>> On 13/04/16 13:59, Serkan Çoban wrote:<br>
>>><br>
>>> Hi Xavier,<br>
>>><br>
>>> Can you help me with the issue below? How can I increase the disperse<br>
>>> heal speed?<br>
>><br>
>><br>
>> It seems weird. Is there any related message in the logs?<br>
>><br>
>> In this particular test, is the 100TB made up of modified files or of files newly created<br>
>> while the brick was down?<br>
>><br>
>> How many files have been modified?<br>
>><br>
>>> Also, I would be grateful for any detailed documentation about disperse<br>
>>> heal:<br>
>>> why does heal happen on a disperse volume, and how is it triggered? Which nodes<br>
>>> participate in the heal process? Is there any client interaction?<br>
>><br>
>><br>
>> The heal process is basically the same as the one used for replicate. There are two ways to<br>
>> trigger a self-heal:<br>
>><br>
>> * when an inconsistency is detected, the client initiates a background<br>
>> self-heal of the inode<br>
>><br>
>> * the self-heal daemon scans the lists of modified files created by the<br>
>> index xlator when a modification is made while some node is down. All these<br>
>> files are self-healed.<br>
>><br>
>> Xavi<br>
>><br>
>><br>
>>><br>
>>> Serkan<br>
>>><br>
>>><br>
>>> ---------- Forwarded message ----------<br>
>>> From: Serkan Çoban <<a href="mailto:cobanserkan@gmail.com">cobanserkan@gmail.com</a>><br>
>>> Date: Fri, Apr 8, 2016 at 5:46 PM<br>
>>> Subject: disperse heal speed up<br>
>>> To: Gluster Users <<a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>><br>
>>><br>
>>><br>
>>> Hi,<br>
>>><br>
>>> I am testing the heal speed of a disperse volume, and what I see is 5-10MB/s per<br>
>>> node.<br>
>>> I increased disperse.background-heals to 32 and<br>
>>> disperse.heal-wait-qlength to 256, but it made no difference.<br>
>>> One thing I noticed is that when I kill a brick process, reformat the brick,<br>
>>> and restart it, the heal speed is nearly 20x faster (200MB/s/node).<br>
>>><br>
>>> But when I kill the brick, then write 100TB of data, and start the brick<br>
>>> afterwards, the heal is slow (5-10MB/s/node).<br>
>>><br>
>>> What is the difference between the two scenarios? Why is one heal slow and<br>
>>> the other fast? How can I increase the disperse heal speed? Should I<br>
>>> increase the thread count to 128 or 256? I am on a 78x(16+4) disperse volume<br>
>>> and my servers are pretty strong (2x14 cores with 512GB RAM; each node<br>
>>> has 26x8TB disks).<br>
>>><br>
>>> Gluster version is 3.7.10.<br>
>>><br>
>>> Thanks,<br>
>>> Serkan<br>
>>><br>
>><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Pranith<br></div></div>
</div></div>