<div dir="ltr"><br><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><span><font color="#888888"><span style="color:rgb(0,0,0)"><b><i>David Gossage</i></b></span><font><i><span style="color:rgb(51,51,51)"><b><br>
</b></span></i></font></font></span><div><span><font color="#888888"><font><i><span style="color:rgb(51,51,51)"></span></i><font size="1"><b style="color:rgb(153,0,0)">Carousel Checks Inc.<span style="color:rgb(204,204,204)"> | System Administrator</span></b></font></font><font style="color:rgb(153,153,153)"><font size="1"><br>
</font></font><font><font size="1"><span style="color:rgb(51,51,51)"><b style="color:rgb(153,153,153)">Office</b><span style="color:rgb(153,153,153)"> <a value="+17086132426">708.613.2284<font color="#888888"><font size="1"><br></font></font></a></span></span></font></font></font></span></div></div></div></div>
<br><div class="gmail_quote">On Thu, May 19, 2016 at 7:25 PM, Kevin Lemonnier <span dir="ltr"><<a href="mailto:lemonnierk@ulrar.net" target="_blank">lemonnierk@ulrar.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The I/O errors are happening after, not during the heal.<br>
As described, I just rebooted a node, waited for the heal to finish,<br>
rebooted another, waited for the heal to finish then rebooted the third.<br>
>From that point, the VM just has a lot of I/O errors showing whenever I<br>
use the disk a lot (importing big MySQL dumps). The VM "screen" on the console<br>
tab of proxmox just spams I/O errors from that point, which it didn't before rebooting<br>
the gluster nodes. Tried to poweroff the VM and force full heals, but I didn't find<br>
a way to fix the problem short of deleting the VM disk and restoring it from a backup.<br>
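>
> (For reference, by "forcing full heals" I mean the usual CLI sequence
> below; the volume name is just an example:)
>
>     # queue a full self-heal of every file on the volume,
>     # then watch the list of entries needing heal drain to zero
>     gluster volume heal myvol full
>     gluster volume heal myvol info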
>
> I have 3 other servers on 3.7.6 where that problem isn't happening, so it
> might be a 3.7.11 bug, but since the RAID card recently failed on one of
> the nodes, I'm not really sure some other piece of hardware isn't at
> fault. Unfortunately I don't have the hardware to test that. The only way
> to be sure would be to upgrade the 3.7.6 nodes to 3.7.11 and repeat the
> same tests, but those nodes are in production, and the VM freezes during
> heals last month already caused huge problems for our clients. We really
> can't afford any more problems there, so testing on them isn't an option.

Are the 3.7.11 nodes in production? Could they be downgraded to 3.7.6 to
see if the problem still occurs?
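(An untested sketch of such a downgrade, assuming Debian with the
gluster.org apt repo, and assuming the cluster op-version hasn't been
raised past what 3.7.6 supports; the exact package version string is a
guess and will differ per system. One node at a time, after its heals
have finished:)

    service glusterfs-server stop
    apt-get install glusterfs-server=3.7.6-1 glusterfs-client=3.7.6-1
    service glusterfs-server start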

> To sum up: I have 3 nodes on 3.7.6 with no corruption happening but huge
> freezes during heals, and 3 other nodes on 3.7.11 with no freezes during
> heals but corruption. qemu-img doesn't see the corruption; it only shows
> on the VM's screen and seems mostly harmless, but sometimes the VM does
> switch to read-only mode saying it had too many I/O errors.
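>
> (What I ran, with a hypothetical image path; note qemu-img check only
> validates the image's own metadata, so a clean result doesn't rule out
> bad reads underneath:)
>
>     qemu-img check /path/to/vm-100-disk-1.qcow2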
>
> Would the bitrot detection daemon detect a hardware problem? I did enable
> it, but it didn't detect anything. I don't know how to force a check,
> though, so I have no idea if it has run a scrub since the corruption
> happened.
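>
> (The closest thing to forcing a check that I know of, assuming the 3.7
> bitrot CLI; "myvol" is an example name, and "scrub status" may need a
> recent 3.7 release:)
>
>     # scrub as often as possible, then see what the scrubber reports
>     gluster volume bitrot myvol scrub-frequency hourly
>     gluster volume bitrot myvol scrub status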
>
> On Thu, May 19, 2016 at 04:04:49PM -0400, Alastair Neil wrote:
> > I am slightly confused: you say you have image file corruption, but
> > then you say that qemu-img check reports no corruption. If what you
> > mean is that you see I/O errors during a heal, that is likely due to
> > I/O starvation, which is a well-known issue.
> > There is work happening to improve this in version 3.8:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1269461
> >
> > On 19 May 2016 at 09:58, Kevin Lemonnier <lemonnierk@ulrar.net> wrote:
> >
> > That's a different problem then: I have corruption without removing or
> > adding bricks, as mentioned. Might be two separate issues.
> >
> > On Thu, May 19, 2016 at 11:25:34PM +1000, Lindsay Mathieson wrote:
> > > On 19/05/2016 12:17 AM, Lindsay Mathieson wrote:
> > >
> > > One thought - since the VMs are active while the brick is
> > > removed/re-added, could it be the shards written while the brick is
> > > added that are the reverse-healing shards?
> > >
> > > I tested this by (commands sketched after the list):
> > >
> > > - removing brick 3
> > > - erasing brick 3
> > > - closing down all VMs
> > > - adding a new brick 3
> > > - waiting until the heal count reached its max and started
> > >   decreasing (there were no reverse heals)
> > > - starting the VMs back up; no real issues there, though one showed
> > >   I/O errors, presumably due to shards being locked as they were
> > >   healed
> > >
> > > The VMs started OK, no reverse heals were noted, and eventually
> > > brick 3 was fully healed. The VMs do not appear to be corrupted.
> > >
> > > So it would appear the problem is adding a brick while the volume is
> > > being written to.
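> > >
> > > (A sketch of those steps for a replica-3 volume; the volume, host,
> > > and path names are made up:)
> > >
> > >     # drop to replica 2, removing the third brick
> > >     gluster volume remove-brick myvol replica 2 srv3:/data/brick force
> > >     # erase the old brick so it comes back empty
> > >     rm -rf /data/brick && mkdir -p /data/brick
> > >     # re-add it as replica 3 again, then watch the heal progress
> > >     gluster volume add-brick myvol replica 3 srv3:/data/brick
> > >     gluster volume heal myvol info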
> > >
> > > Cheers,
> > >
> > > --
> > > Lindsay Mathieson
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Kevin Lemonnier<br>
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111<br>

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users