<div dir="ltr">Since I stopped writing to the clients (so I could cleanly work on the split-brain), I have gotten no more entries in /var/log/gluster.log (this is the client log, right?)<div><br></div><div><br></div><div>While working with the diff command to fix the split-brain, I saw several entries like these:<div><br></div><div><div>diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558482: Transport endpoint is not connected</div><div>diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558483: Transport endpoint is not connected</div><div>diff: r2/webhost/sites/clipart/assets/apache/images/13/templates/558484: Transport endpoint is not connected</div></div><div><br></div><div>They happen a lot, then stop, then happen again, and so on.</div><div><br></div><div>While the errors are showing, ping from the system where I'm working on the split-brain to the system that is failing to connect (r2) shows this:</div><div><br></div><div><div>64 bytes from r2-server (r2-ip): icmp_seq=662 ttl=64 time=1.21 ms</div><div>64 bytes from r2-server (r2-ip): icmp_seq=663 ttl=64 time=0.990 ms</div><div>64 bytes from r2-server (r2-ip): icmp_seq=664 ttl=64 time=1.01 ms</div></div><div><br></div><div>I know this is a very trivial network check that may not be showing me what I want to see, and I'm working on a more elaborate one. But I'm completely open to suggestions on how to properly verify whether the network is the issue as far as Gluster is concerned.</div><div><br></div><div><br></div><div>So far, thank you so much, guys!</div><div><br></div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 26, 2015 at 8:36 PM, Joe Julian <span dir="ltr"><<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Check your client logs. Perhaps the client isn't actually connecting
to both servers. <br><div><div class="h5">
<br>
<div>On 01/26/2015 02:12 PM, Tiago Santos
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">That's what I meant. Sorry for the confusion.<br>
<br>
I'm writing on Client1 (same server as Brick1). Client2 (mounted
Brick2, on server2) has nothing writing to it (so far).
<div><br>
</div>
            <div>What I'm wondering is how I ended up with a split-brain if
              I'm only writing on one client.<br>
<div><br>
</div>
<div><br>
</div>
<div><br>
<br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Jan 26, 2015 at 8:04 PM,
Joe Julian <span dir="ltr"><<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Nothing but
GlusterFS should be writing to bricks. Mount a
client and write there.
<div>
<div><br>
<br>
<div>On 01/26/2015 01:38 PM, Tiago Santos wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Right.
<div><br>
</div>
                          <div>Brick1 is being written to constantly,
                            but nothing is writing to Brick2. It
                            just gets "healed" data from Brick1.</div>
<div><br>
</div>
<div>This setup is still not in production,
and there's no applications using that
data. I have rsyncs constantly updating
Brick1 (bring data from production
servers), and then Gluster updates Brick2.</div>
<div><br>
</div>
                          <div>Which makes me wonder how I could be
                            creating files on multiple replicas during a
                            split-brain.</div>
<div><br>
</div>
<div><br>
</div>
                          <div>Could it be that, during a
                            split-brain event, I'm updating
                            versions of the same file on Brick1
                            (only), and Gluster treats them as
                            different versions, so things get confused?</div>
<div><br>
</div>
<div><br>
</div>
<div>Anyways, while we talk I'm gonna run
Joe's precious procedure on split-brain
recovery.</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Jan 26,
2015 at 7:23 PM, Joe Julian <span dir="ltr"><<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Mismatched
GFIDs would happen if a file is created
on multiple replicas during a
split-brain event. The GFID is assigned
at file creation.
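                              <div>To make the mismatch concrete, here is a minimal sketch that compares the trusted.gfid values from the two getfattr outputs quoted further down in this thread. The helper names (parse_getfattr, gfid_of) are made up for illustration and are not part of any Gluster tool; the script only parses text that was already collected on each brick.</div>
<pre>
```python
import uuid


def parse_getfattr(text):
    """Parse `getfattr -d -e hex` output into a {name: hex_string} dict."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header line.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        attrs[name] = value
    return attrs


def gfid_of(attrs):
    """Return trusted.gfid as a canonical UUID string."""
    raw = attrs["trusted.gfid"]
    return str(uuid.UUID(hex=raw[2:]))  # drop the leading "0x"


# Output captured on each brick, copied from the thread.
brick1 = """# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000400000000
trusted.afr.site-images-client-1=0x000000020000000900000000
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527
"""
brick2 = """# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.afr.site-images-client-0=0x000000000000000000000000
trusted.afr.site-images-client-1=0x000000000000000000000000
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3
"""

gfid1 = gfid_of(parse_getfattr(brick1))
gfid2 = gfid_of(parse_getfattr(brick2))
print("brick1:", gfid1)
print("brick2:", gfid2)
print("GFID mismatch" if gfid1 != gfid2 else "GFIDs match")
```
</pre>
                              <div>For the file in this thread the two values differ, which is consistent with the file having been created independently on each brick.</div>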
<div>
<div><br>
<br>
On 01/26/2015 01:04 PM, A Ghoshal
wrote:<br>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div> Yep, so it is indeed a
split-brain caused by a mismatch
of the trusted.gfid attribute.<br>
<br>
                                    Sadly, I don't know precisely what
                                    causes it. Communication loss
                                    might be one of the triggers. I am
                                    guessing the files with the
                                    problem are dynamic, correct? In
                                    our setup (also replica 2),
                                    communication is never a problem,
                                    but we do see this when one of the
                                    servers reboots. Maybe there is some
                                    obscure, hard-to-understand
                                    race between background
                                    self-heal and the self-heal
                                    daemon...<br>
<br>
                                    In any case, a normal procedure
                                    for split-brain recovery would
                                    work for you if you wish to get
                                    your files working again. It's
                                    easy to find on Google. I use the
                                    instructions on Joe Julian's blog
                                    page myself.<br>
<br>
<br>
-----Tiago Santos <<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>>
wrote: -----<br>
<br>
=======================<br>
To: A Ghoshal <<a href="mailto:a.ghoshal@tcs.com" target="_blank">a.ghoshal@tcs.com</a>><br>
From: Tiago Santos <<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>><br>
Date: 01/27/2015 02:11AM<br>
Cc: gluster-users <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
Subject: Re: [Gluster-users]
Pretty much any operation related
to Gluster mounted fs hangs for a
while<br>
=======================<br>
Oh, right!<br>
<br>
                                    The outputs follow:<br>
<br>
<br>
root@web3:/export/images1-1/brick#
time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
# file:
templates/assets/prod/temporary/13/user_1339200.png<br>
trusted.afr.site-images-client-0=0x000000000000000400000000<br>
trusted.afr.site-images-client-1=0x000000020000000900000000<br>
trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527<br>
<br>
real 0m0.024s<br>
user 0m0.001s<br>
sys 0m0.001s<br>
<br>
<br>
<br>
root@web4:/export/images2-1/brick#
time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
# file:
templates/assets/prod/temporary/13/user_1339200.png<br>
trusted.afr.site-images-client-0=0x000000000000000000000000<br>
trusted.afr.site-images-client-1=0x000000000000000000000000<br>
trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3<br>
<br>
real 0m0.003s<br>
user 0m0.000s<br>
sys 0m0.006s<br>
<br>
<br>
                                    Not sure exactly what that means.
                                    I'm googling, and would appreciate
                                    it if you guys could shed some light.<br>
<br>
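                                    <div>As a rough aid for reading the trusted.afr values above: each one is 12 bytes, made up of three big-endian 32-bit counters for pending data, metadata, and entry operations, in that order. The sketch below decodes the two values reported on web3; the function name decode_afr is only illustrative.</div>
<pre>
```python
import struct


def decode_afr(hex_value):
    """Split a 12-byte trusted.afr value into its three pending counters."""
    raw = bytes.fromhex(hex_value[2:])  # drop the leading "0x"
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output on web3.
web3 = {
    "trusted.afr.site-images-client-0": "0x000000000000000400000000",
    "trusted.afr.site-images-client-1": "0x000000020000000900000000",
}

decoded = {name: decode_afr(value) for name, value in web3.items()}
for name in sorted(decoded):
    print(name, decoded[name])
```
</pre>
                                    <div>Non-zero counters mean the brick holding the attribute has recorded operations it considers still pending on the named replica. The all-zero values on web4 decode to zero in every field.</div>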
Thanks!<br>
--<br>
Tiago<br>
<br>
<br>
<br>
<br>
On Mon, Jan 26, 2015 at 6:16 PM, A
Ghoshal <<a href="mailto:a.ghoshal@tcs.com" target="_blank">a.ghoshal@tcs.com</a>>
wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                      Actually, you ran getfattr on the
                                      mounted volume, which is why the
                                      requisite extended attributes
                                      never showed up...<br>
                                      <br>
                                      Your bricks are mounted
                                      elsewhere:<br>
                                      /export/images1-1/brick and
                                      /export/images2-1/brick<br>
<br>
Btw, what version of Linux do
you use? And, are the files you
observe the<br>
input/output errors on
soft-links?<br>
<br>
-----Tiago Santos <<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>>
wrote: -----<br>
<br>
=======================<br>
To: A Ghoshal <<a href="mailto:a.ghoshal@tcs.com" target="_blank">a.ghoshal@tcs.com</a>><br>
From: Tiago Santos <<a href="mailto:tiago@musthavemenus.com" target="_blank">tiago@musthavemenus.com</a>><br>
Date: 01/27/2015 12:20AM<br>
Cc: gluster-users <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
Subject: Re: [Gluster-users]
Pretty much any operation
related to Gluster<br>
mounted fs hangs for a while<br>
=======================<br>
                                      Thanks for your input,
                                      Anirban.<br>
<br>
I ran the commands on both
servers, with the following
results:<br>
<br>
<br>
root@web3:/var/www/site-images#
time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
<br>
real 0m34.524s<br>
user 0m0.004s<br>
sys 0m0.000s<br>
<br>
<br>
root@web4:/var/www/site-images#
time getfattr -m . -d -e hex<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
getfattr:
templates/assets/prod/temporary/13/user_1339200.png:
Input/output<br>
error<br>
<br>
real 0m11.315s<br>
user 0m0.001s<br>
sys 0m0.003s<br>
root@web4:/var/www/site-images#
ls<br>
templates/assets/prod/temporary/13/user_1339200.png<br>
ls: cannot access
templates/assets/prod/temporary/13/user_1339200.png:<br>
Input/output error<br>
<br>
<br>
</blockquote>
</div>
</div>
<span>
_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</span></blockquote>
<div>
<div> <br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div>
<div dir="ltr">
<div>
<div dir="ltr"><font color="#444444"><b>Tiago
Santos</b></font>
<div>
<div><font color="#ff0000">MustHaveMenus.com</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div>
<div dir="ltr">
<div>
<div dir="ltr"><font color="#444444"><b>Tiago Santos</b></font>
<div>
<div><font color="#ff0000">MustHaveMenus.com</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><font color="#444444"><b>Tiago Santos</b></font><div><div><font color="#ff0000">MustHaveMenus.com</font></div></div></div></div></div></div>
</div>
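<div>One possible way to go beyond ping, offered only as a sketch: ICMP replies do not prove that the Gluster TCP ports are reachable, so the script below repeatedly attempts a TCP connection and records failures. It assumes the glusterd management port 24007; brick ports differ per volume and can be looked up with "gluster volume status". The function names probe and watch are illustrative, and r2-server is the host name taken from the ping output above.</div>
<pre>
```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Try one TCP connect; return (ok, elapsed_seconds_or_error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False, str(exc)


def watch(host, port, attempts=10, interval=1.0):
    """Probe repeatedly and print one timestamped line per attempt."""
    failures = 0
    for _ in range(attempts):
        ok, info = probe(host, port)
        stamp = time.strftime("%H:%M:%S")
        if ok:
            print(stamp, "connected in %.3f s" % info)
        else:
            failures += 1
            print(stamp, "FAILED:", info)
        time.sleep(interval)
    return failures


# Example (not run automatically): watch("r2-server", 24007, attempts=600)
```
</pre>
<div>Running this while the "Transport endpoint is not connected" errors appear would show whether TCP connections to r2 fail at the same moments, which plain ping cannot tell you.</div>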