[Gluster-users] Unexpected behaviour during replication heal

Darren Austin darren-lists at widgit.com
Tue Jun 28 14:30:10 UTC 2011


> Can you check the server (brick) logs to check the order of detected
> disconnection and new/reconnection from the client?

Hi,
  It seems this wasn't due to keepalives - the system time on the two servers was a few seconds out.  After a pointer from someone off-list, I synced the clocks and ran ntpd (which I hadn't been doing, since this is just a test system), then ran some more tests.

The partial-file syndrome I noted before seems to have gone away - at least in terms of the file failing to sync back to the previously disconnected server after it finds its way back into the cluster.  Once the keepalive timeout is reached, the client sends all the data to the second server.

A quick question on that, actually - when all servers are online, are the clients supposed to send the data to both at the same time?  Monitoring the traffic, I can see the client duplicating the writes - one copy to each server.

Also, when one of the servers disconnects, is it normal for the client to "stall" the write until the keepalive timer expires and the remaining servers notice one has vanished?
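Incidentally, for anyone else poking at this: the length of the stall looks like it's governed by the volume's ping timeout, which (if I'm reading the docs right) can be tuned per volume.  The volume name below is from my test setup:

```shell
# Shorten the client-side ping timeout on the test volume so the
# failover happens sooner (the default is 42 seconds).
# 'data-volume' is my volume name - adjust to suit.
gluster volume set data-volume network.ping-timeout 10
```

I haven't tried whether a shorter timeout changes the hang behaviour, but it should make the test cycle quicker.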

Finally, during my testing I encountered a reproducible hard lock-up of the client... here's the situation:
  Server1 and Server2 in the cluster, sharing 'data-volume' (which is /data on both servers).
  Client mounts server1:data-volume as /mnt.
  Client begins to write a large (1 or 2 GB) file to /mnt  (I just used random data).
  Server1 goes down part way through the write (I simulated this by iptables -j DROP'ing everything from relevant IPs).
  Client "stalls" writes until the keepalive timeout, and then continues to send data to Server2.
  Server1 comes back online shortly after the keepalive timeout - but BEFORE the Client has written all the data to Server2.
  Server1 and Server2 reconnect and the writes on the Client completely hang.
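For what it's worth, the rough command sequence I'm running to trigger this looks like the following - the IPs, mount point and file size are placeholders from my test setup, so treat it as a sketch of the steps above rather than a ready-made script:

```shell
#!/bin/sh
# Sketch of the reproduction steps; run the client parts on the client.

# 1. Client mounts the replicated volume from Server1.
mount -t glusterfs server1:data-volume /mnt

# 2. Start a large streaming write of random data in the background.
dd if=/dev/urandom of=/mnt/bigfile bs=1M count=2048 &

# 3. Part-way through the write, cut Server1 off from the others
#    (run on Server1; substitute the client/Server2 addresses):
#      iptables -I INPUT -s <client-ip> -j DROP
#      iptables -I INPUT -s <server2-ip> -j DROP

# 4. Wait for the keepalive timeout; the client resumes writing to Server2.

# 5. Before the write finishes, bring Server1 back (run on Server1):
#      iptables -D INPUT 1
#      iptables -D INPUT 1
#    At this point the write hangs and /mnt becomes inaccessible.
```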

The mounted directory on the client becomes completely inaccessible when the two servers reconnect.
I had to kill -9 the dd process doing the write (along with the glusterfs process on the client) in order to release the mountpoint.
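In case it saves someone else some head-scratching, this is roughly the recovery sequence on the client once it hangs (the lazy unmount is a guess at a gentler fallback - I haven't verified it's needed):

```shell
# Force-release the hung mountpoint: kill the writer and the
# GlusterFS client process, then unmount.
kill -9 $(pidof dd) $(pidof glusterfs)

# A lazy unmount may help if the mountpoint is still busy.
umount -l /mnt
```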

I've reproduced this issue several times now and the result is always the same: if the client is writing data to one server when another comes back online after an outage, the client hangs.

I've attached logs for one of the times I tested this - I hope it helps in diagnosing the problem :)

Let me know if you need any more info.

-- 
Darren Austin - Systems Administrator, Widgit Software.
Tel: +44 (0)1926 333680.    Web: http://www.widgit.com/
26 Queen Street, Cubbington, Warwickshire, CV32 7NA.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mnt.log
Type: text/x-log
Size: 13142 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20110628/a3003e39/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: server1.log
Type: text/x-log
Size: 1770 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20110628/a3003e39/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: server2.log
Type: text/x-log
Size: 589 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20110628/a3003e39/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: server1-brick.log
Type: text/x-log
Size: 7551 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20110628/a3003e39/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: server2-brick.log
Type: text/x-log
Size: 2082 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20110628/a3003e39/attachment-0004.bin>


More information about the Gluster-users mailing list