[Gluster-users] Gluster native client in case of distributed volume node failure

Emir Imamagic eimamagi at srce.hr
Thu May 12 09:47:51 UTC 2011


Hello,

does anyone have any comments on the issues I described below? Any feedback 
would be more than welcome.

Thanks

On 16.4.2011. 21:01, Emir Imamagic wrote:
> Hello,
>
> I am trying to find a precise definition of the Gluster native client's behavior
> in the case of a distributed volume node failure. Some info is provided in the FAQ:
>
> http://www.gluster.com/community/documentation/index.php/GlusterFS_Technical_FAQ#What_happens_if_a_GlusterFS_brick_crashes.3F
>
> but it doesn't provide details.
> The other info I managed to find is this stale document:
>
> http://www.gluster.com/community/documentation/index.php/Understanding_DHT_Translator
>
> The document says that files on the failed node will not be visible to the
> client. However, the behavior of open file handles is not described.
>
> I did a couple of simple tests with the cp and sha1sum commands in order to see
> what happens. Server configuration:
> Volume Name: test
> Type: Distribute
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: gluster1:/data
> Brick2: gluster2:/data
> Options Reconfigured:
> performance.stat-prefetch: off
> performance.write-behind-window-size: 4MB
> performance.io-thread-count: 8
> On the client side I use the default mount without any additional options.
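>
> For completeness, a setup like this can be reproduced with roughly the
> following commands (just a sketch -- the exact CLI syntax and option names may
> differ between GlusterFS versions; /gluster is the client mount point used in
> the tests below):
>
> On one of the servers:
> # gluster volume create test transport tcp gluster1:/data gluster2:/data
> # gluster volume set test performance.stat-prefetch off
> # gluster volume set test performance.write-behind-window-size 4MB
> # gluster volume set test performance.io-thread-count 8
> # gluster volume start test
>
> On the client (default native mount, no extra options):
> # mount -t glusterfs gluster1:/test /gluster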
>
> *File read*: Both cp and sha1sum seem to read up to the point where the node
> fails and then exit without an error. In the case of sha1sum it reports an
> incorrect hash, and in the case of cp it copies only part of the file. In the
> Gluster client logs I see errors indicating the node failure, but the commands
> don't report anything.
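>
> A read test of this kind looks roughly like the following (hypothetical file
> name; one of the bricks is killed by hand while the commands are running):
> # sha1sum /gluster/testfile; echo $?
> # cp /gluster/testfile /tmp/testfile.copy; echo $?
> # sha1sum /tmp/testfile.copy
> Both commands exit without an error (status 0) even though only part of the
> file was read, so the truncation only shows up when the checksum or file size
> is compared with the original.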
>
> *File write*: In the case of a write the situation is slightly better, as cp
> reports that the transport endpoint is not connected and then fails:
> # cp testfile /gluster/; echo $?
> cp: writing `testfile': Transport endpoint is not connected
> cp: closing `testfile': Transport endpoint is not connected
> 1
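>
> One way to see how much of the file actually reached the volume in this case
> is to compare sizes and checksums against the source once the node is back (a
> sketch, with hypothetical paths):
> # ls -l testfile /gluster/testfile
> # sha1sum testfile /gluster/testfile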
>
> Another interesting detail is that in the client log I see that the file gets
> reopened when the storage node comes back online:
> [2011-04-16 14:03:04.909540] I
> [client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on
> /testfile succeeded (remote-fd = 0)
> [2011-04-16 14:03:04.909782] I
> [client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on
> /testfile succeeded (remote-fd = 1)
> However, the command has already finished. What is the purpose of this reopen?
>
> Is this expected behavior? Could you please provide pointers to the
> documentation, if any exists?
>
> Is it possible to tune this behavior to be more NFS-like, i.e. to put
> processes into I/O wait until the node comes back?
>
> Thanks in advance


-- 
Emir Imamagic
www.srce.hr

