[Gluster-users] Recovery from network failure

Anand Avati anand.avati at gmail.com
Mon Oct 5 15:07:28 UTC 2009


George,
  can you run the client in debug mode when you run ls -l on that
filename with the intention of self-healing it, and post the logs?
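
A debug-level log gets long quickly, so it may help to trim it before posting. Below is a minimal sketch, not part of GlusterFS, that keeps only the entries mentioning a given string such as the filename. It assumes the client log uses the same "[time] LEVEL [file:line:function] name: message" layout as the server log quoted further down. The helper names parse_entries and matching are hypothetical, and wrapped lines are folded back into the entry they continue.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Entry header layout taken from the server log quoted in this thread, e.g.
# [2009-09-23 21:29:21] N [server-protocol.c:7816:notify] server: ...
ENTRY_RE = re.compile(
    r"^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<xlator>[^:]+): ?(?P<message>.*)$"
)


class Entry(NamedTuple):
    time: str      # timestamp as printed in the log
    level: str     # single-letter level, e.g. D or N
    source: str    # file:line:function
    xlator: str    # name printed before the message, e.g. "server"
    message: str   # message text, with wrapped lines joined


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per log record, joining wrapped continuation lines."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_RE.match(line)
        if match:
            if current is not None:
                yield current
            current = Entry(**match.groupdict())
        elif current is not None and line.strip():
            # Not a new header: treat it as the continuation of the last entry.
            joined = (current.message + " " + line.strip()).strip()
            current = current._replace(message=joined)
    if current is not None:
        yield current


def matching(entries: Iterable[Entry], needle: str) -> List[Entry]:
    """Return the entries whose message contains needle (plain substring)."""
    return [entry for entry in entries if needle in entry.message]
```

For example, matching(parse_entries(open(path)), "test03") would list the entries that mention the file from this thread, where path is wherever the client log was written.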

Avati

On 10/5/09, Georgecooldude <georgecooldude at gmail.com> wrote:
> Hi Guys,
>
>  Any ideas what might be causing this? I'm looking to deploy my servers soon,
>  and if I can resolve this issue I will be using Gluster. If not, I'll have to
>  go with an alternative.
>
>  Any help much appreciated.
>
>
>
>  On Sat, Sep 26, 2009 at 10:55 PM, Georgecooldude <georgecooldude at gmail.com> wrote:
>
>  > Does anyone have any ideas? This is a real gluster show stopper for me atm
>  > :(
>  >
>  >
>  > On Wed, Sep 23, 2009 at 9:41 PM, Georgecooldude <georgecooldude at gmail.com> wrote:
>  >
>  >> It does seem to detect it in the log.
>  >>
>  >> This is what I did, along with the log file:
>  >>
>  >> ###
>  >> SRV01 - 192.168.1.1
>  >> SRV02 - 192.168.1.2
>  >> ##
>  >>
>  >>
>  >> ------------------------
>  >> Step 1: Copy large file to the gluster mount on server02
>  >> admin at srv02:/mnt/glusterfs$ ls -lh
>  >> total 1.2G
>  >> -rw-r--r-- 1 root root 584M 2009-09-23 21:29 test03
>  >> ------------------------
>  >>
>  >> ------------------------
>  >> Step 2: Pull the cable from srv02
>  >> ------------------------
>  >>
>  >> ------------------------
>  >> Step 3: ls on srv01 - See I have a partial file
>  >> admin at srv01:/mnt/glusterfs$ ls -lh
>  >> total 775M
>  >> -rw-r--r-- 1 root root 191M 2009-09-23 21:28 test03
>  >> ------------------------
>  >>
>  >> ------------------------
>  >> Server02 Log file looks like this:
>  >> Version      : glusterfs 2.0.6 built on Sep 19 2009 18:00:37
>  >> TLA Revision : v2.0.6
>  >> Starting Time: 2009-09-23 21:26:49
>  >> Command line : glusterfsd -f /etc/glusterfs/glusterfs-server.vol -l
>  >> /var/log/gluster/gluster-log.txt -L DEBUG --volfile-check
>  >> PID          : 5085
>  >> System name  : Linux
>  >> Nodename     : srv02
>  >> Kernel Release : 2.6.24-24-server
>  >> Hardware Identifier: x86_64
>  >> Given volfile:
>  >>
>  >> +------------------------------------------------------------------------------+
>  >>   1: # file: /etc/glusterfs/glusterfs-server.vol
>  >>   2:
>  >>   3: volume posix
>  >>   4:   type storage/posix
>  >>   5:   option directory /data/export
>  >>   6: end-volume
>  >>   7:
>  >>   8: volume locks
>  >>   9:   type features/locks
>  >>  10:   subvolumes posix
>  >>  11: end-volume
>  >>  12:
>  >>  13: volume brick
>  >>  14:   type performance/io-threads
>  >>  15:   option thread-count 8
>  >>  16:   subvolumes locks
>  >>  17: end-volume
>  >>  18:
>  >>  19: volume posix-ns
>  >>  20:   type storage/posix
>  >>  21:   option directory /data/export-ns
>  >>  22: end-volume
>  >>  23:
>  >>  24: volume locks-ns
>  >>  25:   type features/locks
>  >>  26:   subvolumes posix-ns
>  >>  27: end-volume
>  >>  28:
>  >>  29: volume brick-ns
>  >>  30:   type performance/io-threads
>  >>  31:   option thread-count 8
>  >>  32:   subvolumes locks-ns
>  >>  33: end-volume
>  >>  34:
>  >>  35: volume server
>  >>  36:   type protocol/server
>  >>  37:   option transport-type tcp
>  >>  38:   option auth.addr.brick.allow *
>  >>  39:   option auth.addr.brick-ns.allow *
>  >>  40:   subvolumes brick brick-ns
>  >>  41: end-volume
>  >>
>  >> +------------------------------------------------------------------------------+
>  >> [2009-09-23 21:26:49] D [glusterfsd.c:1205:main] glusterfs: running in pid
>  >> 5085
>  >> [2009-09-23 21:26:49] D [io-threads.c:2280:init] brick: io-threads:
>  >> Autoscaling: off, min_threads: 8, max_threads: 8
>  >> [2009-09-23 21:26:49] D [io-threads.c:2280:init] brick-ns: io-threads:
>  >> Autoscaling: off, min_threads: 8, max_threads: 8
>  >> [2009-09-23 21:26:49] D [transport.c:141:transport_load] transport:
>  >> attempt to load file /usr/local/lib/glusterfs/2.0.6/transport/socket.so
>  >> [2009-09-23 21:26:49] N [glusterfsd.c:1224:main] glusterfs: Successfully
>  >> started
>  >> [2009-09-23 21:26:56] D [addr.c:174:gf_auth] brick-ns: allowed = "*",
>  >> received addr = "192.168.1.2"
>  >> [2009-09-23 21:26:56] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.2:1021
>  >> [2009-09-23 21:26:56] D [addr.c:174:gf_auth] brick-ns: allowed = "*",
>  >> received addr = "192.168.1.2"
>  >> [2009-09-23 21:26:56] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.2:1020
>  >> [2009-09-23 21:26:56] D [addr.c:174:gf_auth] brick: allowed = "*",
>  >> received addr = "192.168.1.2"
>  >> [2009-09-23 21:26:56] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.2:1017
>  >> [2009-09-23 21:26:56] D [addr.c:174:gf_auth] brick: allowed = "*",
>  >> received addr = "192.168.1.2"
>  >> [2009-09-23 21:26:56] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.2:1016
>  >> [2009-09-23 21:27:16] D [addr.c:174:gf_auth] brick: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:27:16] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1021
>  >> [2009-09-23 21:27:17] D [addr.c:174:gf_auth] brick-ns: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:27:17] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1020
>  >> [2009-09-23 21:27:17] D [addr.c:174:gf_auth] brick: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:27:17] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1017
>  >> [2009-09-23 21:27:17] D [addr.c:174:gf_auth] brick-ns: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:27:17] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1016
>  >> [2009-09-23 21:29:21] N [server-protocol.c:7816:notify] server:
>  >> 192.168.1.1:1021 disconnected
>  >> [2009-09-23 21:29:21] N [server-protocol.c:7816:notify] server:
>  >> 192.168.1.1:1020 disconnected
>  >> [2009-09-23 21:29:37] N [server-protocol.c:7816:notify] server:
>  >> 192.168.1.1:1017 disconnected
>  >> [2009-09-23 21:29:37] D [socket.c:1298:socket_submit] server: not
>  >> connected (priv->connected = 255)
>  >> [2009-09-23 21:29:37] N [server-helpers.c:779:server_connection_destroy]
>  >> server: destroyed connection of srv01-5127-2009/09/23-20:52:02:522004-brick2
>  >> [2009-09-23 21:29:37] N [server-protocol.c:7816:notify] server:
>  >> 192.168.1.1:1016 disconnected
>  >> [2009-09-23 21:29:37] N [server-helpers.c:779:server_connection_destroy]
>  >> server: destroyed connection of
>  >> srv01-5127-2009/09/23-20:52:02:522004-brick2-ns
>  >> [2009-09-23 21:29:40] D [addr.c:174:gf_auth] brick: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:29:40] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1015
>  >> [2009-09-23 21:29:40] D [addr.c:174:gf_auth] brick: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:29:40] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1014
>  >> [2009-09-23 21:29:40] D [addr.c:174:gf_auth] brick-ns: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:29:40] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1013
>  >> [2009-09-23 21:29:40] D [addr.c:174:gf_auth] brick-ns: allowed = "*",
>  >> received addr = "192.168.1.1"
>  >> [2009-09-23 21:29:40] N [server-protocol.c:7056:mop_setvolume] server:
>  >> accepted client from 192.168.1.1:1012
>  >> ------------------------
>  >> No matter how many times I run ls -l on the directory or the file, I
>  >> cannot get it to sync.
>  >>
>  >> I can rename the files and have the name changes sync. Just not the files
>  >> themselves.
>  >>
>  >> admin at srv02:/mnt/glusterfs$ ls -lh
>  >> -rw-r--r-- 1 root root 584M 2009-09-23 21:29 test03
>  >> admin at srv02:/mnt/glusterfs$ mv test03 test03a
>  >>
>  >> admin at srv01:/mnt/glusterfs$ ls -lh (on server02 now)
>  >> -rw-r--r-- 1 root root 191M 2009-09-23 21:28 test03a
>  >>
>  >>
>  >> Any ideas what I might be doing wrong?
>  >>
>  >>
>  >>
>  >>
>  >> On Wed, Sep 23, 2009 at 5:55 AM, Anand Avati <avati at gluster.com> wrote:
>  >>
>  >>> On 9/23/09, Georgecooldude <georgecooldude at gmail.com> wrote:
>  >>> > Anyone have any ideas on the below? Thanks.
>  >>> >
>  >>>
>  >>> Does the logfile of the server whose cable you pulled out recognize
>  >>> the disconnection from the client?
>  >>>
>  >>> Avati
>  >>>
>  >>
>  >>
>  >
>


