[Gluster-users] Rsync

Stephan von Krawczynski skraw at ithnet.com
Tue Oct 6 15:39:11 UTC 2009


Remember, the gluster-team does not like my way of data-feeding. If your setup
blows up, don't blame them (or me :-)
I can only tell you what I am doing: simply move (or copy) the initial data to
the primary server of the replication setup and then start glusterfsd for
exporting.
You will notice that the data gets replicated as soon as a stat is issued
(the first ls or the like). If you already exported the data via NFS before,
you probably only need to set up glusterfs on the very same box and use it as
primary server. Then there is no data copying at all.

After months of experiments I can say that glusterfs runs pretty stable on
_low_ performance setups. But you have to do one thing: lengthen the
ping-timeout (something like "option ping-timeout 120").
If you do not do that, you will lose some of your server(s) at any time and
that will turn your glusterfs setup into a mess.
If your environment is ok, it works. If your environment fails it will fail,
too, sooner or later. In other words: it exports data, but it does not fulfill
the promise of keeping your setup alive during failures - at this stage.
My advice for the team is to stop whatever they may be working on, take four
physical boxes (2 servers, 2 clients), run a lot of bonnies and unplug/re-plug
the servers non-deterministically. You can find all kinds of weirdness this way.
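
A minimal sketch of where the ping-timeout setting mentioned above might go,
assuming it belongs in each protocol/client volume of the client volfile. The
volume, host and subvolume names are placeholders and not taken from any real
setup, so check the option against the documentation for your glusterfs
version:

```
# hypothetical client-side volume; all names below are placeholders
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume brick
  # lengthened timeout as suggested above (value in seconds)
  option ping-timeout 120
end-volume
```

In a replication setup the same line would presumably be repeated in every
protocol/client volume, so that all servers get the same timeout.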

Regards,
Stephan


On Mon, 5 Oct 2009 16:49:53 +0100
"Hiren Joshi" <josh at moonfruit.com> wrote:

> My users are more pitch fork less shooting.....
> 
> I don't understand what you're saying, should I have locally copied all
> the files over not using gluster before attempting an rsync?
> 
> > -----Original Message-----
> > From: Stephan von Krawczynski [mailto:skraw at ithnet.com] 
> > Sent: 05 October 2009 14:13
> > To: Hiren Joshi
> > Cc: Pavan Vilas Sondur; gluster-users at gluster.org
> > Subject: Re: [Gluster-users] Rsync
> > 
> > It would be nice to remember my thread about _not_ copying 
> > data initially to
> > gluster via the mountpoint. And one major reason for _local_ 
> > feed was: speed. 
> > Obviously a lot of cases are merely impossible because of the 
> > pure waiting
> > time. If you had a live setup people would have already shot you...
> > This is why I talked about a feature and not an accepted bug 
> > behaviour.
> > 
> > Regards,
> > Stephan
> > 
> > 
> > On Mon, 5 Oct 2009 11:00:36 +0100
> > "Hiren Joshi" <josh at moonfruit.com> wrote:
> > 
> > > Just a quick update: The rsync is *still* not finished. 
> > > 
> > > > -----Original Message-----
> > > > From: gluster-users-bounces at gluster.org 
> > > > [mailto:gluster-users-bounces at gluster.org] On Behalf Of 
> > Hiren Joshi
> > > > Sent: 01 October 2009 16:50
> > > > To: Pavan Vilas Sondur
> > > > Cc: gluster-users at gluster.org
> > > > Subject: Re: [Gluster-users] Rsync
> > > > 
> > > > Thanks!
> > > > 
> > > > I'm keeping a close eye on the "is glusterfs DHT really 
> > distributed?"
> > > > thread =)
> > > > 
> > > > I tried nodelay on and unhashd no. I tarred about 400G to 
> > the share in
> > > > about 17 hours (~6MB/s?) and am running an rsync now. 
> > Will post the
> > > > results when it's done.
> > > > 
> > > > > -----Original Message-----
> > > > > From: Pavan Vilas Sondur [mailto:pavan at gluster.com] 
> > > > > Sent: 01 October 2009 09:00
> > > > > To: Hiren Joshi
> > > > > Cc: gluster-users at gluster.org
> > > > > Subject: Re: Rsync
> > > > > 
> > > > > Hi,
> > > > > > We're looking into the problem on similar setups and working on it. 
> > > > > Meanwhile can you let us know if performance increases if you 
> > > > > use this option:
> > > > > 
> > > > > > 'option transport.socket.nodelay on' in each of your
> > > > > protocol/client and protocol/server volumes.
> > > > > 
> > > > > Pavan
> > > > > 
> > > > > On 28/09/09 11:25 +0100, Hiren Joshi wrote:
> > > > > > Another update:
> > > > > > It took 1240 minutes (over 20 hours) to complete on 
> > the simplified
> > > > > > system (without mirroring). What else can I do to debug?
> > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: gluster-users-bounces at gluster.org 
> > > > > > > [mailto:gluster-users-bounces at gluster.org] On Behalf Of 
> > > > > Hiren Joshi
> > > > > > > Sent: 24 September 2009 13:05
> > > > > > > To: Pavan Vilas Sondur
> > > > > > > Cc: gluster-users at gluster.org
> > > > > > > Subject: Re: [Gluster-users] Rsync
> > > > > > > 
> > > > > > >  
> > > > > > > 
> > > > > > > > -----Original Message-----
> > > > > > > > From: Pavan Vilas Sondur [mailto:pavan at gluster.com] 
> > > > > > > > Sent: 24 September 2009 12:42
> > > > > > > > To: Hiren Joshi
> > > > > > > > Cc: gluster-users at gluster.org
> > > > > > > > Subject: Re: Rsync
> > > > > > > > 
> > > > > > > > Can you let us know the following:
> > > > > > > > 
> > > > > > > >  * What is the exact directory structure?
> > > > > > > /abc/def/ghi/jkl/[1-4]
> > > > > > > now abc, def, ghi and jkl are one of a thousand dirs.
> > > > > > > 
> > > > > > > >  * How many files are there in each individual 
> > directory and 
> > > > > > > > of what size?
> > > > > > > Each of the [1-4] dirs has about 100 files in, all 
> > under 1MB.
> > > > > > > 
> > > > > > > >  * It looks like each server process has 6 export 
> > > > > > > > directories. Can you run one server process each 
> > for a single 
> > > > > > > > export directory and check if the rsync speeds up?
> > > > > > > I had no idea you could do that. How? Would I need to 
> > > > > create 6 config
> > > > > > > files and start gluster:
> > > > > > > 
> > > > > > > /usr/sbin/glusterfsd -f /etc/glusterfs/export1.vol 
> > or similar?
> > > > > > > 
> > > > > > > I'll give this a go....
> > > > > > > 
> > > > > > > >  * Also, do you have any benchmarks with a 
> > similar setup on 
> > > > > > > say, NFS?
> > > > > > > NFS will create the dir tree in about 20 minutes then start 
> > > > > > > copying the
> > > > > > > files over, it takes about 2-3 hours.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Pavan
> > > > > > > > 
> > > > > > > > On 24/09/09 12:13 +0100, Hiren Joshi wrote:
> > > > > > > > > It's been running for over 24 hours now.
> > > > > > > > > Network traffic is nominal, top shows about 
> > 200-400% cpu 
> > > > > > > (7 cores so
> > > > > > > > > it's not too bad).
> > > > > > > > > About 14G of memory used (the rest is being used as 
> > > > > disk cache).
> > > > > > > > > 
> > > > > > > > > Thoughts?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > <snip>
> > > > > > > > > > > > > 
> > > > > > > > > > > > > An update, after running the rsync for a day, 
> > > > > I killed it 
> > > > > > > > > > > > and remounted
> > > > > > > > > > > > > all the disks (the underlying 
> > filesystem, not the 
> > > > > > > gluster) 
> > > > > > > > > > > > with noatime,
> > > > > > > > > > > > > the rsync completed in about 600 
> > minutes. I'm now 
> > > > > > > going to 
> > > > > > > > > > > > try one level
> > > > > > > > > > > > > up (about 1,000,000,000 dirs).
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > From: Pavan Vilas Sondur 
> > > > [mailto:pavan at gluster.com] 
> > > > > > > > > > > > > > Sent: 23 September 2009 07:55
> > > > > > > > > > > > > > To: Hiren Joshi
> > > > > > > > > > > > > > Cc: gluster-users at gluster.org
> > > > > > > > > > > > > > Subject: Re: Rsync
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Hi Hiren,
> > > > > > > > > > > > > > What glusterfs version are you using? Can you 
> > > > > > > send us the 
> > > > > > > > > > > > > > volfiles and the log files.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Pavan
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On 22/09/09 16:01 +0100, Hiren Joshi wrote:
> > > > > > > > > > > > > > > I forgot to mention, the mount is 
> > mounted with 
> > > > > > > > > > > > direct-io, would this
> > > > > > > > > > > > > > > make a difference? 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > > > From: gluster-users-bounces at gluster.org 
> > > > > > > > > > > > > > > > 
> > [mailto:gluster-users-bounces at gluster.org] On 
> > > > > > > > Behalf Of 
> > > > > > > > > > > > > > Hiren Joshi
> > > > > > > > > > > > > > > > Sent: 22 September 2009 11:40
> > > > > > > > > > > > > > > > To: gluster-users at gluster.org
> > > > > > > > > > > > > > > > Subject: [Gluster-users] Rsync
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Hello all,
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > I'm getting what I think is bizarre 
> > > > > > > > behaviour.... I have 
> > > > > > > > > > > > > > about 400G to
> > > > > > > > > > > > > > > > rsync (rsync -av) onto a gluster share, 
> > > > > the data is 
> > > > > > > > > > > > in a directory
> > > > > > > > > > > > > > > > structure which has about 1000 
> > directories 
> > > > > > > > per parent and 
> > > > > > > > > > > > > > about 1000
> > > > > > > > > > > > > > > > directories in each of them.
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > When I try to rsync an end leaf 
> > > > directory (this 
> > > > > > > > > > has about 4 
> > > > > > > > > > > > > > > > dirs and 100
> > > > > > > > > > > > > > > > files in each) the operation 
> > takes about 10 
> > > > > > > > > > seconds. When I 
> > > > > > > > > > > > > > > > go one level
> > > > > > > > > > > > > > > > above (1000 dirs with about 4 
> > dirs in each 
> > > > > > > > with about 100 
> > > > > > > > > > > > > > > > files in each)
> > > > > > > > > > > > > > > > the operation takes about 10 minutes.
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > Now, if I then go one level above that 
> > > > > (that's 1000 
> > > > > > > > > > > dirs with 
> > > > > > > > > > > > > > > > 1000 dirs
> > > > > > > > > > > > > > > > in each with about 4 dirs in each 
> > with about 
> > > > > > > > 100 files in 
> > > > > > > > > > > > > > each) the
> > > > > > > > > > > > > > > > operation takes days! Top shows 
> > glusterfsd 
> > > > > > > > takes 300-600% 
> > > > > > > > > > > > > > cpu usage
> > > > > > > > > > > > > > > > (2X4core), I have about 48G of memory 
> > > > > > > (usage is 0% as 
> > > > > > > > > > > > expected).
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > Has anyone seen anything like 
> > this? How can I 
> > > > > > > > speed it up?
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > > Josh.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > _______________________________________________
> > > > > > > > > > > > > > > Gluster-users mailing list
> > > > > > > > > > > > > > > Gluster-users at gluster.org
> > > > > > > > > > > > > > > 
> > > > > > > > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> > > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> > 
> 




