<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hrm, we have an 80TB volume now, made up of several 2-disk RAID0
stripes; each drive is a 5TB SATA disk.<br>
<br>
That's on the commercial third-party Direct Connect instance, which
is not to say that we couldn't test something using a slew of EBS
volumes in some odd configuration. We're open to whatever at this
point.<br>
<br>
Amazon is offering EFS, but I'm not convinced yet that this will get
us the performance we need.<br>
<br>
Wouldn't FUSE somewhere in this configuration introduce a performance
hit? I've been warned to stay away from FUSE, but I admit I don't
have all the facts yet.<br>
<br>
<br>
Thank you.<br>
<br>
<br>
<div class="moz-cite-prefix">On 7/14/15 4:29 PM, Mathieu Chateau
wrote:<br>
</div>
<blockquote
cite="mid:CACpSnaJ58qX+Sy7ptoK5shpE4AHms_aOQhR3OpfrY=K0oqLyqA@mail.gmail.com"
type="cite">
<div dir="ltr">Hello,
<div><br>
</div>
<div>OK, you can stick with NFS; you will just have to manage
failover if needed.</div>
<div><br>
</div>
<div>So they use 4TB hard drives (80TB / 20 disks).</div>
<div>Each disk can provide, let's say, 150 IOPS max; that means
3,000 IOPS max in aggregate, before RAID overhead and the like.</div>
<div><br>
</div>
<div>From your explanation, I guess you have many workloads
running in parallel, so 20 disks may not be enough anyway.</div>
<div><br>
</div>
<div>You first must be sure that the storage can physically meet
your needs in terms of capacity and performance.</div>
<div><br>
</div>
<div>Then you can choose the solution that best fits your needs.</div>
<div><br>
</div>
<div>Just my 2 cents.</div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div class="gmail_signature">Cordialement,<br>
Mathieu CHATEAU<br>
<a moz-do-not-send="true" href="http://www.lotp.fr"
target="_blank">http://www.lotp.fr</a></div>
</div>
<br>
<div class="gmail_quote">2015-07-14 22:21 GMT+02:00 Forrest
Aldrich <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:forrie@gmail.com" target="_blank">forrie@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> The instances we use
via Direct Connect (a third party company) have upwards 20
disks and a total of 80T. That part is covered.<br>
<br>
If we were to experiment with EBS, that would be a
different case as we'd need to stripe them.<br>
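<br>
(If we do go that route: as I understand it, striping EBS volumes is
typically done with mdadm RAID0. A minimal sketch, with hypothetical
device names and volume count; actual devices vary per instance:)<br>
<pre>
# Assemble four attached EBS volumes into one RAID0 stripe
# (illustrative only; device names and count are assumptions):
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mkfs.xfs /dev/md0
mount /dev/md0 /export/brick1    # could then serve as a Gluster brick
</pre>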
<br>
Our present model requires a single namespace via NFS.
The Instances are running CentOS 6.x and mount the
Direct Connect disk space via NFS; the only other
alternative we'd have is iSCSI, which wouldn't work
for the level of sharing we need.<span class=""><br>
<br>
<br>
<br>
<br>
<div>On 7/14/15 4:18 PM, Mathieu Chateau wrote:<br>
</div>
</span>
<blockquote type="cite"><span class="">
<div dir="ltr">by NFS i think you just mean "all
servers seeing and changing sames files" ? That can
be done with fuse, without nfs.
<div>NFS is harder for failover while automatic with
fuse (no need for dynamic dns or virtual IP).</div>
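<div><br>
</div>
<div>(A minimal sketch of such a FUSE mount, with placeholder server
and volume names; the backupvolfile-server option covers losing the
server the client fetches the volume layout from:)</div>
<pre>
# Native Gluster (FUSE) mount: the client connects to the bricks
# directly, so there is no single NFS head to fail over.
mount -t glusterfs -o backupvolfile-server=gluster2 \
      gluster1:/myvol /mnt/gluster
</pre>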
<div><br>
</div>
<div>For redundancy, I mean: what failures do you
want to survive?</div>
<div>
<ul>
<li>Losing a disk</li>
<li>Filesystem corruption</li>
<li>Server lost or in maintenance</li>
<li>Whole region down</li>
</ul>
<div>Depending on your needs, you may have to
replicate data across Gluster bricks or even use
geo-dispersed bricks; a rough sketch follows below.</div>
</div>
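<div><br>
</div>
<div>(A rough sketch with placeholder host and volume names: a 2-way
replicated volume survives losing a disk or a whole server, and
asynchronous geo-replication covers losing a region:)</div>
<pre>
# Two-way replica across two servers (names are placeholders):
gluster volume create myvol replica 2 \
        server1:/export/brick1 server2:/export/brick1
gluster volume start myvol

# Geo-replication to a remote site for region-level failures:
gluster volume geo-replication myvol remote1::myvol-dr create push-pem
gluster volume geo-replication myvol remote1::myvol-dr start
</pre>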
<div><br>
</div>
<div>Will the network between your servers and the storage
nodes be able to handle that traffic (380 MB/s = 3,040 Mb/s)?</div>
<div><br>
</div>
<div>I guess Gluster can handle that load; you are
using big files, and that is where Gluster delivers
its highest throughput. Nevertheless, you will need
many disks to provide that I/O, even more if using
replicated bricks.</div>
<div><br>
</div>
</div>
</span>
<div class="gmail_extra"><span class=""><br clear="all">
<div>
<div>Regards,<br>
Mathieu CHATEAU<br>
<a moz-do-not-send="true"
href="http://www.lotp.fr" target="_blank">http://www.lotp.fr</a></div>
</div>
<br>
</span>
<div class="gmail_quote"><span class="">2015-07-14
21:15 GMT+02:00 Forrest Aldrich <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:forrie@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:forrie@gmail.com">forrie@gmail.com</a></a>></span>:<br>
</span>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Sorry, I
should have noted that. 380MB is both read and
write (I confirmed this with a developer).<br>
<br>
We do need the NFS stack, as that's how all the
code and the many Instances work -- we have
several "workers" that chop up video on the same
namespace. It's not efficient, but that's how
it has to be for now.<br>
<br>
Redundancy, in terms of the server? We have
RAIDed volumes, if that's what you're referring
to.<span class=""><br>
<br>
Here's a basic outline of the flow (as I
understand it):<br>
<br>
<br>
Video Capture Agent sends in a large video
file (30GB +/-)<br>
<br>
Administrative host receives and writes to NFS<br>
<br>
A process copies this over to another point in
the namespace<br>
<br>
Another Instance picks up the file, reads it,
processes it, and writes the output (FFmpeg is
involved)<br>
<br>
<br>
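(Purely illustrative, not our actual code -- a shell sketch of the
shape of that flow, with made-up paths, to show why the I/O adds up:)<br>
<pre>
# 1. Capture agent drops a ~30GB file into the NFS namespace:
#      /mnt/nfs/incoming/lecture.mp4
# 2. A process copies it to another point on the same mount:
cp /mnt/nfs/incoming/lecture.mp4 /mnt/nfs/staging/
# 3. A worker reads it back, transcodes with FFmpeg, writes the output:
ffmpeg -i /mnt/nfs/staging/lecture.mp4 -c:v libx264 -c:a copy \
       /mnt/nfs/output/lecture.mp4
# Every step moves the full file across NFS at least once.
</pre>
<br>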
</span> Something like that outline -- I may
not have all the steps, but essentially there's
a ton of I/O going on. I know our code model is
not efficient, but it's complicated and can't
just be changed (it's based on an open source
product and there's some code baggage).<span class=""><br>
<br>
We looked into another product that allegedly
scaled out using multiple NFS heads with
massive local cache (AWS instances) and
sharing the same space, but it was horrible
and just didn't work for us.<br>
<br>
<br>
<br>
Thank you. </span>
<div>
<div><span class=""><br>
<br>
<br>
<br>
<div>On 7/14/15 3:06 PM, Mathieu Chateau
wrote:<br>
</div>
</span>
<blockquote type="cite"><span class="">
<div dir="ltr">Hello,
<div><br>
</div>
<div>Is it 380 MB/s in read or write?
What level of redundancy do you
need?</div>
<div>Do you really need the NFS stack, or
just a mount point (and so be able to
use the native Gluster protocol)?</div>
<div><br>
</div>
<div>Gluster load is mostly put on the
clients, not the server (clients do the
synchronous writes to all replicas, and
do the memory caching).</div>
<div><br>
</div>
</div>
</span>
<div class="gmail_extra"><span class=""><br
clear="all">
<div>
<div>Regards,<br>
Mathieu CHATEAU<br>
<a moz-do-not-send="true"
href="http://www.lotp.fr"
target="_blank">http://www.lotp.fr</a></div>
</div>
<br>
</span>
<div class="gmail_quote"><span class="">2015-07-14
20:49 GMT+02:00 Forrest Aldrich <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:forrie@gmail.com"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:forrie@gmail.com">forrie@gmail.com</a></a>></span>:<br>
</span>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">I'm
exploring solutions to help us
achieve high throughput and
scalability within the AWS
environment. Specifically, I
work in a department where we handle
and produce video content that
results in very large files (30GB
etc) that must be written to NFS,
chopped up and copied over on the
same mount (there are some odd
limits to the code we use, but
that's outside the scope of this
question).<br>
<br>
Currently, we're using a commercial
vendor with AWS, with dedicated
Direct Connect instances as the back
end to our production. We're
maxing out at 350 to 380 MB/s, which
is not enough. We expect our
capacity will double or even triple
when we bring on more classes or
even other entities and we need to
find a way to squeeze out as much
I/O as we can.<span class=""><br>
<br>
Our software model depends on NFS,
there's no way around that
presently.<br>
<br>
</span> Since GlusterFS uses FUSE,
I'm concerned about performance,
which is a key issue. Sounds like
a striped volume would be appropriate.<br>
<br>
My basic understanding of Gluster is
that it can combine several "bricks"
(which could be multiple dedicated
EBS volumes, or even multiple
instances from the above commercial
vendor) and serve them up over NFS
as what is transparently a single
namespace to client connections.
The I/O could be distributed in
this manner, as sketched below.<br>
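<br>
(A rough sketch of what I mean, with made-up server, brick, and
volume names: files distributed across several bricks, mounted by
clients over NFS as one namespace:)<br>
<pre>
# Spread files across four bricks on two servers (names are made up):
gluster volume create videovol \
        server1:/bricks/b1 server1:/bricks/b2 \
        server2:/bricks/b1 server2:/bricks/b2
gluster volume start videovol

# Clients see a single namespace via Gluster's built-in NFSv3 server:
mount -t nfs -o vers=3 server1:/videovol /mnt/video
</pre>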
<br>
I wonder if someone here with more
experience might elaborate on
whether GlusterFS could be used in
the above scenario, specifically
regarding I/O performance. We'd
really like to gain as much as
possible, like 700 MB/s to 1 GB/s
and up if possible.<span class=""><br>
<br>
<br>
<br>
Thanks in advance.<br>
</span></blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
<span class=""> <br>
</span></blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
<br>
_______________________________________________<br>
Gluster-users mailing list<br>
<a moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
<a moz-do-not-send="true"
href="http://www.gluster.org/mailman/listinfo/gluster-users"
rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>