<div dir="ltr">Hi Xavier, and thanks for your answers.<div><br><div>The servers will have 26 * 8TB disks. I don&#39;t want to lose more than 2 disks to RAID,</div><div>so my options are HW RAID6 24+2 or 2 * HW RAID5 12+1.</div><div>In both cases I can create 2 bricks per server using LVM and use one brick</div><div>per server to create two distributed-disperse volumes. I will test those</div><div>configurations when the servers arrive.</div><div><br></div><div>I can go with 8+1 or 16+2; I will test both when the servers arrive. But 8+2 would</div><div>be too much: I would lose nearly 25% of the space in that case.</div><div><br></div><div>As for the client count, this cluster will receive backups from Hadoop nodes,</div><div>so there will be at least 750-1000 clients sending data at the same time.</div><div>Can 16+2 * 3 = 54 gluster nodes handle this, or should I increase the node count?</div><div><br></div><div>I will check the parameters you mentioned.</div><div><br></div><div>Serkan </div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 13, 2015 at 1:43 PM, Xavier Hernandez <span dir="ltr">&lt;<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">+gluster-users<div class="HOEnZb"><div class="h5"><br>

<br>

On 13/10/15 12:34, Xavier Hernandez wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Serkan,<br>

<br>

On 12/10/15 16:52, Serkan Çoban wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<br>

<br>

I am planning to use GlusterFS for backup purposes. I write big files<br>

(&gt;100MB) with a throughput of 2-3 GB/s. To save space, we<br>

plan to use erasure coding. I have some questions about EC and brick<br>

planning:<br>

- I am planning to use a 200TB XFS/ZFS RAID6 volume to hold one brick per<br>

server. Should I increase the brick count? Does increasing the brick count also<br>

increase performance?<br>

</blockquote>

<br>

Using a distributed-dispersed volume increases performance. You can<br>

split each RAID6 volume into multiple bricks to create such a volume.<br>

This is because a single brick process cannot achieve the maximum<br>

throughput of the disk, so creating multiple bricks improves this.<br>

However, having too many bricks could be worse because, in your case, all requests will<br>

go to the same filesystem and compete with each other.<br>

<br>

Another thing to consider is the size of the RAID volume. A 200TB RAID<br>

will require *a lot* of time to reconstruct in case of failure of any<br>

disk. Also, a 200 TB RAID means you need almost 30 8TB disks. A RAID6 of<br>

30 disks is quite fragile. Maybe it would be better to create multiple<br>

RAID6 volumes, each with 18 disks at most (16+2 is a good and efficient<br>

configuration, especially for XFS on non-hardware RAIDs). Even in this<br>

configuration, you can create multiple bricks in each RAID6 volume.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- I plan to use 16+2 for EC. Is this a problem? Should I decrease this<br>

to 12+2 or 10+2? Or is it completely safe to use whatever we want?<br>

</blockquote>

<br>

16+2 is a very big configuration. It requires a lot of computing power and<br>

forces you to grow (if you need to grow the gluster volume at some<br>

point) in multiples of 18 bricks.<br>

<br>

Considering that you are already using a RAID6 in your servers, what you<br>

are really protecting with the disperse redundancy is the failure of the<br>

servers themselves. Maybe an 8+1 configuration could be enough for your<br>

needs, and it requires less computation. If you really need a redundancy of 2,<br>

8+2 should be ok.<br>
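<br>
As a rough sketch of the space trade-off between these layouts (plain arithmetic, not a measurement; the function names are made up for illustration), the usable share of raw brick capacity in a data+redundancy layout is data / (data + redundancy):<br>
<pre>
```python
def usable_fraction(data: int, redundancy: int) -> float:
    """Share of raw brick capacity that holds user data."""
    return data / (data + redundancy)


def redundancy_cost(data: int, redundancy: int) -> float:
    """Extra space consumed by redundancy, relative to the data stored."""
    return redundancy / data


# Layouts mentioned in this thread, written as (data, redundancy).
for data, redundancy in [(4, 2), (8, 1), (8, 2), (16, 2)]:
    usable = usable_fraction(data, redundancy)
    extra = redundancy_cost(data, redundancy)
    print(f"{data}+{redundancy}: {usable:.1%} usable, {extra:.1%} extra per byte stored")
```
</pre>
8+1 and 16+2 both leave 8/9 of the raw space usable, while 8+2 leaves 80% (an extra 25% on top of the data stored). This covers only the disperse layer; any RAID parity underneath each brick reduces the usable space further.<br>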

<br>

Using values that are not a power of 2 has a theoretical impact on the<br>

performance of the disperse volume when applications write blocks whose<br>

size is a multiple of a power of 2 (which is the most common case). This<br>

means that it&#39;s possible that a 10+2 performs worse than an 8+2. However,<br>

this depends on many other factors, some even internal to gluster, like<br>

caching, meaning that the real impact could be almost negligible in some<br>

cases. You should test it with your workload.<br>
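<br>
To illustrate the alignment effect described above, here is a minimal sketch. It assumes that a disperse volume handles data in stripes of 512 bytes per data brick (the 512-byte unit is an assumption, check it against your gluster version) and that a write which is not a whole number of stripes needs extra read-modify-write work. The names below are hypothetical.<br>
<pre>
```python
# Assumption: each data brick receives a 512-byte fragment per stripe.
FRAGMENT_BYTES = 512


def stripe_bytes(data_bricks: int) -> int:
    """Size of one full stripe for a layout with the given number of data bricks."""
    return FRAGMENT_BYTES * data_bricks


def is_stripe_aligned(write_bytes: int, data_bricks: int) -> bool:
    """True if a write of this size covers a whole number of stripes."""
    return write_bytes % stripe_bytes(data_bricks) == 0


write_bytes = 128 * 1024  # a typical power-of-2 application block
for data_bricks in (8, 10, 16):
    aligned = is_stripe_aligned(write_bytes, data_bricks)
    print(f"{data_bricks} data bricks: stripe {stripe_bytes(data_bricks)} bytes, aligned={aligned}")
```
</pre>
Under that assumption, a 128 KiB write is a whole number of stripes with 8 or 16 data bricks but not with 10, which is the theoretical penalty; caching may hide most of it in practice.<br>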

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- I understand that the EC calculation is performed on the client side. Are<br>

there any benchmarks showing how EC affects CPU usage? For<br>

example, might each 100 MB/s of traffic use 1 CPU core?<br>

</blockquote>

<br>

I don&#39;t have a detailed measurement of CPU usage related to bandwidth,<br>

however we have made some tests that seem to indicate that the CPU<br>

overhead caused by disperse is quite small for a 4+2 configuration. I<br>

don&#39;t have access to this data right now. When I have it, I&#39;ll send it<br>

to you.<br>

<br>

I will also try to do some tests with an 8+2 and a 16+2 configuration to<br>

see the difference.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- Does the number of clients affect cluster performance? Is there any difference<br>

if I connect 100 clients each writing 20-30 MB/s to the cluster vs 1000<br>

clients each writing 2-3 MB/s?<br>

</blockquote>

<br>

Increasing the number of clients improves performance; however, I wouldn&#39;t go<br>

over 100 clients, as this could have a negative impact on performance<br>

caused by the overhead of managing all of them. In our tests, the<br>

maximum performance is obtained with ~8 parallel clients (if my memory<br>

doesn&#39;t fail).<br>

<br>

You will also probably want to tweak some volume parameters, like<br>

server.event-threads, client.event-threads,<br>

performance.client-io-threads and server.outstanding-rpc-limit to<br>

increase performance.<br>
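<br>
These options are normally changed with "gluster volume set". The sketch below only builds and prints the command lines, it does not run them. The volume name and every value are placeholders for illustration, not recommendations, and should come out of your own testing.<br>
<pre>
```python
import shlex

# Hypothetical volume name; replace with your own.
VOLUME = "backupvol"

# Option names are the ones listed above; the values are placeholders only.
OPTIONS = {
    "server.event-threads": "4",
    "client.event-threads": "4",
    "performance.client-io-threads": "on",
    "server.outstanding-rpc-limit": "128",
}


def volume_set_commands(volume: str, options: dict) -> list:
    """Return one 'gluster volume set' command line per option."""
    return [
        shlex.join(["gluster", "volume", "set", volume, key, value])
        for key, value in options.items()
    ]


for command in volume_set_commands(VOLUME, OPTIONS):
    print(command)
```
</pre>
Changing one option at a time and re-running the same benchmark makes it easier to see which setting actually helps.<br>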

<br>

Xavi<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Thank you for your time,<br>

Serkan<br>

<br>

<br>


<br>

</blockquote></blockquote>

</div></div></blockquote></div><br></div>