Seagate has just publicly announced 8TB HDDs in a 3.5″ form factor. I decided to do some rough calculations to understand the density a bit better…
Note: I have decided to ignore the distinction between Terabytes (TB) and Tebibytes (TiB), since I always work in base 2, but I hate the -bi naming conventions. Seagate is most likely announcing an 8TB HDD, which is actually smaller than a true 8TiB drive. If you don’t know the difference it’s worth learning.
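For anyone who wants to see the size of that gap, here is a minimal sketch of the conversion. The function name `tb_to_tib` is made up for illustration, and the only input taken from this post is the 8TB drive size.

```python
# Decimal (SI) versus binary (IEC) storage units.
TB = 10**12    # terabyte, the way drive vendors usually count
TIB = 2**40    # tebibyte, the base-2 unit


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return tb * TB / TIB


if __name__ == "__main__":
    # A drive sold as 8 TB holds roughly 7.28 TiB.
    print(f"8 TB = {tb_to_tib(8):.2f} TiB")
```

The rest of the post keeps the looser "TB" spelling, so the numbers below are on the optimistic side by roughly nine percent if the drive really is a decimal 8TB.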
Rack Unit Density:
Supermicro sells a high density, double-sided 4U server, which can hold 90 x 3.5″ drives. This means you can easily store:
90 * 8TB = 720TB in 4U,
or:
720TB/4U = 180TB per U.
To store a petabyte of data, since:
1PB = 1024TB,
we need:
1024TB / (180TB/U) = 5.69U.
Rounding up, we see that we can easily store one petabyte of raw data in 6U.
Since an average rack is usually 42U (tall racks can be 48U) that means we can store between seven and eight PB per rack:
42U/rack / 6U/PB = 7PB/rack
48U/rack / 6U/PB = 8PB/rack
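The arithmetic above can be sketched in a few lines of Python. The constant and function names are hypothetical; the numbers (90 drives, 8TB, 4U, 1024TB per PB) come straight from the post. As a cross-check, the sketch also counts whole 4U servers per rack, which is an extra step not in the text above.

```python
import math

DRIVE_TB = 8            # per-drive capacity used in the post
DRIVES_PER_SERVER = 90  # 3.5" bays in the 4U chassis
SERVER_U = 4            # chassis height in rack units
TB_PER_PB = 1024        # base-2 convention used in the post


def raw_tb_per_u() -> float:
    """Raw capacity per rack unit."""
    return DRIVES_PER_SERVER * DRIVE_TB / SERVER_U


def u_per_pb() -> float:
    """Rack units needed for one raw petabyte, before rounding."""
    return TB_PER_PB / raw_tb_per_u()


def pb_per_rack(rack_u: int) -> int:
    """PB per rack, rounding U-per-PB up to a whole unit as the post does."""
    return rack_u // math.ceil(u_per_pb())


def pb_per_rack_whole_servers(rack_u: int) -> float:
    """Cross-check: fill the rack with whole 4U servers only."""
    servers = rack_u // SERVER_U
    return servers * DRIVES_PER_SERVER * DRIVE_TB / TB_PER_PB


if __name__ == "__main__":
    print(raw_tb_per_u())    # 180.0
    print(pb_per_rack(42))   # 7
    print(pb_per_rack(48))   # 8
    print(pb_per_rack_whole_servers(42))
```

Counting whole servers gives about 7.03PB for a 42U rack (ten servers, with 2U left over), so the rounded figure of 7PB holds up.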
If you can provide the power and cooling, you can quickly see that small data centers can easily get into exabyte scale if needed. One raw exabyte would only require:
1EB = 1024PB
1024PB / (7PB/rack) = 146.3 racks =~ 150 racks.
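As a quick sanity check on the exabyte figure, here is a small sketch that rounds up to whole racks. The helper `racks_needed` is a hypothetical name; the 7PB and 8PB per-rack figures are the ones derived above.

```python
import math

PB_PER_EB = 1024  # base-2 convention used in the post


def racks_needed(target_pb: float, pb_per_rack: float) -> int:
    """Whole racks required to hold target_pb of raw data."""
    return math.ceil(target_pb / pb_per_rack)


if __name__ == "__main__":
    print(racks_needed(PB_PER_EB, 7))  # 42U racks
    print(racks_needed(PB_PER_EB, 8))  # 48U racks
```

Strictly rounding up gives 147 of the 42U racks, or 128 of the 48U racks, so "about 150" is a fair ballpark.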
RAID and Redundancy:
Since you’ll most likely have lots of failures, I would recommend having some number of RAID sets per server, and perhaps a distributed file system like GlusterFS to replicate the data across different servers. Suppose you broke each 90 drive server into five separate RAID 6 bricks for GlusterFS:
90/5 = 18 drives per brick.
In RAID 6, you lose two drives' worth of capacity to parity, so that means:
18 drives – 2 drives = 16 drives per brick of usable storage.
16 drives * 5 bricks * 8 TB = 640 TB after RAID 6 in 4U.
640TB/4U = 160TB/U
1024TB / (160TB/U) = 6.4U per PB, and 42U / (6.4U/PB) = 6.6PB/rack =~ 7PB/rack.
Since I rounded a lot, the result is similar. With a replica count of 2 in a standard GlusterFS configuration, you average a total of about 3-4PB of usable storage per rack. Need a petabyte scale filesystem? One rack should do it!
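Here is a rough sketch of the RAID 6 and replication math, counting whole 4U servers per rack instead of rounding. The function names and default arguments are hypothetical. It assumes every brick is a single RAID 6 set and, purely for the per-rack average, that replica 2 simply halves usable capacity (in practice the replicas would live on different servers).

```python
TB_PER_PB = 1024  # base-2 convention used in the post


def usable_tb_per_server(drives: int = 90, bricks: int = 5,
                         parity_drives: int = 2, drive_tb: int = 8) -> int:
    """Usable TB in one server after RAID 6 parity on every brick."""
    if drives % bricks != 0:
        raise ValueError("drives must divide evenly into bricks")
    data_drives = drives // bricks - parity_drives  # 18 - 2 = 16
    return data_drives * bricks * drive_tb


def usable_pb_per_rack(rack_u: int = 42, server_u: int = 4,
                       replica: int = 2) -> float:
    """Usable PB per rack after RAID 6 and GlusterFS replication."""
    servers = rack_u // server_u  # whole servers that fit in the rack
    return servers * usable_tb_per_server() / replica / TB_PER_PB


if __name__ == "__main__":
    print(usable_tb_per_server())   # 640
    print(usable_pb_per_rack(42))   # 3.125
    print(usable_pb_per_rack(48))   # 3.75
```

Both rack sizes land inside the 3-4PB range quoted above.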
Other considerations:
Conclusions:
Storage is getting very inexpensive. After the above analysis, I feel safe in concluding that:
Hope this was fun,
Happy hacking,
James
Disclaimer: I have not tried the 8TB Seagate HDDs, or the Supermicro 90 x 3.5″ servers, but if you are building a petabyte scale cluster with GlusterFS/Puppet-Gluster, I’d like to hear about it!