[SOLVED] BIG DATA ZFS (~200TB)

Baader-IT

Hello!

We are planning to create a big-data 2-node cluster with replication.
Our intention is to build it with ~200 TB of ZFS storage.

This system is intended for archiving data for more than 10 years.
There will only be 2 VMs on one node, and the task of the second node is failover.

We want to start with ~80 TB for each VM, using LVM with XFS.

Should we place each VM on a single 80 TB virtual disk, or is it better to use 4 disks of 20 TB each?
I would also like to know whether there is a maximum size for a single virtual disk.
Are there any known performance issues?

Greetings
Tobi
 
Hi,

at the moment the GUI has a limit of 128T.
 
Hi @Baader-IT

I am curious how you want to design this big ZFS pool (ashift/vdev config)?
How will you make the backups (rsync, etc.), and over what kind of network(s) (Gbit?)?
What kind of files do you want to store (many small files, or big files like video)?
Have you planned pre-production tests (for each disk, and for ZFS scrub / disk-replacement scenarios)?

Good luck, you will need it ;)
 
@guletz :)
-ashift 12
-backups: only replication to the other node
-Network: 10 Gbit fiber, direct cabling between the two nodes
-Files: many small TAR files
-pre-production tests will be done soon
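Roughly what we have in mind for those tests (pool and device names below are just placeholders, not our final layout):
Code:
# per-disk health check plus a short SMART self-test
smartctl -a /dev/sda
smartctl -t short /dev/sda
# non-destructive sequential read test of a single disk
fio --name=seqread --filename=/dev/sda --rw=read --bs=1M --direct=1 --runtime=60 --time_based
# once the pool exists: scrub it and rehearse a disk replacement
zpool scrub tank
zpool offline tank sda
zpool replace tank sda sdx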
 
@guletz :)
-ashift 12
-backups: only replication to the other node
-Network: 10 Gbit fiber, direct cabling between the two nodes
-Files: many small TAR files
-pre-production tests will be done soon

Let me try again ;)

-how will the pool be laid out (raidz2 with n disks, disks with 512-byte or 4K sectors, or ...)?
-how will you copy these many small TAR files onto the VM (only once, or as a scheduled task)?

For many TAR files on a big ZFS pool I would use a CT instead of a VM! Even better, use no VM/CT at all (just a few separate ZFS datasets instead of a VM/CT), as sketched below.
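Just to illustrate the idea (pool and dataset names are only examples), a couple of plain datasets instead of a VM could be enough:
Code:
# one dataset per archive type, each with its own properties
zfs create -o compression=lz4 tank/archive
zfs create -o recordsize=1M tank/archive/daily-tars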
 
how will the pool be laid out (raidz2 with n disks, disks with 512-byte or 4K sectors, or ...)?
raidz3 with 24 disks with 4K sectors
-how will you copy these many small TAR files onto the VM (only once, or as a scheduled task)?
We plan to add about 100 TAR files once a day in the future.
We will not overwrite or delete existing files.

For many TAR files on a big ZFS pool I would use a CT instead of a VM! Even better, use no VM/CT at all (just a few separate ZFS datasets instead of a VM/CT).
We don't want to use CTs because of our company policy.

Greetings
 
raidz3 with 24 disks with 4K sectors

In my opinion that would be a bad decision with poor performance: bad IOPS, a very long time to finish a scrub task, bad replication speed to the second node (you could even use a 1 Gbit link, because you will not be able to use more than roughly the bandwidth of one HDD), and a long time to replace a single broken disk.

As a general principle, it is not OK to have one huge vdev (6-8 disks per vdev is OK). So in your case it would be better to use a stripe of 4 raidz2 vdevs (8 disks in each raidz2) plus 2 spare disks.
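A rough sketch of such a layout (pool and disk names are placeholders; use your real /dev/disk/by-id paths):
Code:
zpool create -f -o ashift=12 tank \
  raidz2 d01 d02 d03 d04 d05 d06 d07 d08 \
  raidz2 d09 d10 d11 d12 d13 d14 d15 d16 \
  raidz2 d17 d18 d19 d20 d21 d22 d23 d24 \
  raidz2 d25 d26 d27 d28 d29 d30 d31 d32 \
  spare d33 d34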

Now let's do some math:

IOPS in this case will be 4 x the IOPS of one disk (compared with the IOPS of a single disk for your variant).

A scrub will also be 4x faster.

And any disk replacement will be faster than with your 24-disk raidz3.

Now let's take the VM case. For a 16k volblocksize in the VM, for each HDD we have:

24 - 3 (parity) = 21 data disks
16k / 21 ≈ 0.76k per disk ... not a lucky value
Because each disk can write blocks of no less than 4k (ashift 12), every single VM write block ends up as 21 x 4k blocks on disk. The same goes for reads. And think about doing a simple ls -l on your multi-billion-file TAR folder (mostly random reads): if you are lucky, your pool will manage to finish it after a very, very long time (one day?) ;)

So in my opinion you must think again about your ZFS pool design (what about ashift 13, multi-TAR archives of, let's say, 4 GB each, and so on).
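For example (pool, path and disk names below are made up, and ashift=13 only makes sense if it matches your disks):
Code:
# ashift is fixed at pool creation time and cannot be changed later
zpool create -f -o ashift=13 tank raidz2 d01 d02 d03 d04 d05 d06 d07 d08
# bundle each day's many small TARs into one big archive before storing it
tar -cf /tank/archive/daily-$(date +%F).tar /incoming/tars/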
 
@guletz
Okay, we changed our config to test some other setup.
Using a RAID controller with 2 GB of cache, we created 2 virtual disks, each built from 12 HDDs in RAID 6.

So we end up with 2 x 106 TB virtual disks.

Then we created a zfs striped pool using these 2 disks.
Code:
zpool create  -f -o ashift=12 HDD-Pool sda sdb

So we are no longer using ZFS raidzX.

What do you think about this setup now?
We can already see a huge performance increase between the two setups (+70% speed).
The available pool size is also significantly higher (about 70% more available space than before).

Greetings and thanks for your help!
 
In my opinion I suggest:
First solution
1. >> a cluster of 3 simple Proxmox nodes, to guarantee quorum and keep the VMs active
2. >> master and slave redundant networks with serious network equipment
3. >> 2 storages (Synology) in active-active, connected to each other with at least 40 Gbit, using serious datacenter disks
4. >> and finally one more storage server for disaster-recovery offline backup. That's all.
///////////////////////////////////////////////////////////////////
Second solution
Use 4 nodes with 24 datacenter SSDs per node and combine all disks via Ceph; of course your network must have at least 40 Gbit, or better >> 100 Gbit.
 
Using a RAID controller with 2 GB of cache, we created 2 virtual disks, each built from 12 HDDs in RAID 6.

Haven't you read the what-not-to-do-with-ZFS guide? Using hardware RAID is the first item. It is the worst decision for long-term data storage, really. There is nothing worse than that. You need protection against silent data corruption in an archive server, and hardware RAID breaks that.

@guletz is totally right. You really want as much raw power as possible, so as many vdevs as possible.

Having the archive server as a VM on ZFS is also not a good idea. Best would be to use ZFS directly ON your archive server, i.e. use the server itself as the backend, or create the ZFS pool inside your VM (on top of some RAID if you want), so that you can send/receive the actual files and not a zvol with another filesystem on top. But I'd suggest going with plain ZFS as the backend.
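A rough sketch of what sending the actual dataset (instead of a zvol) between the two nodes could look like (dataset, snapshot and host names here are made up):
Code:
# initial full replication of the archive dataset to the second node
zfs snapshot HDD-Pool/archive@2018-11-01
zfs send HDD-Pool/archive@2018-11-01 | ssh node2 zfs receive backup/archive
# later runs only send the incremental difference between two snapshots
zfs snapshot HDD-Pool/archive@2018-11-02
zfs send -i @2018-11-01 HDD-Pool/archive@2018-11-02 | ssh node2 zfs receive backup/archive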

Another thing is storing TAR files on top of ZFS. Yes, it's possible, but it is really not what you want with backups that change a little but are mostly the same. I have no idea what you're storing there, but keeping the raw files and using e.g. rsync with special ZFS-friendly options gives you a CoW-friendly setup in which you only store the difference, not complete sets of backups. Could you elaborate more on what you're going to store there?
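For example, something along these lines (paths are placeholders; --inplace keeps rsync from rewriting whole files, so ZFS only has to store the changed blocks):
Code:
rsync -a --inplace --no-whole-file /source/data/ /HDD-Pool/archive/
zfs snapshot HDD-Pool/archive@$(date +%F)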
 
Do not forget to check whether the filesystem is capable of storing that many files. I recommend testing it beforehand rather than only checking the theoretical limits.
A customer at the datacenter where I work built a backup server with around 60 TB of storage. They copied all of their systems onto it with rsync (not a good choice) and the ext4 filesystem was not able to store that many files. The customer had many problems with this server; we checked and replaced many parts until we found out how the data was being stored. He changed the way he stores his backups and now everything is okay.
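A simple way to check this up front (paths here are placeholders) is to watch the inode usage while loading test data, for example:
Code:
# ext4/XFS: shows how many inodes are used and how many remain
df -i /srv/backup
# rough smoke test: create a million empty files and check df -i again
mkdir -p /srv/backup/inode-test
for d in $(seq 1 1000); do mkdir /srv/backup/inode-test/$d; for f in $(seq 1 1000); do : > /srv/backup/inode-test/$d/$f; done; done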
 
Use 4 nodes with 24 datacenter SSDs per node and combine all disks via Ceph; of course your network must have at least 40 Gbit, or better >> 100 Gbit.


Read again: @Baader-IT wants a backup-only system in the same location. In this case he only needs decent write/read performance, because he will need to restore some of his TAR archives, but not all of them. So your solution is too much for his goals.
 
Hi all,

We tested the solution with our hardware RAID controller.
This solution is perfect for us.
We do not have any overhead (file size) and the performance is sufficient for our purposes.

We know that a hardware RAID controller shouldn't be used with ZFS, but we rely on it on all our systems.
That's why we are now using the RAID controller.

@guletz - thanks for your first comment; we hadn't noticed the big overhead at first, but it is solved now :)
 
We know that a hardware RAID controller shouldn't be used with ZFS, but we rely on it on all our systems.
That's why we are now using the RAID controller.

It is still a bad idea. You're not using the features ZFS offers AND you're risking a RAID controller hardware failure.
 
