[SOLVED] BIG DATA ZFS (~200TB)

Discussion in 'Proxmox VE: Installation and configuration' started by Baader-IT, Jan 18, 2019.

  1. Baader-IT

    Baader-IT New Member
    Proxmox Subscriber

    Joined:
    Oct 29, 2018
    Messages:
    27
    Likes Received:
    1
Hello!

We are planning to create a big data 2-node cluster with replication.
    Our intention is to build it with ~200 TB of ZFS storage.

    The system should be designed to archive data for more than 10 years.
    There will only be 2 VMs on one node; the task of the second node is failover.

    We want to start with ~80 TB for each VM, using LVM with XFS inside the VM.

    Should we give each VM a single 80 TB virtual disk, or is it better to use 4 virtual disks of 20 TB each?
    I would also like to know whether there is a maximum size for a single virtual disk.
    Are there any known performance issues?

    Greetings
    Tobi
     
  2. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,709
    Likes Received:
    313
    Hi,

at the moment the GUI has a limit of 128T.
     
  3. Baader-IT

    Baader-IT New Member
    Proxmox Subscriber

    Joined:
    Oct 29, 2018
    Messages:
    27
    Likes Received:
    1
    Hi Wolfgang,

So it is not possible to create a virtual disk larger than 128 TB, right?

    Greetings, Tobi
     
  4. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,709
    Likes Received:
    313
Not via the GUI.

    You can create larger disks via the CLI.
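    For example, something along these lines should work from the node's shell (only a sketch; VM ID 100 and the storage name "local-zfs" are just examples, the size is given in GiB):
    Code:
    # allocate and attach a new ~150 TiB disk to VM 100 on the ZFS storage "local-zfs"
    # (VM ID, storage name and size are examples; size is in GiB)
    qm set 100 --scsi1 local-zfs:153600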
     
  5. Baader-IT

    Baader-IT New Member
    Proxmox Subscriber

    Joined:
    Oct 29, 2018
    Messages:
    27
    Likes Received:
    1
Wolfgang, do you have any info on these questions?
     
  6. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    864
    Likes Received:
    115
Hi @Baader-IT

    I am curious how you plan to design this big ZFS pool (ashift/vdev config)?
    How will you make the backups (rsync, etc.) and over what kind of network (Gbit?)?
    What kind of files do you want to store (many small files, or big files like video)?
    Any pre-production tests (for each disk, for ZFS scrub/disk-replacement scenarios)?

    Good luck, you will need it ;)
     
    #6 guletz, Jan 18, 2019
    Last edited: Jan 18, 2019
  7. Baader-IT

    Baader-IT New Member
    Proxmox Subscriber

    Joined:
    Oct 29, 2018
    Messages:
    27
    Likes Received:
    1
@guletz :)
    - ashift 12
    - backups: only replication to the other node (rough sketch below)
    - network: 10 Gbit fiber, directly cabled between the two nodes
    - files: many small TAR files
    - a pre-production test will be done soon
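
    For the replication we plan something along these lines (only a sketch; VM ID, target node name and schedule are examples):
    Code:
    # replicate VM 100 to the second node every 15 minutes
    # (VM ID, target node name and schedule are examples)
    pvesr create-local-job 100-0 node2 --schedule '*/15'
    pvesr status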
     
  8. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    864
    Likes Received:
    115
Let's try again ;)

    - How will the pool be laid out (raidz2 with n disks, disks with 512-byte or 4K sectors, or ...)?
    - How will you copy the many small TAR files into the VM on this pool (only once, or as a scheduled task)?

    For many TAR files on a big ZFS pool I would use a CT instead of a VM! Even better: no VM/CT at all, using only a few separate ZFS datasets instead.
     
  9. Baader-IT

    Baader-IT New Member
    Proxmox Subscriber

    Joined:
    Oct 29, 2018
    Messages:
    27
    Likes Received:
    1
raidz3 with 24 disks, 4K sectors.
    We plan to add about 100 TAR files once a day in the future.
    We will not overwrite or delete existing files.

    We don't want to use CTs because of our company policy.

    Greetings
     
  10. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    864
    Likes Received:
    115
In my opinion that would be a bad decision with poor performance: bad IOPS, a very long time to finish a scrub task, bad replication speed to the second node (even a 1 Gbit link would do, because you will not be able to use much more than the bandwidth of a single HDD), and a long time to replace a single broken disk.

    As a general principle, it is not OK to have one very wide vdev (6-8 disks per vdev is OK). So in your case it would be better to use a stripe of 4 raidz2 vdevs (8 disks per raidz2) + 2 spare disks.

    Now let's do some math:

    IOPS in this case will be 4 x the IOPS of 1 disk (compared with the IOPS of 1 disk for your variant).

    A scrub will also be about 4x faster.

    And any disk replacement will be faster than in your 24-disk raidz3.

    Let's take the VM case. For a 16k volblocksize in the VM, for each HDD we have:

    24 - 3 (parity) = 21 data disks
    16k / 21 = 0.76k per disk ... a very "lucky" value
    Because each disk can only write whole 4k blocks at minimum (ashift 12), every 16k block the VM writes turns into 21 x 4k blocks on disk. The same goes for reads. Now imagine a simple ls -l on your multi-billion-file TAR folder (mostly random reads): if you are lucky, the pool will finish it after a very, very long time (one day?) ;)

    So in my opinion you must rethink your ZFS pool design (what about ashift 13, combining the TARs into bigger archives of, say, 4 GB, and so on).
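
    A pool along those lines could be created roughly like this (only a sketch; pool name and device names are placeholders, in production use /dev/disk/by-id paths):
    Code:
    # sketch: stripe of 4 raidz2 vdevs (8 disks each) plus 2 hot spares
    # (pool name and device names are placeholders)
    zpool create -o ashift=12 tank \
      raidz2 sda sdb sdc sdd sde sdf sdg sdh \
      raidz2 sdi sdj sdk sdl sdm sdn sdo sdp \
      raidz2 sdq sdr sds sdt sdu sdv sdw sdx \
      raidz2 sdy sdz sdaa sdab sdac sdad sdae sdaf \
      spare sdag sdah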
     
    Baader-IT likes this.
  11. Baader-IT

    Baader-IT New Member
    Proxmox Subscriber

    Joined:
    Oct 29, 2018
    Messages:
    27
    Likes Received:
    1
    @guletz
Okay, we changed our config to test another setup.
    Using a RAID controller with 2 GB cache, we created 2 virtual disks (RAID 6), each built from 12 HDDs.

    So we end up with 2x 106 TB virtual disks.

    Then we created a striped ZFS pool on these 2 disks.
    Code:
    zpool create -f -o ashift=12 HDD-Pool sda sdb
    So we are no longer using any ZFS raidzX.

    What do you think about this setup now?
    We can already see a huge performance increase between the two setups (+70% speed).
    The available pool size is also significantly higher than before (about 70% more available space).

    Greetings and thanks for your help!
     
  12. noname

    noname New Member

    Joined:
    May 14, 2014
    Messages:
    9
    Likes Received:
    0
In my opinion I would suggest:
    First solution
    1. >> a 3-node Proxmox cluster to guarantee quorum and keep the VMs active
    2. >> redundant master/slave networking with serious network equipment
    3. >> 2 storages (Synology) running active-active, connected to each other with at least 40 Gbit, with serious datacenter disks
    4. >> and finally one more storage server for offline disaster-recovery backups. That's all.
    ///////////////////////////////////////////////////////////////////
    Second solution
    Use 4 nodes with 24 datacenter SSDs per node and combine all disks via Ceph; of course your network must then have at least 40 Gbit, or better >> 100 Gbit.
     
  13. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,696
    Likes Received:
    331
Haven't you read the what-not-to-do-with-ZFS guide? Using hardware RAID is the first item. It is the worst decision for long-term data storage, really. There is nothing worse than that. You need protection against silent data corruption in an archive server, and hardware RAID breaks that.

    @guletz is totally right. You really want to have as much raw power as possible, so as many vdevs as possible.

    Having an archive server as a VM on ZFS is also not a good idea. Best would be to use ZFS directly ON your archive server, i.e. use the server itself as the backend, or create the ZFS inside your VM (then on top of some RAID if you want), so that you can send/receive the actual files and not a zvol with another filesystem on top. But I'd suggest going with plain ZFS as the backend.

    Another thing is storing tar files on top of ZFS. Yes, it's possible, but really not what you want with backups that change a little but are mostly the same. I have no idea what you're storing there, but having the raw files and e.g. rsync with special ZFS-friendly options gives you a CoW-friendly setup in which you store only the differences and not complete sets of backups. Could you elaborate more on what you're going to store there?
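
    Just to illustrate what I mean by a CoW-friendly rsync (a minimal sketch; source path and dataset name are only examples):
    Code:
    # sync the raw files in place so ZFS only has to store the changed blocks,
    # then snapshot the dataset to keep that day's state
    # (source path and dataset name are examples)
    rsync -a --inplace --no-whole-file /data/source/ /tank/archive/
    zfs snapshot tank/archive@backup-$(date +%Y-%m-%d)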
     
    guletz likes this.
  14. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    864
    Likes Received:
    115
It is not OK; see also @LnxBil's pertinent comments (as usual). And the more details you give, the better your chances of finding a better solution for your setup.
     
  15. sb-jw

    sb-jw Active Member

    Joined:
    Jan 23, 2018
    Messages:
    483
    Likes Received:
    41
Do not forget to check whether the filesystem is capable of storing that many files. I recommend testing it beforehand rather than relying only on the theoretical limits.
    A customer at the datacenter where I work built a backup server with around 60 TB of storage and copied all of his systems onto it with rsync (not a good choice), and ext4 was not capable of storing that many files. The customer had many problems with this server; we checked and replaced many parts until we found out how he was storing the data. He changed the way he stores his backups and now everything is okay.
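
    A quick way to see such limits on e.g. ext4 or XFS is the inode usage (the mount point here is just an example):
    Code:
    # show inode usage and limits of the target filesystem
    df -i /mnt/backup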
     
  16. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    864
    Likes Received:
    115

Read again: @Baader-IT wants a backup-only system in the same location. In this case he only needs decent read/write performance, because he will need to restore some of his TAR archives, but not all of them. So your solution is too much for his goals.
     
  17. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    864
    Likes Received:
    115
Good point. For this reason I mentioned using bigger archive files ...
     
  18. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,696
    Likes Received:
    331
That is also one reason to just use plain ZFS; there are (worldly) limits :-D
     
  19. Baader-IT

    Baader-IT New Member
    Proxmox Subscriber

    Joined:
    Oct 29, 2018
    Messages:
    27
    Likes Received:
    1
Hi all,

    we tested the solution with our hardware RAID controller.
    This solution is perfect for us.
    We do not have any overhead (file size) and the performance is sufficient for our purposes.

We know that a hardware RAID controller shouldn't be used with ZFS, but we rely on it on all our systems.
    That's why we are now using the RAID controller.

@guletz - Thanks for your first comment; we didn't see the big overhead at first but have solved it now :)
     
  20. morph027

    morph027 Active Member

    Joined:
    Mar 22, 2013
    Messages:
    409
    Likes Received:
    50
It is still a bad idea. You're not using the features ZFS offers AND you are risking a RAID controller hardware failure.
     