PBS setup on server with 4 x HD - recommended ZFS setup option

Fra

I hope not to add noise here with a stupid question

Our current production PBS server has 2 x NVMe for the PBS filesystems (ZFS RAID1) and 2 x 6TB SATA disks (ZFS RAID1) for the ZFS storage: we are running out of space.

We would like to upgrade to the relatively cheap dedicated SX64 server at https://www.hetzner.com/dedicated-rootserver/matrix-sx which comes with 4 x 16 TB SATA hard disks (no hardware RAID).

I guess we should keep two independent ZFS pools with 2 HDs each (ending up with two storages), or should we consider using all four disks with another ZFS RAID option, such as putting the filesystems on a single ZFS pool spanning all 4 disks?


Screenshot from 2022-01-04 07-20-24.png


Btw:

* we want to keep using ZFS: the server comes with plenty of RAM and we replicate content to another in-house server (something that is working great so far and needs very low bandwidth for the daily sync).

* I think we cannot add two additional SSD disks on that server (for the pbs-filesystem)

* and... any tips on how to copy actual backup data to new server is more than welcome (both new and old server are in Hetzner) :)
 
Wow, I see I can easily experiment with this myself using a PBS VM with 4x32G disks (SCSI, as suggested in https://pve.proxmox.com/wiki/ZFS_on_Linux), trying to simulate HD failure, too.


So coming to choices:

* zfs (RAIDZ-3) needs at least 5 devices :)
* with zfs (RAIDZ-2) I end up with a total space of 4x32G (so no fault tolerance)
* with zfs (RAIDZ-1) on all disks I end up again with 4x32G (so no fault tolerance, again: but then I removed a disk, and the OS was fine even after reboot... a mystery to me!)
* with zfs (RAID1-0) the pool has two mirrors, and the total size is 2x32G; I removed two disks (one per mirror) and the OS was apparently OK, but not OK after reboot.

I see I had better dig into the documentation; in the meantime any comment is more than welcome.
 
I'm using 4x4TB RAIDZ1 for PBS and it works okay. Sometimes the recovery is slow, but that is to be expected with a lot of VMs.
 
* zfs (RAIDZ-3) needs at least 5 devices :)
* with zfs (RAIDZ-2) I end up with a total space of 4x32G (so no fault tolerance)
* with zfs (RAIDZ-1) on all disks I end up again with 4x32G (so no fault tolerance, again: but then I removed a disk, and the OS was fine even after reboot... a mystery to me!)
* with zfs (RAID1-0) the pool has two mirrors, and the total size is 2x32G; I removed two disks (one per mirror) and the OS was apparently OK, but not OK after reboot.

The first and last lines are correct, the others not so much. :(

It depends on what you want:

- Max Performance > RAID10 ; max usable space 64 GB, 2 drives may fail (if in different mirror)

- Max Capacity > RAID-Z1 ; max usable space 96 GB, 1 drive can fail

- Max Redundancy > RAID-Z2 ; max usable space 64 GB, 2 drives can fail

All based on your four 32 GB drives
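
As a rough cross-check of the numbers above, here is a minimal Python sketch of the raw capacity arithmetic. The function name `usable_gb` and the layout labels are made up for illustration, and the calculation ignores ZFS metadata, padding and free-space overhead, so real pools will show somewhat less.

```python
# Raw usable capacity for a few ZFS layouts (illustrative helper, not a ZFS API).
# Assumption: all disks have the same size and overhead is ignored.
PARITY_DISKS = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def usable_gb(layout: str, disks: int, size_gb: int) -> int:
    """Return the raw usable capacity in GB for the given layout."""
    if layout == "raid10":
        # Stripe of 2-disk mirrors: half of the disks hold copies.
        if disks < 2 or disks % 2:
            raise ValueError("raid10 needs an even number of disks")
        return (disks // 2) * size_gb
    parity = PARITY_DISKS[layout]
    if disks <= parity:
        raise ValueError("not enough disks for this parity level")
    # RAIDZ: capacity of all disks minus the parity disks.
    return (disks - parity) * size_gb


for layout in ("raid10", "raidz1", "raidz2"):
    print(layout, usable_gb(layout, 4, 32), "GB")
```

Calling it with `disks=4, size_gb=16000` gives the same comparison for the 4 x 16 TB server.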
 
- Max Capacity > RAID-Z3 ; max usable space 96 GB, 1 drive can fail
I guess you mean raidz1.

In other words (in case of 4 disks):
Layout                  | Usable capacity | IOPS | Throughput        | Drives may fail
raidz1 (raid5)          | 60%             | 1x   | 3x read, 3x write | 1
raidz2 (raid6)          | 40%             | 1x   | 2x read, 2x write | 2
striped mirror (raid10) | 40%             | 2x   | 4x read, 2x write | 1-2

Usable capacity is listed as only 40% instead of 50%, and 60% instead of 75%, because a ZFS pool always needs free space to operate, so a pool should always keep about 20% free for best performance.
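
To make that arithmetic explicit, here is a small sketch, assuming the 20% figure is applied as a flat reserve on top of the raw data-to-total disk ratio. The names `FREE_SPACE_RESERVE` and `practical_fraction` are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# Assumption: the 20% rule of thumb from this post, applied as a flat reserve.
FREE_SPACE_RESERVE = Fraction(1, 5)


def practical_fraction(data_disks: int, total_disks: int) -> Fraction:
    """Share of the total raw disk space that is usable in practice."""
    raw = Fraction(data_disks, total_disks)
    return raw * (1 - FREE_SPACE_RESERVE)


raidz1 = practical_fraction(3, 4)   # 3 data disks out of 4
raidz2 = practical_fraction(2, 4)   # 2 data disks out of 4
raid10 = practical_fraction(2, 4)   # half of the disks hold mirror copies

# Applied to 4 x 16 TB disks (64 TB raw in total):
for name, share in (("raidz1", raidz1), ("raidz2", raidz2), ("raid10", raid10)):
    print(name, float(share * 64), "TB")
```

Exact fractions are used here instead of floats only to avoid rounding noise in the percentages.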

If you can add SSDs, it might be useful to use them as a special device or as L2ARC for metadata caching, so GC tasks will be faster. Keep in mind that PBS was designed with SSDs in mind. Everything stored on the datastore is split into small chunks of 4 KB to 4 MB in size. So if you have 25 TB of usable HDD space, that means at least 6,400,000 chunk files that need to be accessed on each GC task and hashed on each re-verify task. HDDs can't handle this amount of IOPS very well, and these tasks might take hours or days.
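
The chunk estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes every chunk is at the 4 MB maximum (which gives the lower bound), uses decimal units, takes 25.6 TB as the filled space (80% of 32 TB, which appears to be where the figure comes from), and pretends each chunk costs exactly one random I/O. The helper names and the 100 IOPS value for an HDD pool are assumptions for illustration, not measurements.

```python
def min_chunk_files(used_bytes: int, max_chunk_bytes: int = 4_000_000) -> int:
    """Lower bound on the number of chunk files: all chunks at maximum size."""
    return -(-used_bytes // max_chunk_bytes)  # ceiling division


def seconds_at_iops(chunks: int, iops: float) -> float:
    """Time to touch every chunk once, assuming one random I/O per chunk."""
    return chunks / iops


chunks = min_chunk_files(25_600_000_000_000)      # 25.6 TB filled
hours = seconds_at_iops(chunks, 100) / 3600       # assumed 100 IOPS
print(f"{chunks:,} chunks, roughly {hours:.1f} h at 100 IOPS")
```

Since real chunks are often smaller than 4 MB, the actual file count, and with it the runtime, would be higher than this lower bound.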
 
* zfs (RAIDZ-3) needs at least 5 devices :)
Perhaps the installer GUI enforces that (it also has another issue, forcing same-size disks instead of using the smallest, but that is a different matter).

A RAIDZ3 with 4 disks is like having a 4-way mirror: in both cases you can lose 3 disks and the pool should still function. The RAIDZ3 might just have a much higher CPU overhead than the 4-way mirror, so it doesn't make sense at all, and I wouldn't be surprised if the RAIDZ3 implementer simply didn't bother with that corner case.
 
A 4-way mirror (RAID10) may already break with a second disk failing!
Just note that a 4-way RAID1 (i.e. 4 mirrored copies of the same data) is different from a 4-disk RAID10 (i.e. a stripe of 2x 2-disk mirrors).

The 4-way RAID1 is capable of surviving 3 disk failures, while the 4-disk RAID10 is capable of surviving 2 disk failures, just not the two that make up the same vdev, or else the whole pool is lost ;)
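
The difference is easy to see by enumerating every failure combination. The sketch below models a pool as a list of mirror vdevs and assumes the pool survives as long as each vdev keeps at least one healthy disk. The disk labels and function names are made up for the example.

```python
from itertools import combinations


def survives(vdevs, failed):
    """A pool of mirror vdevs survives if every vdev keeps one healthy disk."""
    return all(any(disk not in failed for disk in vdev) for vdev in vdevs)


def surviving_cases(vdevs, n_failed):
    """Return (combinations the pool survives, total combinations)."""
    disks = [disk for vdev in vdevs for disk in vdev]
    cases = list(combinations(disks, n_failed))
    ok = sum(survives(vdevs, set(case)) for case in cases)
    return ok, len(cases)


four_way_mirror = [("a", "b", "c", "d")]    # one vdev, 4 copies
raid10 = [("a", "b"), ("c", "d")]           # stripe of two 2-disk mirrors

for n in (1, 2, 3):
    print(n, "failed:",
          "4-way mirror", surviving_cases(four_way_mirror, n),
          "raid10", surviving_cases(raid10, n))
```

For two failed disks, the RAID10 layout survives 4 of the 6 possible combinations, and the 2 fatal ones are exactly those where both disks belong to the same mirror.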
 
