PBS: Minimum Viable Product for production

engineer5

New Member
Jan 13, 2024
Dear all,

I've tried to find the relevant forum posts, but I may have missed the one that answers my question. Apologies!

For a fairly small production cluster (3-4 nodes, 20 VMs, approx. 30 TB Ceph SSD storage, at most 50% used) I am looking into sizing a small bare-metal server for PBS.
I am looking into the following hardware:
  • 5 SATA HDDs (NAS quality, 12 or 18 TB each) as ZFS raidz1 (= net storage 4*12 = 48 TB, or 4*18 = 72 TB), optionally one hot spare.
  • 2 NVMe devices for a mirrored ZFS special device
  • a middle-of-the-road CPU (6 cores/12 threads), preferably AMD
  • 48-64 GB RAM
  • (redundant) 10Gbit network interfaces
  • and a separate OS/boot SSD.
The primary aim of the PBS is disaster recovery. If the cluster completely breaks down, we expect to have much bigger problems on our hands than how fast we can get the backups up and running again, so recovery speed is a secondary concern.
The VMs are fairly stable: 10 only need to be backed up every 7 days, the other half every night.

I think I fairly well understand the limitations of using spinning disks in a PBS ZFS configuration: they are significantly slower, and an SSD special device for metadata is needed.

Please enlighten me:
1. I understand that the ZFS special device needs to offer the same level of failure resilience as the data vdevs, so a raidz1 data vdev can be combined with a mirrored special device?
2. I have no insight into sizing the special device. Is this possible with Intel Optane devices (32 GB is the largest size I seem to be able to get)? Can ZFS work with such a small space? Or how big will the special device SSDs need to be for that net storage?

All advice on right-sizing my solution is appreciated.

Thank you all!
 
1. I understand that the ZFS special device needs to offer the same level of failure resilience as the data vdevs, so a raidz1 data vdev can be combined with a mirrored special device?
Yes, using redundant special devices is highly recommended. If the metadata stored on the special device is lost, the data stored on the pool becomes inaccessible.
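
For illustration, a minimal sketch of such a layout; the pool name and device names are placeholders (in practice you would use stable /dev/disk/by-id paths):

Code:
# Sketch: 5-disk raidz1 data vdev plus a mirrored NVMe special vdev.
# "backuppool" and all device names are placeholders.
zpool create backuppool \
    raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
    special mirror /dev/nvme0n1 /dev/nvme1n1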

2. I have no insight into sizing the special device. Is this possible with Intel Optane devices (32 GB is the largest size I seem to be able to get)? Can ZFS work with such a small space? Or how big will the special device SSDs need to be for that net storage?

The size requirements of the special device depend on your workload and on how ZFS is configured. If you have many small files, such as those produced by the chunk storage of a Proxmox Backup Server installation, you will need more storage on the special device compared to e.g. a NAS workload storing Linux ISOs or video files. Furthermore, ZFS can be configured to store files smaller than a configurable size threshold entirely on the special device, influencing storage requirements significantly.
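
As a sketch, that threshold is the special_small_blocks dataset property; the dataset name below is an example, not from this thread:

Code:
# Store all blocks of 16K or smaller entirely on the special device.
# "backuppool/datastore" is a placeholder dataset name.
zfs set special_small_blocks=16K backuppool/datastore
# Verify the current value:
zfs get special_small_blocks backuppool/datastore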

So it's hard to give a concrete answer to this question. Some sources [1] suggest 0.3% of your pool capacity.
Based on that rule, 32 GB is definitely not enough for a pool of your size (72 TB * 0.003 ≈ 220 GB).

[1] https://forums.servethehome.com/ind...a-special-device-size-how-to-calculate.39454/
 
After consulting a colleague, who kindly provided some real-world data from a large Proxmox Backup Server datastore on ZFS with special devices, I can say that in this concrete case the special device usage was 0.7% of the overall pool utilization. Personally, I'd round this up to 1-2% (roughly 0.7-1.5 TB for a 72 TB pool) in order to have ample headroom for future pool expansions.

EDIT: default settings were used for the ZFS pool
 
I can confirm that none of my PBS installations have higher than 0.7% special device usage, with the typical value being around 0.4%. I typically use a small-block size of 16k instead of 4k so that some .fidx files fit on the special device too. If the special device fills up, data will simply be stored on the "normal" vdevs, so you won't lose any data, just some performance.
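
As a quick sketch, assuming a pool named "backuppool" (a placeholder), the per-vdev listing shows how full the special mirror currently is:

Code:
# SIZE/ALLOC/FREE are reported per vdev, including the special mirror.
zpool list -v backuppool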

If you have a PBS with your backups, you can get an estimation of the needed space for the special device with these two commands:

- Count how many files there are in each power-of-two size bucket:

Code:
time find /PATH/ -type f -print0 | xargs -0 stat -c "%s" | awk '{ n=int(log($0)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }'
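
Each line of the output is a power-of-two size bucket and the number of files in it; summing the buckets at or below your intended special_small_blocks threshold gives a rough idea of how many files would land entirely on the special device.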

- Get ZFS block size histogram:

Code:
zdb -Lbbbs BIGpool | grep -A20 'Block Size Histogram'
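
In the resulting histogram, the cumulative size columns let you estimate how much data sits in blocks at or below a given size, i.e. roughly what a matching special_small_blocks setting would place on the special device.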

Be careful not to use a zpool with a special device for workloads other than PBS, as those may use the special device differently and fill it sooner/faster than expected compared to running just PBS.
 
