I've set up my clusters so that my backups are offsite, meaning that DC1 backs up to DC3
So you have no onsite backup and it backs up live over the WAN?
Or do you have dark fiber there? What is the bandwidth and latency between the datacenters?
Now, first and foremost, PBS performance across the network isn't good. That's a fact.
I'm surprised this works at all without backup jobs failing or machines freezing frequently.
I did direct backups over WAN before, though not at terabyte scale, and it always had some quirks from time to time.
I'm surprised by the prices given by Dunuin for SSDs, but those would make an all-SSD PBS way more affordable.
Take a look at Mindfactory, for example. Ignoring the 19% VAT, I get around 14.5k€ net for 12x PM9A3 15.36 TB there.
Asking a distributor for an offer will probably yield even better pricing.
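Just to put that figure into perspective (my own rough numbers, not an actual quote):

```bash
# Rough sanity check of the ballpark figure above (assumed numbers, not a quote):
# 14,500 € net for 12 drives -> price per drive and per TB of raw capacity
echo "scale=0; 14500 / 12" | bc             # ~1208 € per PM9A3 15.36 TB
echo "scale=1; 14500 / (12 * 15.36)" | bc   # ~78.6 € per TB raw
```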
But in general, NVMe is getting cheaper and cheaper, to the point where SATA SSDs hardly make sense anymore.
HDDs are still cheaper at the same capacity, but the performance difference can't be overstated.
Modern datacenter NVMe drives just outrun whatever you throw at them and let you do things that would be madness on spinning disks.
Worth it!
About the RAID level, I'd personally go with RAID50 (2x 6 disks, i.e. two striped raidz1 vdevs) or a single RAID6 (1x 12-disk raidz2), depending on your level of paranoia.
A rebuild is no big deal when your drives shovel gigabytes per second around, and technology like dRAID reduces rebuild time even further.
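To illustrate in ZFS terms what those layouts could look like (pool name, disk paths and the dRAID group tuning below are placeholders, adjust to your hardware):

```bash
# RAID50-style: two striped 6-disk raidz1 vdevs (disk names are placeholders)
zpool create -o ashift=12 backup \
  raidz1 /dev/disk/by-id/nvme-disk{1..6} \
  raidz1 /dev/disk/by-id/nvme-disk{7..12}

# RAID6-style: one 12-disk raidz2 vdev
zpool create -o ashift=12 backup raidz2 /dev/disk/by-id/nvme-disk{1..12}

# dRAID2 variant with one distributed spare (ZFS 2.1+) for faster resilvers
# (group layout: 4 data + 2 parity, 12 children, 1 spare - tune to taste)
zpool create -o ashift=12 backup draid2:4d:12c:1s /dev/disk/by-id/nvme-disk{1..12}
```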
two Intel 10c/20t CPUs) with 64 GB of DDR4 RAM. If I do go with ZFS, I'll probably go for 128 GB instead.
Try to avoid Intel CPUs. I don't know whether this has improved with the latest generation, but in general they are notoriously slow in PBS benchmarks; see here:
https://forum.proxmox.com/threads/how-fast-is-your-backup-datastore-benchmark-tool.72750/
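You can get comparable numbers for your own hardware with the benchmark tool from that thread:

```bash
# Local CPU benchmark (SHA-256, compression, AES-256-GCM, verification speed)
proxmox-backup-client benchmark

# Optionally also measure TLS upload speed to an existing datastore
# (the repository string is just an example, adjust user/host/datastore)
proxmox-backup-client benchmark --repository backup@pbs@pbs.example.com:datastore1
```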
To really utilize such high-performing drives, you'll need serious single-core performance as well as plenty of threads.
I'd recommend something like the AMD EPYC 9274F, which is the latest generation and has high clock speeds as well as the most modern instruction sets. PBS relies heavily on cryptography, and so does ZFS: checksums everywhere.
As the backup chunks are encrypted and hashed on the client, your backup speed probably won't improve much, depending on the load of the current machines (I mean, it's an Opteron), but verify jobs, for example, will greatly benefit from a modern, powerful CPU.
About RAM, the rule of thumb with ZFS is 1 GB per 1 TB of pool capacity, so for a pool in the ~100 TB range 128 GB should be safe, but since additional services like PBS itself need memory too, 192 GB wouldn't hurt and won't break the bank.
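One thing I'd add on my side: you can cap the ARC so ZFS doesn't compete with PBS for memory. The values below are only an example for a 128 GB machine:

```bash
# Cap the ZFS ARC at e.g. 96 GiB so PBS and the OS keep ~32 GiB of headroom
# (example value; persists across reboots after rebuilding the initramfs)
echo "options zfs zfs_arc_max=$((96 * 1024**3))" > /etc/modprobe.d/zfs.conf
update-initramfs -u

# Or apply it immediately without a reboot
echo $((96 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_max
```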
DC1 would back up to the SSD PBS in DC1, then that PBS would sync to the long-term storage in DC2 and DC3 for offsite backups
I wouldn't build 2 servers per DC.
And 2 offsite backups (so 3 copies in total) increase your storage requirements.
While HDDs are cheaper on their own, once you add the hardware and operational costs of a whole separate box on top, I see the economic gains dwindling.
I'm thinking about 2 big PBS that remote-sync each other, with DC3 and DC4 backing up directly to whichever is available.
Eh, I'll attach a diagram. That way you have 2 backups of each DC, at 2 different locations.
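Wiring up that cross-sync is basically one remote plus one pull sync job on each of the two PBS; a rough sketch (hostnames, auth ID, store names and the schedule are placeholders, not from your setup):

```bash
# On PBS "A": register PBS "B" as a remote and pull its backups over
proxmox-backup-manager remote create pbs-b \
  --host pbs-b.example.com \
  --auth-id sync@pbs \
  --password 'SECRET' \
  --fingerprint '00:11:22:...'

proxmox-backup-manager sync-job create pull-from-b \
  --remote pbs-b --remote-store main --store main \
  --schedule 'hourly'

# Mirror the same setup on PBS "B", pointing back at PBS "A"
```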
If you don't already, consider using a single datastore with one namespace per DC, instead of one datastore per DC.
That way you can share the chunk store across all datacenter backups and further improve the deduplication rate.
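On the PVE side that just means every cluster points at the same datastore with its own namespace; roughly like this (storage ID, hostnames, namespace names and fingerprint are only examples, double-check the options against man pvesm on your version):

```bash
# On the PVE cluster in DC1: shared datastore "main", one namespace per DC
pvesm add pbs pbs-main \
  --server pbs1.example.com \
  --datastore main \
  --namespace DC1 \
  --username backup@pbs \
  --password 'SECRET' \
  --fingerprint '00:11:22:...'

# The clusters in DC2 and DC3 do the same with --namespace DC2 / DC3
```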
This concept could work with 3 PBS (DC1, DC2, DC3) as well, and with 2 offsite backups like you suggested, but at considerably higher cost.
With 100 TB each, I'd roughly calculate about 20-30k€ for each PBS, depending on the margin added by your distributor.