Buying JBODs isn't the issue. It's having an empty cab nearby two years later to add more JBODs to the existing pool. By the time I have to deploy another JBOD, that cab might be in another aisle, out of SAS reach of the existing pool.
Ah, I understand. Well, building multiple smaller servers that provide NFS shares may be a more appropriate solution then.
Or, as long as it's feasible, upgrading the existing HDDs to bigger ones.
Also, building out huge pools with spinning disk is a bad thing.
Why? ZFS can handle dozens of disks properly, and with new features like dRAID even rebuilds can be quite painless, provided a proper RAID level (RAIDZ-2 or RAIDZ-3) is chosen for that number of disks.
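To put rough numbers on the parity trade-off, here is a back-of-the-envelope sketch in Python. The layout (three RAIDZ-2 vdevs of ten 16 TB disks) and the function name are made up for illustration, and the formula ignores RAIDZ padding, metadata and reserved space, so real usable capacity will be somewhat lower.

```python
def raidz_usable_tb(disks_per_vdev: int, parity: int, disk_tb: float, vdevs: int = 1) -> float:
    """Rough usable capacity: data disks times disk size, summed over vdevs.

    Simplification: ignores RAIDZ padding, metadata and reserved space.
    """
    if not 1 <= parity <= 3:
        raise ValueError("RAIDZ parity must be 1, 2 or 3")
    if disks_per_vdev <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks_per_vdev - parity) * disk_tb * vdevs


# Hypothetical layout: 3 RAIDZ-2 vdevs of 10 x 16 TB disks each
print(raidz_usable_tb(10, 2, 16, vdevs=3))  # 384
```

Going from RAIDZ-2 to RAIDZ-3 in the same layout costs one more disk per vdev, which is usually a fair price at that disk count.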
I have a datastore today with 30 TB active on it, and the verify jobs take a long time to finish, blocking backup jobs from running.
The more HDDs, the better verify jobs should run, because they basically stream data and compare checksums (which may be CPU heavy).
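To illustrate what "streaming data and comparing checksums" means, here is a minimal sketch in Python. It assumes a simplified, hypothetical chunk store where every file is named after the SHA-256 hex digest of its contents. This is not PBS's actual on-disk format or code, just the general idea.

```python
import hashlib
from pathlib import Path


def verify_chunks(store: Path, block_size: int = 1 << 20) -> list[str]:
    """Return the names of chunk files whose content no longer matches their name.

    Assumption: each file name is the SHA-256 hex digest of the file's contents.
    """
    corrupt = []
    for chunk in sorted(p for p in store.iterdir() if p.is_file()):
        digest = hashlib.sha256()
        with chunk.open("rb") as fh:
            # Stream in fixed-size blocks: sequential reads, constant memory.
            while block := fh.read(block_size):
                digest.update(block)
        if digest.hexdigest() != chunk.name:
            corrupt.append(chunk.name)
    return corrupt
```

The disk side is purely sequential reads, which is where more spindles help, while the hashing is the CPU-heavy part.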
For garbage collection etc., ZFS special devices could do a good job.
About the verify job specifically: I remember seeing some git commits that removed chunk locks for read-only operations, so the latest updates should allow you to run backup jobs in parallel with a verify job.
That is why the Proxmox folks recommend building out datastores using SSDs, which I disagree with btw. Spinning disk is still cheaper than SSDs and lasts a lot longer than SSDs.
I agree, SSDs are still way more expensive than HDDs, but I have also seen the massive advantage of an all-SSD PBS.
Whether you need that kind of performance depends, in the end, on your RTO requirements.
I've had good experiences using L2ARC drives (4 MB recordsize on my pool, so it's cheap on RAM), even with cheap consumer SSDs.
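Why a large recordsize makes L2ARC cheap on RAM: every record cached in L2ARC needs a small header kept in ARC (RAM), so fewer, larger records mean fewer headers. A rough sketch, assuming about 70 bytes per header (a ballpark assumption, since the real figure depends on the OpenZFS version) and that every cached record is a full recordsize, which makes this a lower bound:

```python
def l2arc_header_ram_mib(l2arc_gib: float, record_kib: int, header_bytes: int = 70) -> float:
    """Estimate RAM (MiB) spent on headers for a completely full L2ARC.

    Assumptions: header_bytes is a ballpark, and all records are full-size.
    """
    records = l2arc_gib * 1024 * 1024 / record_kib
    return records * header_bytes / (1024 * 1024)


# Hypothetical 1 TiB L2ARC device, default 128 KiB records vs. 4 MiB records
for record_kib in (128, 4096):
    print(f"{record_kib} KiB records: {l2arc_header_ram_mib(1024, record_kib)} MiB of RAM")
```

Under those assumptions, 128 KiB records cost about 560 MiB of RAM for headers and 4 MiB records about 17.5 MiB. Compression and small files produce smaller records, so real overhead is higher.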
If I had a 5 TB pool, sure, I'd go all SSD, but when you are planning for 50 to 100 TB of backup data for PVE, and growing, SSD is a non-starter.
If you plan to go 100+ TB, you will probably switch the storage implementation at some point, anyway.
ZFS for some time, and then maybe an (erasure-coded) Ceph cluster when you start scaling out massively.