PBS scaling out storage

tjk

Member
May 3, 2021
How do you scale out storage once a PBS node starts to run out of space?

For example, I set up a PBS server with 20TB of backup space... in 2 years I start to hit 80/90% of that capacity. What is the best way to add more storage to the PBS server and use it for existing backup jobs?

It would be cool if you had a concept of scale-out repositories - where you just present a bunch of datastores to a set of backup jobs and PBS load-balances/uses them across jobs.
 
We use ZFS pools and more disks to scale out, which works like a charm.
I assume you are presenting datastores as NFS mount points then? You just grow the NFS mount point as you need more space on a datastore?
 
I assume you are presenting datastores as NFS mount points then? You just grow the NFS mount point as you need more space on a datastore?
No. We create a filesystem per datastore, directly on the local ZFS pool. We set a quota per user; each user has their own main filesystem and a per-datastore sub-filesystem.
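Roughly, a minimal sketch of that layout (the pool, user, and datastore names below are made up; the PBS datastores would afterwards be created on top of the resulting mountpoints):

```python
#!/usr/bin/env python3
# Sketch only: one parent ZFS filesystem per user with a quota, plus one
# sub-filesystem per datastore. Pool, user and datastore names are hypothetical.
import subprocess

POOL = "tank"                      # existing ZFS pool on the PBS host (assumed)
USER = "customer1"                 # parent filesystem for this user
USER_QUOTA = "10T"                 # total space this user may consume
DATASTORES = ["vm-backups", "ct-backups"]

def zfs(*args: str) -> None:
    subprocess.run(["zfs", *args], check=True)

zfs("create", "-p", f"{POOL}/{USER}")               # per-user parent filesystem
zfs("set", f"quota={USER_QUOTA}", f"{POOL}/{USER}")  # cap everything below it

for ds in DATASTORES:                                # one sub-filesystem per datastore
    zfs("create", f"{POOL}/{USER}/{ds}")
    print(f"PBS datastore path: /{POOL}/{USER}/{ds}")
```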
 
No. We create a filesystem per datastore, directly on the local ZFS pool. We set a quota per user; each user has their own main filesystem and a per-datastore sub-filesystem.
Yea, except this is exactly the problem I am describing. How do you keep adding disks to a server that has fixed disk capacity?

Also, adding disks to a ZFS pool isn't the most efficient approach, since it doesn't rebalance the existing data across the disks.
 
You add external cabinets where you can add disks.

Indeed, growing a ZFS pool does not rebalance data, but that has not caused any issues so far. Some data gets deleted and some new data gets added; in the long run, this balances out well enough.
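To illustrate what growing the pool looks like in practice - adding a whole new RAIDZ-2 vdev from an extra cabinet (pool name and device paths below are made up):

```python
#!/usr/bin/env python3
# Sketch only: grow an existing pool by adding another RAIDZ-2 vdev.
# Existing data is not rebalanced; new writes mostly land on the emptier vdev.
# Pool name and /dev/disk/by-id paths are hypothetical.
import subprocess

POOL = "backup"
NEW_DISKS = [f"/dev/disk/by-id/jbod2-disk{i}" for i in range(1, 7)]  # 6 new drives

subprocess.run(["zpool", "add", POOL, "raidz2", *NEW_DISKS], check=True)
subprocess.run(["zpool", "list", "-v", POOL], check=True)  # show the new layout
```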
 
Growing a ZFS pool sounds like a good solution to me, for quite some time. One can add a hell of a lot of disks using some JBODs.
And nowadays there are pretty large HDDs, too.

If that doesn't fit, building a Ceph cluster could work for further scaling, but if you expect to reach 20TB in like 2-3 years, that'd be overkill imho.
In a few years there will probably be even larger HDDs, and you can simply replace some of your current disks and gain storage that way - pro: no rebalance necessary in that case.

If you want to go crazy and keep it all in one shelf no matter what for a very long time, there are several 90-bay cases available from Supermicro, for example, and manufacturers like NimbusData offer SSDs of up to 100TB each.

PBS also supports LTO tape libraries, those can also store lots of data in a single chassis.
 
We don't have the luxury of having empty cabs to just put up more JBODs and chain 'em together, plus that only scales so far.

I think we'll stick to using NFS mounts for datastores.
 
You can buy and deploy those JBODs on demand, no need to keep empty cabs around.
If you are fine with multiple data sources (one NFS share per datastore), you have lots of additional possibilities anyway.
It sounded like you really want to have one single machine serve all your datastores, at all times.

Besides that, 20TB is not that much. I'm getting a fresh PBS machine with 8x 10TB HDDs - using striped mirrors (RAID-10) and reserving 20% to not overload the ZFS pool, I still end up with about 30 TiB of usable storage.
So, I don't really see a problem for your use case, as HDDs already go up to like 16 or 18TB per disk nowadays.
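The rough arithmetic behind that estimate:

```python
# Rough capacity estimate for 8x 10TB HDDs in striped mirrors (RAID-10),
# keeping ~20% of the pool free; metadata/slop overhead is ignored.
disks, size_tb = 8, 10
raw_tb = disks * size_tb                   # 80 TB raw
mirrored_tb = raw_tb / 2                   # 40 TB after mirroring
mirrored_tib = mirrored_tb * 1e12 / 2**40  # ~36.4 TiB (TB -> TiB)
usable_tib = mirrored_tib * 0.8            # keep 20% free
print(f"~{usable_tib:.1f} TiB usable")     # ~29 TiB, i.e. "about 30 TiB"
```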
 
It isn't buying the JBODs that is the issue. It's having an empty cab nearby 2 years later to add more JBODs to the existing pool. In 2 years, when I have to deploy another JBOD, that cab might be in another aisle and not something I can SAS-connect to the existing pool.

Also, building out huge pools with spinning disks is a bad thing. I have a datastore today with 30TB active on it, and the verifies take a long time to finish, blocking backup jobs from running.

That is why Proxmox folks recommend building out datastores using SSDs, which I disagree with btw - spinning disks are still cheaper than SSDs and last a lot longer than SSDs.

If I had a 5 TB pool, sure I'd do all SSD, but when you are planning 50 to 100TB of backup data and growing for PVE, SSD is a non-starter.
 
It isn't buying the JBODs that is the issue. It's having an empty cab nearby 2 years later to add more JBODs to the existing pool. In 2 years, when I have to deploy another JBOD, that cab might be in another aisle and not something I can SAS-connect to the existing pool.
Ah, I understand. Well, building multiple smaller servers that provide NFS shares may be a more appropriate solution then.
Or, as long as it's feasible, upgrading the existing HDDs to bigger ones.

Also, building out huge pools with spinning disks is a bad thing.
Why? ZFS can handle dozens of disks properly, and with new features like dRAID even rebuilds can be quite painless, given that a proper RAID level (RAIDZ-2 or 3) is chosen for that number of disks.
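For illustration, such a pool would usually be built from several smaller RAIDZ-2 vdevs rather than one very wide vdev; on recent OpenZFS a draid2 vdev type would be an alternative. Pool name and device paths below are made up:

```python
#!/usr/bin/env python3
# Sketch only: one pool built from two 6-disk RAIDZ-2 vdevs instead of a single
# very wide vdev - more vdevs means more IOPS and smaller resilver domains.
# Pool name and device paths are hypothetical.
import subprocess

POOL = "backup"
shelf1 = [f"/dev/disk/by-id/shelf1-disk{i}" for i in range(1, 7)]
shelf2 = [f"/dev/disk/by-id/shelf2-disk{i}" for i in range(1, 7)]

subprocess.run(["zpool", "create", "-o", "ashift=12", POOL,
                "raidz2", *shelf1,
                "raidz2", *shelf2], check=True)
```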

I have a datastore today with 30TB active on it, and the verifies take a long time to finish, blocking backup jobs from running.
The more HDDs, the better the verify jobs should run, because those basically stream data and compare checksums (which may be CPU heavy).
For garbage collections etc., ZFS special devices could do a good job.
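As a back-of-the-envelope sketch of why more spindles help (all numbers below are assumptions, not measurements):

```python
# Back-of-the-envelope: verify is mostly sequential reads, so aggregate
# throughput (and thus verify time) scales with the number of spindles.
# All numbers are assumptions, not measurements.
datastore_tb = 30        # data to verify
per_disk_mb_s = 150      # assumed sequential read per HDD
efficiency = 0.5         # fraction of raw bandwidth actually achieved

for disks in (6, 12, 24):
    gb_s = disks * per_disk_mb_s * efficiency / 1000
    hours = datastore_tb * 1000 / gb_s / 3600
    print(f"{disks:2d} disks: ~{gb_s:.1f} GB/s, ~{hours:.0f} h per full verify")
```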

About the verify job specifically, I remember seeing some git commits about removing chunk locks for read-only operations; the latest updates should allow you to run backup jobs while a verify job runs in parallel.

That is why Proxmox folks recommend building out datastores using SSDs, which I disagree with btw - spinning disks are still cheaper than SSDs and last a lot longer than SSDs.
I agree, SSDs are still way more expensive than HDDs, but I have also seen the massive advantage of an all-SSD PBS.
Whether one needs that kind of performance depends, in the end, on your RTO requirements.
I have had good experiences with L2ARC drives (4MB recordsize on my pool, so it's cheap on RAM), even with cheap consumer SSDs.

If I had a 5 TB pool, sure I'd do all SSD, but when you are planning 50 to 100TB of backup data and growing for PVE, SSD is a non-starter.
If you plan to go 100+ TB, you will probably switch the storage implementation at some point, anyway.
ZFS for some time, and then maybe an (erasure-coded) Ceph cluster when you start scaling out massively.
 
Why? ZFS can handle dozens of disks properly, and with new features like dRAID even rebuilds can be quite painless, given that a proper RAID level (RAIDZ-2 or 3) is chosen for that number of disks.
Good question for the PVE team; they seem to think SSDs for datastores are the way to go, and that doesn't make sense at scale for sure.

Which is interesting - they support tape just fine, but anything on disk is IO intensive for verifies and such.

I hope improvements come for verify and such; my verify time is blocking backups right now. We are on a subscription plan and I haven't seen this patch hit the subscription version yet.
 
Ah, it looks like I misremembered.
I've found the commit I was thinking about: https://git.proxmox.com/?p=proxmox-...it;h=6eade0ebb76acfa262069575d7d1eb122f8fc2e2
But that is about backup restores, not verifies.

I hope improvements come for verify and such
Overall, I don't see any magical performance "fix" coming, because the verify operation is simply disk intensive - it reads each 4MB chunk in full.
I don't see much room for optimizations there, as this is already quite a simple operation.
If the verify really takes too long, cache drives may help with that.
On the other hand, HDDs and tapes are usually very good at those streaming read workloads.

Why the verify blocks backup jobs is unclear to me, as verify should be a read-only operation anyway.
What would prevent new backups from being created in the background then? It should not interfere.
 
When using ZFS, the verify process is IMHO rather overrated. It protects you from bitrot, which ZFS already protects against.

Our largest PBS is currently about 90TB, filled with about 73TB of data. Verify as we speak reads about 2GB per second, and we try to limit the number of verifies.
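Putting those numbers together, just as simple arithmetic:

```python
# Simple arithmetic on the numbers above: ~73 TB verified at ~2 GB/s.
data_tb, read_gb_s = 73, 2
hours = data_tb * 1000 / read_gb_s / 3600
print(f"~{hours:.0f} hours for one full verify pass")  # roughly 10 hours
```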

One thing that improves performance greatly is adding NVMe 'special' devices for ZFS. That mirror set handles thousands of IOPS, which then don't go to your spinning disks.
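For anyone wanting to try that, roughly (pool name and NVMe device paths below are made up):

```python
#!/usr/bin/env python3
# Sketch only: add a mirrored NVMe "special" vdev (metadata, optionally small
# blocks) to an existing pool. It must be mirrored - losing the special vdev
# loses the pool - and it only helps data/metadata written after it was added.
# Pool name and device paths are hypothetical.
import subprocess

POOL = "backup"
NVME = ["/dev/disk/by-id/nvme-ssd-a", "/dev/disk/by-id/nvme-ssd-b"]

subprocess.run(["zpool", "add", POOL, "special", "mirror", *NVME], check=True)

# Optionally also send small blocks (not just metadata) to the special vdev:
subprocess.run(["zfs", "set", "special_small_blocks=64K", POOL], check=True)
```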

Also read https://forum.proxmox.com/threads/zfs-pool-io-when-idle.89911/#post-393317 and https://forum.proxmox.com/threads/is-verify-task-needed-with-zfs-backed-datastore.84081/#post-369460. A note on the latter: @t.lamprecht does not advise disabling verification, but he leaves room for making educated decisions about the use of verification.
 
