Plans to add Backup Storage Usage limit instead of retention numbers?

Tmanok

Well-Known Member
Hi Everyone,

Recently a thought crossed my mind while tuning a Proxmox Backup Server at one site and a Proxmox Virtual Environment server at another site: sometimes it is difficult to gauge the amount of disk space required while configuring backup retention. Some backup solutions, such as Apple's Time Machine, utilize all available storage and then prune only when necessary; other utilities allow you to set a "minimum free" amount (in GiB).

Is there a plan to ever allow for an alternative backup retention policy based on disk space instead? It might be nice for those who want to "set and forget" rather than tune minimum retention, especially on very big and very small backup storages (e.g., 20 TiB of backup storage when VM disk usage is only, say, 1 TiB or less). It would also automate retention on growing or shrinking virtual environments.

Thanks,


Tmanok
 
Especially for PBS this is not really possible, since it is not known how much free space you'd gain by removing a specific snapshot (initially it would be next to nothing at all; it would only potentially free space on the next garbage collection),
because the chunks are deduplicated across the whole datastore.
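A minimal sketch of the idea (toy code, not PBS internals; all names below are made up): a snapshot is essentially a list of chunk references, so deleting one snapshot only lets GC free the chunks that no remaining snapshot still references.

```python
# Toy sketch (not PBS code): snapshots reference content-addressed chunks,
# so deleting a snapshot frees only the chunks nothing else still references.
import hashlib

def chunk_ids(data: bytes, size: int = 4):
    """Split data into fixed-size chunks and return their content hashes."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

# Two snapshots of a mostly unchanged source: they share almost all chunks.
snapshots = {
    "vm-100-2024-01-01": chunk_ids(b"AAAABBBBCCCCDDDD"),
    "vm-100-2024-01-02": chunk_ids(b"AAAABBBBCCCCEEEE"),  # only the last chunk differs
}

def reclaimable(store: dict, victim: str) -> int:
    """Chunks a GC run could free after deleting `victim` (referenced by nothing else)."""
    still_referenced = {c for name, chunks in store.items() if name != victim for c in chunks}
    return len(set(store[victim]) - still_referenced)

print(reclaimable(snapshots, "vm-100-2024-01-01"))  # -> 1 chunk freed, not 4
```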
 
Especially for PBS this is not really possible, since it is not known how much free space you'd gain by removing a specific snapshot (initially it would be next to nothing at all; it would only potentially free space on the next garbage collection),
because the chunks are deduplicated across the whole datastore.
Hi Dominik,

Ok, so what you are saying is that enforcing such a limit would be impractical because removing the oldest data before trying to back up might not provide enough space, so then you would have to continuously remove backups until there was enough space for the latest copy? Or is it because of the complexity involved with the global deduplication of the datastore?

Thanks Dominik,


Tmanok
 
Also keep in mind that you can't instantly delete stuff. First you need to run a prune task to mark data for deletion. Then you need to run a GC task so the marked data actually gets deleted, and data will only be deleted if the prune task was run at least 24 hours earlier. So you can't just remove stuff right before the backup to make space for new data.
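A rough sketch of that two-step flow (datastore, group, and repository names below are placeholders; exact flags may differ between PBS versions):

```python
# Hypothetical helper: prune marks snapshots for removal, a later garbage
# collection actually frees the unreferenced chunks.
import subprocess

DATASTORE = "backup1"    # assumed datastore name
GROUP = "vm/100"         # assumed backup group

# Step 1: prune, i.e. select which snapshots to drop (here: keep the last 7).
subprocess.run(["proxmox-backup-client", "prune", GROUP,
                "--keep-last", "7",
                "--repository", f"root@pam@localhost:{DATASTORE}"], check=True)

# Step 2: garbage collection removes chunks no snapshot references any more --
# but only chunks whose last use is older than the ~24 h grace period.
subprocess.run(["proxmox-backup-manager", "garbage-collection", "start",
                DATASTORE], check=True)
```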
 
Also keep in mind that you can't instantly delete stuff. First you need to run a prune task to mark data for deletion. Then you need to run a GC task so the marked data actually gets deleted, and data will only be deleted if the prune task was run at least 24 hours earlier. So you can't just remove stuff right before the backup to make space for new data.
Well...
  1. You can make it occur within any period and set pruning + gc as a task prior to your backup.
  2. If they implemented the feature and a user configured a capacity limit instead of a number of backups, patching the backup system to force a prune+GC schedule just before the backup would be trivial.
Cheers,

Tmanok
 
Well...
  1. You can make it occur within any period and set pruning + gc as a task prior to your backup.
I already tried it. If you run GC after prune and don't wait 24h, nothing will be found that may be deleted. Staff also explained once why this 24h interval is needed in between, but I can't remember the reason.
 
I already tried it. If you run GC after prune and don't wait 24h, nothing will be found that may be deleted. Staff also explained once why this 24h interval is needed in between, but I can't remember the reason.
Well shoot, I honestly did not know that. Thanks for letting me know, perhaps there is something else at work here...

Thanks Dunuin,

Tmanok
 
Staff also explained once why this 24h interval is needed in between, but I can't remember the reason.
It's the behavior of the relatime mount option, which only guarantees timestamp updates if the old timestamp is more than 24h in the past. We're planning to re-evaluate whether we can do away with this, possibly due to some recently uncovered behavior details in Linux/VFS, but that naturally needs careful review.
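A toy sketch of why that matters (my reading of it, not the actual PBS implementation; path and grace period are made up): GC first marks in-use chunks by touching their access time, then sweeps everything whose atime is older than a cutoff, so the cutoff has to lag by at least the window in which relatime may skip persisting an update.

```python
# Rough sketch (not PBS source): mark-and-sweep GC over an on-disk chunk store,
# using file atime as the "mark" and a >24h grace period as the sweep cutoff.
import os
import time
from pathlib import Path

CHUNK_DIR = Path("/tmp/example-chunks")   # placeholder path
GRACE = 24 * 3600 + 300                   # ~24h 5min, mirroring the relatime caveat

def mark(referenced_chunks):
    """Phase 1: touch every chunk still referenced by any snapshot index."""
    now = time.time()
    for chunk in referenced_chunks:
        path = CHUNK_DIR / chunk
        os.utime(path, (now, path.stat().st_mtime))

def sweep(gc_start: float):
    """Phase 2: delete chunks whose atime is older than start-of-GC minus grace."""
    cutoff = gc_start - GRACE
    for chunk in CHUNK_DIR.iterdir():
        if chunk.stat().st_atime < cutoff:
            chunk.unlink()
```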

Ok, so what you are saying is that enforcing such a limit would be impractical because removing the oldest data before trying to back up might not provide enough space, so then you would have to continuously remove backups until there was enough space for the latest copy?
Keep in mind that deleting the oldest data may really not be a safe thing to do at all; e.g., some countries demand archival of certain information for, say, 7 years. Prune retention can cope with that efficiently; just deleting trailing, old backups can't.

Also, due to the CAS (content-addressable storage), the oldest and less old backups may share 99.9% of their data, so for a backup that references 1 TiB of data, only 1 GiB might be freed up by deleting the old snapshot and then running GC.

So in general I'd recommend starting out with a relatively generous retention time and then monitoring space usage growth.
The dashboard also gives you an "estimated full date" hint, which should help determine whether the chosen keep-retention values are scaled OK, at least after your workload has run for a while to make that estimation more accurate.
 
Keep in mind that deleting the oldest data may really not be a safe thing to do at all; e.g., some countries demand archival of certain information for, say, 7 years. Prune retention can cope with that efficiently; just deleting trailing, old backups can't.
Of course, including my home country for some legal information. That's why you would keep the original retention system; it could probably be made "either/or" rather than using both or removing the old methodology entirely.

Also, due to the CAS (content-addressable storage), the oldest and less old backups may share 99.9% of their data, so for a backup that references 1 TiB of data, only 1 GiB might be freed up by deleting the old snapshot and then running GC.
Perhaps I'm missing the issue. I could see diffs having dependencies, but I'm not sure why there would always need to be an identical amount of data to recover/delete. Why could the prune+GC system not retry until enough space was made for the next backup to occur? It would take more time, but you could hypothetically make the best use of the storage dataset.

Thanks,


Tmanok
 
Perhaps I'm missing the issue. I could see diffs having dependencies, but I'm not sure why there would always need to be an identical amount of data to recover/delete. Why could the prune+GC system not retry until enough space was made for the next backup to occur? It would take more time, but you could hypothetically make the best use of the storage dataset.
Yeah, I don't see PBS doing a deletion loop over all snapshots anytime soon; IMO that's just asking for data loss, all to avoid a simple one-time keep-retention configuration.

Backups from the same series will always share a good amount of chunks; they're made from the same source, and in practice the base system doesn't change that frequently, and often not by much either. With a deduplicated system, deleting one snapshot just won't do much in the general case (counterexamples can certainly be constructed artificially).

Besides that, you do not know in advance how much new space is required; that's just impossible. So the backup itself would need to trigger pruning continuously, plus wait for GC to actually clean things up, delaying the backup for hours or more (and if it fails at the end, you'd potentially be left without any backup at all). Additionally, we explicitly designed the PBS permission system so that one configures a user, or preferably an API token, that is only allowed to create new backups, not delete existing ones, precisely so that a host that got taken over cannot destroy its backups. So this cannot really work in the setups we recommend.

If you do not care about keeping all old backups in your case, just configure a low keep-last value and be done. If that integrated way isn't enough for you, you can already script this by checking usage periodically and deleting accordingly, roughly along the lines of the sketch below. As said, future prediction is hard, so this would also be just a heuristic, but about what you'd get from the integrated way anyway.
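A heuristic sketch of such a script (all names, paths, and thresholds below are assumptions, not recommendations):

```python
# Heuristic sketch (placeholder names/paths): periodically check datastore
# usage and, below a free-space threshold, prune more aggressively and run GC.
import shutil
import subprocess

DATASTORE = "backup1"                      # assumed datastore name
DATASTORE_PATH = "/mnt/datastore/backup1"  # assumed mount point
GROUP = "vm/100"                           # assumed backup group
MIN_FREE_BYTES = 500 * 1024**3             # e.g. keep at least 500 GiB free

usage = shutil.disk_usage(DATASTORE_PATH)
if usage.free < MIN_FREE_BYTES:
    # Tighten retention for the group, then let GC reclaim unreferenced chunks.
    subprocess.run(["proxmox-backup-client", "prune", GROUP,
                    "--keep-last", "3",
                    "--repository", f"root@pam@localhost:{DATASTORE}"], check=True)
    subprocess.run(["proxmox-backup-manager", "garbage-collection", "start",
                    DATASTORE], check=True)
    # Alternatively, only warn the admin here instead of deleting anything,
    # in the spirit of the safer notification-only idea mentioned below.
```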

IMO a safer feature would be to send out a reminder mail to the admin if space and/or the estimated time left gets low, potentially allowing the admin to override the notification threshold.
 
Very good answer, thank you sir. For "future prediction", PVE already does a good job with this when it snapshots guest disks: it creates the dirty bitmap and therefore knows the required space. But as you said, a deletion loop sounds like asking for trouble.

My thoughts were from a time before diffs were so integrated (think Mac OS X Time Machine), but I can see the difficulties and risks involved clearly now, cheers.

Tmanok
 