Proxmox backup scheduling

Sep 26, 2023
Hello all.


I have backups scheduled on my server, and I am also using PBS to pull those backups from the corp environment to the DR side. Pretty standard stuff, and both the main and remote sides are running the latest versions of the software.

Regarding the local (corp) backups: when 'keep last' is set to a specific number, does the system keep the last N backups, or the last N days of backups for a scheduled day? In other backup software, "keep last 14" has usually meant keeping the last 14 days of backups. In PBS, however, there is also a 'daily' option that can be enabled. If I set 'daily' to, say, 10 alongside 'keep last 14', does that mean I have the last 14 instances of the backup plus 10 days of backups, or do the two overlap? By overlap I mean that I have 14 backups, of which only 10 count as daily ones, and that each day, when a new backup is made (day 15), the oldest backup gets pruned.

In the first situation, let's say I have a 100 GB backup done on day 1, and each day there are 50 MB of incremental changes. I presume that backups are a 'forever chain', meaning that the first backup is a full backup and each following day is only an incremental backup. Back to the example: day 1 creates 100 GB of backup data. With 'keep last' set to 14, I would expect to have approximately (ignoring compression, etc.) 100.65 GB of data, i.e. 100 GB plus 13 × 50 MB (650 MB). Is that correct? I ask because in the second situation, with both 'keep last 14' and 'daily' set, the amount of storage used for that backup is greater, by approximately 250 MB for that single backup.

Looking at the PBS side, it seems to be a little better, in that for almost whatever I select to pull (keep last, daily, monthly, etc.) it only pulls what it needs in order to keep that amount of data. Correct?

What happens on the corp side if I set a prune job to run daily and only keep the last 5? If the backups are set to 'keep last' 14 but the prune job is set to 'keep last' 5, do I have backups for 14 days, or does the daily prune job cut them down so that I effectively only have 5 days of good local backups?

One last question regarding backups, if I may. Is there a way to determine the total amount of data that is being kept? Although I can expand each replicated backup and see the data there (for instance 20 copies of a 50 GB VM), I'm not seeing any kind of total size on my remote side for each server. I had also thought that with the 'keep last 14' setting from the previous example I should see a similar amount of storage used on the remote side, but the two differ by a lot.
 
For how 'keep last' and 'daily' interact, see the prune simulator: https://pbs.proxmox.com/docs/prune-simulator/index.html ;)

In PBS each snapshot is independent; there are no full or incremental snapshots. There are some optimizations when creating a backup if a previous backup snapshot for that group exists (these are referred to as "incremental backups"), but server-side each snapshot references all the chunks needed to restore its contents. No "diffs" to previous snapshots are stored.
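
To illustrate the point about chunks, here is a minimal Python sketch of a content-addressed chunk store. It is a toy model and not PBS code: the `ChunkStore` class, the 4-byte chunk size and the sample data are all assumptions made for illustration. Each snapshot is simply a list of chunk digests that is sufficient to restore it on its own, while identical chunks are stored only once.

```python
import hashlib

CHUNK_SIZE = 4  # toy value for illustration only


class ChunkStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def backup(self, image):
        """Store an image and return its snapshot index (a list of digests)."""
        index = []
        for offset in range(0, len(image), CHUNK_SIZE):
            data = image[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(data).hexdigest()
            self.chunks.setdefault(digest, data)  # no-op if already stored
            index.append(digest)
        return index

    def restore(self, index):
        """Rebuild an image from its index alone, without other snapshots."""
        return b"".join(self.chunks[d] for d in index)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


def make_chunk(value):
    return value.to_bytes(CHUNK_SIZE, "big")


store = ChunkStore()
blocks = [make_chunk(i) for i in range(256)]  # day 1: 256 distinct chunks
images = [b"".join(blocks)]
snapshots = [store.backup(images[0])]

# Days 2..14: one chunk changes per day, then a new snapshot is taken.
for day in range(1, 14):
    blocks[day] = make_chunk(1000 + day)
    images.append(b"".join(blocks))
    snapshots.append(store.backup(images[-1]))

logical_bytes = sum(len(img) for img in images)
print(logical_bytes)         # 14336 (14 snapshots x 1024 bytes)
print(store.stored_bytes())  # 1076 (1024 bytes + 13 changed chunks)
print(all(store.restore(s) == img for s, img in zip(snapshots, images)))  # True
```

Every snapshot shows the full image size and can be restored independently, yet the store only grows by the chunks that actually changed.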

Pruning is independent on both sides. You can prune more aggressively on the pull target (if space is limited, for example) or on the source, but in the latter case you need to make sure that you sync often enough not to "lose" snapshots before they can be synced.
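
The interaction between source-side pruning and sync frequency can be sketched with a small simulation. This is a simplified model under stated assumptions, not PBS behaviour in detail: one backup per day, a source that prunes with 'keep last' right after each backup, and a sync that copies whatever the source holds at that moment. The function name `never_synced` and its parameters are hypothetical.

```python
def never_synced(days, keep_last, sync_every):
    """Return the backup days that were pruned on the source before any sync."""
    source = []     # backup days currently present on the source
    target = set()  # backup days that reached the pull target
    for day in range(1, days + 1):
        source.append(day)
        source = source[-keep_last:]  # source prune: keep the newest N
        if day % sync_every == 0:
            target.update(source)     # sync copies what the source still has
    # A backup is lost for the target if it is neither synced nor still on the source.
    return [d for d in range(1, days + 1) if d not in target and d not in source]


print(never_synced(days=30, keep_last=5, sync_every=7))
# [1, 2, 8, 9, 15, 16, 22, 23]
print(never_synced(days=30, keep_last=5, sync_every=1))
# []
```

In this model nothing is lost as long as the sync interval does not exceed the number of snapshots the source keeps.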

Space usage is tricky. Since the chunks are deduplicated across the whole datastore, it's not easy to say "this snapshot/group/.. uses X GB", because hopefully a lot of chunks are shared at least between different snapshots of the same group, and potentially even across groups and namespaces.
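
A small Python sketch of why per-snapshot size is ambiguous under deduplication. The snapshot names and chunk labels are made up for illustration, and `exclusive_chunks` is a hypothetical helper, not a PBS function. It counts, for each snapshot, the chunks that no other snapshot references, which is roughly what deleting that snapshot alone could free.

```python
from collections import Counter


def exclusive_chunks(snapshots):
    """Map each snapshot name to the number of chunks only it references."""
    refcount = Counter(
        digest for index in snapshots.values() for digest in set(index)
    )
    return {
        name: sum(1 for digest in set(index) if refcount[digest] == 1)
        for name, index in snapshots.items()
    }


# Hypothetical indexes: each letter stands for one chunk digest.
snapshots = {
    "day1": ["a", "b", "c"],
    "day2": ["a", "b", "d"],
    "day3": ["a", "e", "d"],
}

referenced = sum(len(index) for index in snapshots.values())
unique = len({d for index in snapshots.values() for d in index})
print(referenced)                   # 9 chunk references in total
print(unique)                       # 5 chunks actually stored
print(exclusive_chunks(snapshots))  # {'day1': 1, 'day2': 0, 'day3': 1}
```

Here every snapshot "is" three chunks in size, only five chunks are stored, and removing `day2` on its own would free nothing.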
 
Agreed. What is also confusing, given the chunk deduplication that goes on, is what I see for a 60 GB server with 20 copies: on the main side each of the 20 backups is listed at the same 60 GB, and the remote side also shows each backup as 60 GB for all the copies I have over there. I haven't found a way to determine how much data is actually changing or being kept, other than keeping a log of each daily backup for each VM. BTW, I have scheduled a daily sync and verify, plus a verify every other day, for my jobs.

The reporting of how long each task takes to complete isn't very clear either. The report should state the start date and time at the top of the job, and at the end the date and time it completed, ALONG with how much was done and what was done in each process. For example: job started today @0100, completed @0500 (taking 4 hours), and the job total for sync/backup/replicate/etc. was XXX GB.
 
The GC task collects deduplication stats for the whole datastore, and the snapshot manifest records how big the delta to the previous snapshot was at the time of snapshot creation (but of course, the actual unique data referenced by a snapshot can be higher or lower).
 
I do have a PBS at the corp side.
Here's something strange: I have a prune job with the following settings: keep last 5, daily 8, weekly 4 and monthly 12. With these settings, for the last 60 days I should have 19 backups, correct? I'm showing 20 backups (all seem to be full, as their data size is the same), but the dates are wrong. Can you explain this? Whenever I run my prune job it completes in a couple of seconds (not really doing anything) and the data changes. I know I can prune manually, but the daily scheduled prune job should be keeping things in check.

[Attached screenshot: 1722944297308.png]
 
Please see the admin guide and the prune simulator to understand how the pruning rules work! Keep last 5, daily 8, weekly 4 and monthly 12 will potentially keep up to 29 backups (5 + 8 + 4 + 12).
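
As a rough illustration of how the count adds up, here is a simplified Python approximation of the pruning rules. It is a sketch and not the PBS implementation: it assumes the rules are applied in order (last, daily, weekly, monthly), that each later rule only considers periods not already covered by a kept snapshot, and that weeks are ISO weeks. The function names and the two sample histories are hypothetical. The prune simulator remains the authoritative tool.

```python
from datetime import datetime, timedelta


def mark_selections(marks, snapshots, keep, period_id):
    """Keep the newest snapshot of up to `keep` not-yet-covered periods."""
    covered = {period_id(s) for s in snapshots if marks.get(s) == "keep"}
    selected = set()
    for snap in snapshots:  # newest first
        if snap in marks:
            continue
        pid = period_id(snap)
        if pid in covered:
            continue  # period already has a kept snapshot, leave undecided
        if pid not in selected:
            if len(selected) >= keep:
                break
            selected.add(pid)
            marks[snap] = "keep"
        else:
            marks[snap] = "remove"


def simulate_prune(snapshots, keep_last=0, keep_daily=0, keep_weekly=0, keep_monthly=0):
    """Return the snapshots that would be kept under this simplified model."""
    snapshots = sorted(snapshots, reverse=True)
    marks = {snap: "keep" for snap in snapshots[:keep_last]}
    rules = [
        (keep_daily, lambda s: s.date()),
        (keep_weekly, lambda s: tuple(s.isocalendar())[:2]),  # (ISO year, week)
        (keep_monthly, lambda s: (s.year, s.month)),
    ]
    for keep, period_id in rules:
        if keep:
            mark_selections(marks, snapshots, keep, period_id)
    return [s for s in snapshots if marks.get(s) == "keep"]


settings = dict(keep_last=5, keep_daily=8, keep_weekly=4, keep_monthly=12)
newest = datetime(2024, 8, 6, 1, 0)  # hypothetical date of the latest backup

# Hypothetical history A: one backup per day for 600 days.
long_history = [newest - timedelta(days=i) for i in range(600)]
print(len(simulate_prune(long_history, **settings)))   # 29

# Hypothetical history B: daily backups that only started on 2024-03-01.
short_history = [newest - timedelta(days=i) for i in range(159)]
print(len(simulate_prune(short_history, **settings)))  # 20
```

With enough history the model reaches the full 29. With a history that only goes back a few months, the monthly rule simply runs out of older months to select from, so fewer snapshots are kept.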
 
I did review and use the simulator. That's the reason for my question: in the pictures I posted, the data doesn't reflect the pruning schedule. Why do I have only 20 copies of the data when I should have approximately 29 backups? It also doesn't explain why I don't see a change in my data whenever I manually specify those settings.
 
The backups seem to follow your given pruning settings (5 last + 8 daily, then the interval increases), so either you never had enough backups (12 months, but your first backup being from March would indicate this ;)) and/or the pruning settings were changed at some point.
 
