keep last X vs hourly vs xxx

Sep 26, 2023
Hello.
I'm trying to understand a sync process that I have and the amount of data storage it needs.

Here's my environment.
Production: 2 servers, PVE and PVE/PBS
DR site: a single PVE/PBS

The production server does backups every 2 hours, and a sync job replicates them over to the PVE/PBS server on the same schedule. The thinking is that I have a 2-hour window if something goes wrong on the production box and I need to restore something. Wouldn't the better option be to replicate every hour with a 2-hour backup schedule, to shorten the window?
The retention (prune) settings on the production side are keep-last 5, daily 8, weekly 12, monthly 12, and yearly 2, with weekly garbage collection.
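As a sanity check on my own numbers, this is how I'm counting what that policy can keep (just summing the keep counts, which over-counts a bit because one snapshot can satisfy several rules at once):

# Rough upper bound on snapshots kept per guest by the production prune policy.
production = {"keep-last": 5, "keep-daily": 8, "keep-weekly": 12,
              "keep-monthly": 12, "keep-yearly": 2}
per_guest_max = sum(production.values())                                    # 39
print(per_guest_max, "per guest,", per_guest_max * 17, "across 17 guests")  # 39 / 663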

The DR side does a pull sync every 2 hours of whatever is on the production side.
The DR side prune job is set to keep-last 8, daily 30, weekly 12, monthly 12, and yearly 5.

I have 17 servers containing approx. 2.1 TB of data. My current backup storage on the DR side is approx. 8 TB and climbing.
Manually doing the math based on the number of servers * snapshots, I would end up with approx. 1244 backups/snapshots on the DR side. The same math means that every year I would have 2.1 TB for my yearly backups, plus a 'floating storage' space of 11 TB. Does this sound correct? I didn't think the data sprawl would be this big, especially the amount of 'floating' data required each year until the previous monthly/weekly/etc. backups drop off. This seems to indicate that I need at least 25 TB of space on the DR side.
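For what it's worth, here is that back-of-the-envelope math written out (the simple sum of the keep counts lands near, but not exactly on, my 1244 figure, and it completely ignores PBS deduplication, which is why the naive size is so much larger than what I actually see on disk):

# Upper bound on retained snapshots on the DR side and the naive size if every
# snapshot were a full copy. PBS deduplicates chunks, so real usage is far lower.
dr_policy = {"keep-last": 8, "keep-daily": 30, "keep-weekly": 12,
             "keep-monthly": 12, "keep-yearly": 5}
servers = 17
logical_tb = 2.1                                   # total data across all guests

snaps_per_guest = sum(dr_policy.values())          # 67
total_snaps = snaps_per_guest * servers            # 1139
naive_tb = snaps_per_guest * logical_tb            # ~141 TB with no deduplication
print(total_snaps, round(naive_tb, 1))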

What am I missing? Also, even though I have over a year of data (backups/snapshots) on my system and my prune/GC jobs have run, I'm still showing more files than I should. For instance: one server has 18 months of backups when it should only have 12 + 1 (yearly). The jobs complete without issues, but my data never gets reduced.
 
Have you already tried the prune simulator at https://pbs.proxmox.com/docs/prune-simulator/ ?
I found it really helpful for understanding the implications of specific prune settings.
Can you please post a log of your prune and GC jobs? Maybe they give an insight into what's going wrong.
Which storage media are you using and how do you connect it to the PBS?
 
@markf1301 The amount of data used on PBS depends heavily on 1) how much is duplicated, since PBS deduplicates, and 2) how much data changes in between backups. A VM that is off will take basically no extra space for each backup. A VM that changes all of its data every day will effectively add a full backup's worth of data every day.
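As a crude illustration of that second point (made-up change rates, not anything PBS reports):

# Each new backup only adds the chunks that changed since the previous one;
# unchanged chunks are deduplicated away.
def added_per_backup(vm_size_gb: float, change_rate: float) -> float:
    """Approximate new chunk data (GB) one backup adds."""
    return vm_size_gb * change_rate

print(added_per_backup(100, 0.0))    # powered-off VM: ~0 GB per backup
print(added_per_backup(100, 0.02))   # 2% daily churn: ~2 GB per backup
print(added_per_backup(100, 1.0))    # rewrites everything: ~100 GB per backup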

> The dr side prune job is set to last-8, daily-30, weekly/monthly-12, and 5 years.

So it sounds like your issue is that PBS should be pruning out months 13-18, and isn't? (Though it should also keep the last one in 2023 as a yearly.)
 
Here are the latest prune/GC jobs. I removed some of the prune info, but you can see the start, schedule, and completion.
The GC says it still has 1.6 TB to be removed, but I haven't seen that data actually get removed from the system. After running both of these jobs today, I still have approx. 7.7 TB of datastore usage. When would the 1.6 TB actually get removed, if not after the job completes? Yes, I did refresh the data.

Correct, SteveITs - I shouldn't have more than 63 backups/snapshots on the system for all my servers, plus one additional set of 17 for my servers on a yearly basis.

My number of snapshots did drop from 687 to 513, but the free space on the datastore didn't change.
 


The prune marks the chunks as deletable; GC removes them: https://pbs.proxmox.com/docs/backup-client.html#garbage-collection
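Conceptually it's a two-phase mark-and-sweep. A toy sketch of the idea (nothing like the actual PBS code, just to show why pruning alone frees no space):

# Toy illustration only: prune deletes snapshot metadata, but chunk data stays
# on disk until a later GC finds it unreferenced AND older than the cutoff.
from datetime import timedelta

def sweep(chunks, referenced, now, cutoff=timedelta(hours=24, minutes=5)):
    removed, pending = [], []
    for chunk, last_access in chunks.items():
        if chunk in referenced:
            continue                      # phase 1: still marked as used
        if now - last_access > cutoff:
            removed.append(chunk)         # phase 2: old enough, removed now
        else:
            pending.append(chunk)         # reported as "Pending removals"
    return removed, pending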

In your prune log the 'protected' backups will stay forever. Do 209 and 603 simply not have any backups older than the two protected ones? Then there would be nothing to prune there...?
True. That said, the last report from my GC shows the following:
2025-05-29T13:07:02-04:00: Removed chunks: 3744
2025-05-29T13:07:02-04:00: Pending removals: 1.395 TiB (in 1287856 chunks)
2025-05-29T13:07:02-04:00: Original data usage: 61.633 TiB
2025-05-29T13:07:02-04:00: On-Disk usage: 5.93 TiB (9.62%)
2025-05-29T13:07:02-04:00: On-Disk chunks: 4261065
2025-05-29T13:07:02-04:00: Deduplication factor: 10.39
2025-05-29T13:07:02-04:00: Average chunk size: 1.459 MiB
2025-05-29T13:07:02-04:00: TASK OK

When are the 'pending removals' supposed to happen, if not right after the job, or a few hours later?
 
Chunks are removed if they have not been referenced by a backup snapshot for more than 24 hours and five minutes. If I recall correctly, there is now an option to change this, but I'm not sure what caveats changing it might have.
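You can check that cutoff against your GC log: the 'minimum access time' it prints is just the job's start time minus 24 h 5 min (my own arithmetic, using the start time from your log):

# Access-time cutoff illustration: chunks are only swept if their last recorded
# access is older than the GC start time minus 24 h 5 min.
from datetime import datetime, timedelta, timezone

gc_start = datetime(2025, 5, 29, 15, 54, 57, tzinfo=timezone.utc)   # 11:54:57-04:00
minimum_access_time = gc_start - timedelta(hours=24, minutes=5)
print(minimum_access_time)   # 2025-05-28 15:49:57+00:00, as in the GC log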
 
Here are the latest prune/GC jobs. I removed some of the prune info, but you can see the start, schedule, and completion.

I actually wanted to see whether the prune job removed anything at all; unfortunately, you removed exactly the part I was most interested in ;)

From the GC log it seems that it removed 5.91 GiB, and another 1.395 TiB of unused chunks will be removed at a later time, since their snapshots were pruned more recently than the access-time cutoff (from your log):
2025-05-29T11:54:57-04:00: starting garbage collection on store corpbackups
2025-05-29T11:54:58-04:00: Access time update check successful, proceeding with GC.
2025-05-29T11:54:58-04:00: Using access time cutoff 1d 5m, minimum access time is 2025-05-28T15:49:57Z
2025-05-29T11:54:58-04:00: Start GC phase1 (mark used chunks)
2025-05-29T12:01:09-04:00: marked 1% (6 of 516 index files)
...(removed lines)...
2025-05-29T13:00:57-04:00: marked 99% (511 of 516 index files)
2025-05-29T13:01:21-04:00: marked 100% (516 of 516 index files)
2025-05-29T13:01:21-04:00: Start GC phase2 (sweep unused chunks)
2025-05-29T13:01:40-04:00: processed 1% (55740 chunks)
2025-05-29T13:01:54-04:00: processed 2% (111054 chunks)
...(removed lines)...
2025-05-29T13:06:53-04:00: processed 97% (5385889 chunks)
2025-05-29T13:06:56-04:00: processed 98% (5441458 chunks)
2025-05-29T13:06:59-04:00: processed 99% (5497213 chunks)
2025-05-29T13:07:02-04:00: Removed garbage: 5.91 GiB
2025-05-29T13:07:02-04:00: Removed chunks: 3744
2025-05-29T13:07:02-04:00: Pending removals: 1.395 TiB (in 1287856 chunks)
2025-05-29T13:07:02-04:00: Original data usage: 61.633 TiB
2025-05-29T13:07:02-04:00: On-Disk usage: 5.93 TiB (9.62%)
2025-05-29T13:07:02-04:00: On-Disk chunks: 4261065
2025-05-29T13:07:02-04:00: Deduplication factor: 10.39
2025-05-29T13:07:02-04:00: Average chunk size: 1.459 MiB
2025-05-29T13:07:02-04:00: TASK OK

So to me this looks like everything works as designed ;) I would expect that a garbage-collection job launched after 1 PM on 2025-05-30 will remove most if not all of the pending removals. Please report back how it works out.
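The "after 1 PM" estimate is just the time the snapshots were pruned plus the access-time cutoff, roughly (approximating the prune time from the timestamps in your log):

# Pending chunks become sweepable roughly 24 h 5 min after their snapshots were
# pruned; the prune/GC ran around 13:00 on 2025-05-29 local time.
from datetime import datetime, timedelta

pruned_around = datetime(2025, 5, 29, 13, 0)
eligible_after = pruned_around + timedelta(hours=24, minutes=5)
print(eligible_after)   # 2025-05-30 13:05 -> start the next GC after this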