[SOLVED] Force delete of "Pending removals"

lazynooblet

New Member
Jan 23, 2021
14
5
3
43
I'm stuck in a situation where backups are no longer functioning because the storage is at 100%.

Pruning malfunctioned, and disk usage began to grow. I was alerted when free space hit 10%. (The prune job's "Next Run" was in the past; telling it to "Run now" resolved this, but the damage was done.)

After performing a manual prune and garbage collection, 40% of the existing data was placed in "Pending removals". However, my backups are still failing.

How do I delete this data? It's been over 24 hours since this data was backed up. The data I want to delete is over 6 days' worth of backups that would normally have been pruned.

My backups are failing; if I can't fix this I'll have to blow the whole thing away and start again.

Code:
Pending removals: 262.697 GiB (in 200114 chunks)

[Attachment: 1657957564270.png]
Ha, estimated full: "Never"
 

lazynooblet

So the primary cause of this was that the server was started with a date in the future. Having read other forum posts on the logic used by PBS, I surmised that the cause was chunk access times in the future.

I ran the following to set the access time to match the last-modified time:
Code:
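# Null-delimited and quoted so unusual file names survive; sets each file's atime (and mtime) to its current mtime.
find /data/pbs -print0 | while IFS= read -r -d '' i; do echo "$i"; d=$(stat --format %y "$i"); touch -d "$d" "$i"; done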

Ran a garbage collection and voila!
Code:
2022-07-16T09:49:13+01:00: Removed garbage: 251.948 GiB

Back to normal
[Attachment: 1657961727778.png]
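
For anyone hitting the same symptom, a read-only check before rewriting any timestamps may be useful. This is a minimal sketch, assuming GNU find and the /data/pbs path from the post above; the function name future_atime_files is made up for illustration.

```shell
# List regular files under a directory whose access time lies in the future.
# Read-only: find only stats the files, so it does not change any atime.
future_atime_files() {
    local dir=$1 ref
    ref=$(mktemp) || return 1          # reference file, its mtime is "now"
    # -anewer: atime is more recent than the reference file's mtime
    find "$dir" -type f -anewer "$ref" -print
    rm -f "$ref"
}

# Example (adjust the path to your datastore):
# future_atime_files /data/pbs | head
```

If this prints nothing, future access times are probably not what is holding the chunks in "Pending removals".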
 

eXtremeSHOk

Active Member
Mar 15, 2016
33
14
28
39
^^ has the potential to kill all the backups, as some chunks are now marked "old", when they are actually needed and should have current access times
 

lazynooblet

I don't understand the logic linking "old" and "needed". I'd have thought GC would remove unused chunks, so if a chunk is in use by an existing backup it wouldn't get deleted.

By your comment, would that mean if I leave PBS off for a few days, it'll delete everything during the next GC? I know little of the inner workings of PBS, can you explain?
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
7,924
1,527
164
the atime is there to protect chunks that have been added by backups running concurrently to the GC task. the GC task will only treat chunks as "used" that are referenced by indices in the datastore. a backup snapshot currently being created doesn't have any valid indices yet (those are finalized when the backup is finished), so to protect the chunks newly added by such snapshots GC will only consider chunks eligible for removal that are older than 24h or the oldest backup worker running at the start of the GC (whichever is earlier). the 24h are because depending on the local setup, atime might not be updated again if the last update (GC/..) was within the last 24h.

so no, powering off a PBS instance, leaving it off for a bit and then booting it again won't cause GC to remove everything ;)
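
The rule described above can be written down as a small sketch. This is only an illustration of the explanation in this post, not PBS source code; the function names gc_cutoff and chunk_removable and the yes/no "referenced" argument are invented for the example, and times are plain epoch seconds.

```shell
# Cutoff = the earlier of (now - 24h) and the start time of the oldest
# backup worker running when GC starts. Pass an empty second argument
# if no backup is running.
gc_cutoff() {
    local now=$1 oldest_worker_start=$2 cutoff
    cutoff=$((now - 24 * 3600))
    if [ -n "$oldest_worker_start" ] && [ "$oldest_worker_start" -lt "$cutoff" ]; then
        cutoff=$oldest_worker_start
    fi
    echo "$cutoff"
}

# A chunk is only eligible for removal if no index references it AND
# its atime is older than the cutoff.
chunk_removable() {
    local atime=$1 referenced=$2 cutoff=$3
    [ "$referenced" = no ] && [ "$atime" -lt "$cutoff" ]
}
```

With referenced set to yes the second function fails regardless of how old the atime is, which is why a PBS instance that was powered off for a few days does not lose chunks still used by existing snapshots.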
 

eXtremeSHOk

We run GC and prune tasks every 30 minutes; backups from a cluster are done every 15 minutes.

But we end up with pending removals of around 10 TB per 24 hours.

atime is enabled (relatime disabled). Is there any way to force GC to remove chunks after 4, 6, or 12 hours instead of waiting 24 hours and 5 minutes?
 

fabian

that seems like a weird setup to be honest - are you doing really frequent backups that you throw away almost immediately? why?
 

eXtremeSHOk

For us, PBS offers a way to make fast, low-impact backups at high frequency across a multitude of datasets.

With rapidly changing data on web, database, email, and file servers, why would we want only a daily backup that will be many hours old, when we can have X 15-minute backups and X hourly backups?

The issue is that, with atime=on, the old pending deletes are only removed after 24 hours. They are marked for removal, so there should be a way for them to be removed.

The 24-hour-5-minute delay was a workaround to allow relatime=on, but this has basically broken the usage of true atime.

Now assume 10 VMs which have 100 GB of dirty/changed data every hour: even with deduplication, that's many TB of old, marked garbage data in a 24-hour period. What is the purpose of keeping this if you only need the last 4 backups to remain present?
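
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes the 100 GB per hour applies to each of the 10 VMs and ignores deduplication entirely, so it is an upper bound only; the function name churn_upper_bound_gb is made up for the example.

```shell
# Upper bound on data written during the retention window, before deduplication.
churn_upper_bound_gb() {
    local vms=$1 gb_per_hour=$2 hours=$3
    echo $((vms * gb_per_hour * hours))
}

churn_upper_bound_gb 10 100 24   # prints 24000 (GB, roughly 24 TB)
churn_upper_bound_gb 10 100 6    # prints 6000 for a hypothetical 6-hour window
```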

The innovation of PBS is the ability to make high-frequency backups. To make this happen NVMe is required, but this comes with limitations in terms of available space.

What would be the approach to reduce the window of marked data sitting idle on the disks?
 

fabian

yeah, PBS allows frequent backups with low overhead, but it's not really written with high churn and immediate deletion after backup in mind. it's documented that chunks (once created) will stay around for at least 24h, even if no longer used. we could of course offer the foot gun of reducing this threshold ("pinky promise atime works properly" ;)) - but the risk that somebody enables that without understanding and destroys their backups kind of outweighs the rather exotic "backup deletion within <24h of backup creation" use case you have - it kind of smells like abusing backups as short-term snapshotting mechanism (the two are superficially similar, but just like snapshots are not backups the inverse also holds true ;))
 

eXtremeSHOk

Unfortunately, snapshots are not remote, and they prevent migrations between nodes (local storage).

I do think a good common ground would be an option to set a value lower than 24 hours (6 hours, 12 hours, etc.).

If atime is broken, surely relatime will also be broken? By my estimate, in a 3-hour period there would be at least 12 opportunities to set the atime value correctly.
 

Dunuin

Famous Member
Jun 30, 2020
7,312
1,767
149
Germany
fabian said:
but the risk that somebody enables that without understanding and destroys their backups kind of outweighs the rather exotic "backup deletion within <24h of backup creation" use case you have
Couldn't that be a hidden flag in datastore.cfg with no actual checkbox in the GUI to enable it? If something is only usable through the CLI, people usually don't use it unless they really need it and read the documentation on how to enable it. And for safety, at least for ZFS-backed datastores, it wouldn't be that hard to additionally check whether relatime is disabled (for example, run zfs list -o relatime,atime,mountpoint, then grep for the datastore's path and see whether atime or relatime is set for that line).
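
Such a check could look roughly like this. It is only a sketch: the function name strict_atime_enabled is invented, it assumes the datastore path is exactly a ZFS mountpoint (not a subdirectory of one), and it reads the tab-separated output that zfs list prints with -H instead of grepping the human-readable table.

```shell
# Reads tab-separated "relatime<TAB>atime<TAB>mountpoint" lines on stdin,
# as printed by: zfs list -H -o relatime,atime,mountpoint
# Succeeds only if the given mountpoint has atime=on and relatime=off.
strict_atime_enabled() {
    awk -F '\t' -v mp="$1" '
        $3 == mp { found = 1; ok = ($1 == "off" && $2 == "on") }
        END { exit (found && ok) ? 0 : 1 }
    '
}

# Example (the mountpoint is a placeholder):
# zfs list -H -o relatime,atime,mountpoint | strict_atime_enabled /data/pbs \
#     && echo "strict atime" || echo "relatime on, atime off, or dataset not found"
```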
 

eXtremeSHOk

A hidden flag/setting in datastore.cfg would be the ideal solution.

For perspective, here is one of the PBS servers:
Code:
2022-08-18T13:22:42+02:00: Removed garbage: 34.908 GiB
2022-08-18T13:22:42+02:00: Removed chunks: 27588

2022-08-18T13:22:42+02:00: Pending removals: 15.045 TiB (in 8600742 chunks)

2022-08-18T13:22:42+02:00: Original data usage: 27.053 TiB
2022-08-18T13:22:42+02:00: On-Disk usage: 619.959 GiB (2.24%)
2022-08-18T13:22:42+02:00: On-Disk chunks: 400623
2022-08-18T13:22:42+02:00: Deduplication factor: 44.68
2022-08-18T13:22:42+02:00: Average chunk size: 1.585 MiB


A VM backup:
Code:
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: OK (71.0 GiB of 1.2 TiB dirty)

INFO: using fast incremental mode (dirty-bitmap), 73.4 GiB dirty of 1.2 TiB total
 