[SOLVED] Force delete of "Pending removals"

The 24 hours 5 minutes is because of relatime: with that, the atime is only updated once every 24 hours, and PBS uses the atime to decide which chunks are still in use and which are not.
^^We are all aware of that fact. This thread is about having a workable and reliable solution for clearing the garbage earlier. Large users (20+ Proxmox nodes, 100+ TB of backups) quickly run into limitations with the current 25-hour GC.
 
Yeah, but the problem I see is this. Let's say there were an option in the web UI to lower the interval below 24 hours. I would bet not all people know about the relatime behaviour. Even if only 1% of the people don't know it and use relatime, that 1% would lose their backups, as all backup snapshots would be wiped by the GC without ever having been pruned first. Then there would be a lot of angry customers and maybe even serious damage.
From what I have seen so far, the Proxmox staff doesn't like to implement options that might cause data loss when used by inexperienced users.

So such an option would require checking first that relatime is disabled (the option grayed out while relatime is enabled, or something like that), and this is probably not that easy, as PBS isn't limited to specific filesystems.
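For what it's worth, checking the current mount options is at least easy; a minimal sketch with an example datastore path:

Code:
# show the mount options of the filesystem that backs the datastore
findmnt -no OPTIONS --target /mnt/datastore/backup
# "relatime" or "noatime" in the output means a short GC cutoff would be risky

This only sees the local mount options, though; it may not catch settings applied on an NFS server or via ZFS dataset properties (see the next posts).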
 
The suggestion was a hidden / expert option in the cfg, or a command on the command line.

I'm talking about pure NVMe backup storage, where atime has a negligible impact. Having 20 TB on NVMe sitting idle and waiting 25 hours to be removed is expensive.

It would be a simple check: if atime is enabled on the pool, allow a 1-hour GC cleanup; if relatime is enabled, keep the 24+1-hour GC cleanup.
 
the issue is you need to know what the mount option was in the past, not just at the moment when you start GC ;) we'll see whether it's possible to improve this.
 
The first GC run with the atime option enabled could wait the full 25 hours; the subsequent runs would use the 1-hour GC.
 
Then you could still run into problems. You would need to continuously monitor the storage for the last 24 hours and log whether atime was enabled or not.

And keep in mind that not all people run stock datastores. Here, PBS uses an NFS share that is exported from a ZFS pool with relatime enabled. Not sure if PBS could then check whether relatime is enabled, as it can't run a "zfs get relatime pool/datastore".

Maybe PBS could write to a file in the datastore every minute, then check its atime and set a flag in case the atime lags by more than 5 minutes? That flag could only be unset after 24 hours, and with the flag present GC would only delete chunks older than 24 hours?
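That kind of probe can already be scripted by hand to see how a given datastore behaves; a rough sketch (the path and file name are made up, and this only illustrates the relatime read behaviour, not how PBS does its atime updates):

Code:
#!/bin/sh
# probe whether the filesystem backing the datastore updates atime on every read
DS=/mnt/datastore/backup          # example path
probe="$DS/.atime-probe"          # made-up file name
touch "$probe"
cat "$probe" > /dev/null          # first read; relatime still updates atime here (atime <= mtime)
a1=$(stat -c %X "$probe")
sleep 2
cat "$probe" > /dev/null          # second read
a2=$(stat -c %X "$probe")
if [ "$a2" -gt "$a1" ]; then
    echo "atime updated on every read (strict atime)"
else
    echo "atime not updated on the second read (relatime/noatime behaviour)"
fi
rm -f "$probe"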
 
atime doesn't get enabled and disabled at random, it's an admin action.

So check whether atime is enabled, wait 25 hours, and if atime is still enabled, do the 1-hour GC.

Every hour when GC runs, it can recheck whether atime is enabled or not.

If atime is no longer enabled, revert back to the 25-hour GC.
 
Currently, how does Proxmox work around the use case of an admin randomly enabling/disabling atime and relatime? Or setting all the file times to -1200 days? ;)
 
we err on the side of caution and always use a 24h+5min cutoff (or an even bigger one, if backup tasks started before that are still running). if you randomly modify the contents of a chunk store all bets are off and you are responsible for breakage ;)
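(As a rough illustration with an example datastore path: 24h + 5min is 1445 minutes, so the sketch below counts the chunks whose atime is already older than that cutoff; only the ones that are also unreferenced would actually be removed.)

Code:
# count chunks whose atime is older than the 24h+5min cutoff
find /mnt/datastore/backup/.chunks -type f -amin +1445 | wc -l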
 
Same issue here.

I was running an invalid backup job, as I forgot to exclude an additional disk from the VM backup. The PBS datastore filled up to 100%, and now I cannot back up for 24 hours, as the disk is full and I must wait the 24h 5min to get rid of the failed backup data.
 
Unfortunately, until now Proxmox has not allowed those using "atime" to purge the garbage quicker than those using "relatime".
 
you can put the datastore into read-only maintenance mode, wait until that is active (no more writes by PBS or clients possible), then reset the atime of all chunks to > 24h ago (e.g. with a find .. -exec .. command), disable the maintenance mode again and then run the GC.
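One way that find step could look, with an example datastore path and GNU touch/find:

Code:
# only with the datastore in read-only maintenance mode and no tasks running!
# push the access time of every chunk back two days, i.e. well past the 24h+5min cutoff
find /mnt/datastore/backup/.chunks -type f -exec touch -a -d "2 days ago" {} +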
 
A hidden flag/setting in the datastore.cfg would be the ideal solution.

For perspective, here is one of the PBS servers:
Code:
2022-08-18T13:22:42+02:00: Removed garbage: 34.908 GiB
2022-08-18T13:22:42+02:00: Removed chunks: 27588

2022-08-18T13:22:42+02:00: Pending removals: 15.045 TiB (in 8600742 chunks)

2022-08-18T13:22:42+02:00: Original data usage: 27.053 TiB
2022-08-18T13:22:42+02:00: On-Disk usage: 619.959 GiB (2.24%)
2022-08-18T13:22:42+02:00: On-Disk chunks: 400623
2022-08-18T13:22:42+02:00: Deduplication factor: 44.68
2022-08-18T13:22:42+02:00: Average chunk size: 1.585 MiB


And a VM backup:
Code:
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: OK (71.0 GiB of 1.2 TiB dirty)

INFO: using fast incremental mode (dirty-bitmap), 73.4 GiB dirty of 1.2 TiB total
Hi Extremeshok, what does "hidden flag/setting in the datastore.cfg" mean?
I don't understand.
Thank you.
 
While we are facing the same issue and need to wait 24 hours for GC to clean up, I have a quick question.

Will PBS still allow a restore?
I hope so, as a restore is read-only, and this can turn into a nightmare scenario if we cannot retrieve data from it when it's full.

Our datastore is on a separate zpool, so the system drive is not full.
 
it depends on how your file system implements locks - if they don't work with an (almost) full file system, then restores also won't work since PBS needs to lock the datastore/snapshots/.. . there's always the low-level proxmox-backup-debug tool that directly operates on the files that can be used for disaster recovery though, that doesn't use any locks.
 

We were lucky that I found a 15 GB test benchmark file on this zpool array...
Is it normal that the last garbage collection, which ran the day before the array filled up, didn't free up its "pending removal" the next day (while the array was full)?

30 hours later the array was still 100% full...! (dead end)

We are responsible for the 100% full, but you need to keep a reserve and allow your customers like us to find a solution in minutes, not hours of reading the forum for a possible workaround with incomplete CLI commands.

The logical fix for anyone would have been to delete some VM backups, but the 24-hour wait to free that space does not seem to work at all when the datastore is 100% full.

It took us 30 hours to find a solution, and during those 30 hours we were lucky that no VM failed.
I think you underestimate how critical this "limitation/issue" is.
Enforce a limit in your system to ALWAYS reserve enough space to be able to manage this... it's a must.
And allow a way to force "Force delete pending removals".

thx
 
this is sysadmin 101 - if you have a mountpoint that is at risk of running full by accident, either set a quota or have some way to allocate more storage from the underlying layer when needed. how you do that is filesystem/storage dependent, and PBS doesn't care about that part.

GC can only free things if it can run (it sounds like it couldn't in your case?) - the pending part is pending because it was not possible to determine for sure whether it's safe to remove, and re-checking that requires another GC run. running it does require a little bit of space - this is the case for almost any piece of software - allocating things like locks or tmpfiles is very very common (and for basic system operations, this is why the default settings will reserve some headroom for root-only usage when formatting a file system..).

if you are using ZFS, there are two ways to solve this:
- set quotas on all your dataset(s) in a way that ensures the pool can never run out of space (i.e., distribute all the available space up front)
- create a "reserve" dataset and set a reservation on it, but never write to it (you can then reduce the reservation to get "free" space for the rest of the pool)

you can combine both (e.g., set quotas on certain datasets that are prone to get out of hand, and also have a reserve dataset in case of an accident).
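With made-up pool/dataset names, those two approaches could look roughly like this:

Code:
# 1) cap the dataset holding the datastore so the pool can never fill up completely
zfs set quota=90T tank/pbs-datastore

# 2) a "reserve" dataset with a reservation you never use
zfs create tank/reserve
zfs set reservation=100G tank/reserve
# emergency: hand the reserved space back to the rest of the pool
zfs set reservation=none tank/reserve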
 
When using ZFS you can easily set a quota for the dataset that stores your datastore, or for the entire pool. I usually set that to ~90% for all pools I create (filling up the pool too much will slow it down and speeds up fragmentation on HDDs... and you can't defrag a ZFS pool).
It could be set like this: zfs set quota=100T yourPool/YourDataset
 
Hi, the 100% full was not on the ZFS side; we had 250 GB spare on the ZFS pool, but PBS was reporting a 100% full status.
But for some reason, even after we lifted the limit so it could use 100% of the remaining free space, that was not reflected in the PBS datastore.
Is there another command to issue to expand a PBS datastore, or will the datastore follow the zpool size dynamically?
 
