Subject: PSA-2025-00019-1: Race condition during long-running garbage collection and pruning of recent snapshots may lead to backup corruption before Proxmox Backup Server 3.4
Advisory date: 2025-10-27
Packages: proxmox-backup-server
Details: On certain setups running Proxmox Backup Server 3.3 and below, a race condition between garbage collection (GC) and normal snapshot deletion could cause GC to delete chunks even though they are still referenced by backup snapshots. Affected backup snapshots fail verification and cannot be restored.
The issue can trigger when the following sequence of events occurs:
1. A GC job starts.
2. GC phase 1 starts, generating the list of snapshot index files whose referenced chunks are to be marked as in use.
3. Before GC phase 1 reaches a specific (backup) group G, a new incremental (backup) snapshot S_1 based on a previous snapshot S_0 is created in group G. Snapshot S_1 references unchanged chunks already known and referenced by S_0.
4. Before GC phase 1 reaches snapshot S_0 in group G, snapshot S_0 is pruned by a prune job or through (manual) deletion by an eligible user.
In this case, GC phase 1 will not mark chunks which were referenced only by snapshots S_0 and S_1, since snapshot S_0 does not exist anymore, and S_1 is not included in the list of to-be-marked snapshots generated in step 2.
As a result, any such chunks:
- that already existed before the start of the GC run (or before the cutoff threshold, if that is earlier),
- and that have not been marked due to references from other snapshots or groups,
will be treated as garbage and removed by GC phase 2.
Note: While there are more complicated variants (e.g., pruning more snapshots, doing multiple backups, ...), the above describes the basic issue.
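The following is a minimal, illustrative Python sketch of the underlying mechanism, not the actual Proxmox Backup Server code; all names (run_gc, backup_and_prune, chunk_A, ...) are made up for illustration. It models GC as a mark-and-sweep pass that captures the snapshot list once at the start of phase 1, with the backup and prune from steps 3 and 4 happening while GC is still running:

    # Illustrative simulation of the race condition (not PBS code).
    # Snapshots are modeled as sets of chunk digests in a content-addressed store.

    def gc_phase1_list(snapshots):
        # Step 2: phase 1 captures the list of snapshot index files as of "now".
        return list(snapshots.keys())

    def run_gc(snapshots, chunks, interleaved_ops):
        to_mark = gc_phase1_list(snapshots)
        interleaved_ops(snapshots)        # steps 3 and 4 happen during the GC run
        marked = set()
        for name in to_mark:              # phase 1: mark in-use chunks
            if name in snapshots:         # S_0 is gone, S_1 was never in the list
                marked |= snapshots[name]
        # Phase 2: sweep unmarked chunks (the real GC additionally requires them
        # to be older than an access-time cutoff, matching the condition above).
        return {c for c in chunks if c in marked}

    def backup_and_prune(snaps):
        snaps["S_1"] = {"chunk_A", "chunk_B"}   # step 3: incremental backup reuses S_0's chunks
        del snaps["S_0"]                        # step 4: S_0 is pruned before GC reaches group G

    snapshots = {"S_0": {"chunk_A", "chunk_B"}} # existing snapshot in group G
    chunks = {"chunk_A", "chunk_B"}             # on-disk chunks, older than the cutoff
    print(run_gc(snapshots, chunks, backup_and_prune))
    # -> set(): both chunks are removed although S_1 still references them

The same mechanism explains why affected snapshots later fail verification: their indexes still reference digests whose chunk files no longer exist.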
The chance of triggering the race condition is low on most setups with recommended specs, in particular if datastores are backed by fast SSDs.  The likelihood increases with several factors, e.g.:
- Large and/or slow datastores, for example on a network share or local HDD-backed storage, increase the runtime of GC jobs and thus the chance that snapshots are created and pruned while a GC job is running.
- Frequent pruning, for example via a frequently-running prune job or by applying retention on the Proxmox VE side, increases the chance that snapshots are pruned while a GC job is running.
- Aggressive pruning with retention settings that favor deleting relatively recent snapshots (e.g. keep-last or keep-daily with low values) increases the chance of pruning a snapshot that is used as a base for an incremental backup while a GC job is running (see the sketch below).
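As an illustration of the last point, the following toy keep-last selection (illustrative only, not the actual prune algorithm) shows that with keep-last=1 the base snapshot S_0 becomes prunable as soon as the incremental snapshot S_1 exists, i.e. exactly while a concurrently running GC may still expect to see it:

    # Toy keep-last selection (illustrative, not the real prune logic).
    def keep_last(snapshots_newest_first, n):
        return snapshots_newest_first[:n], snapshots_newest_first[n:]

    keep, prune = keep_last(["S_1", "S_0"], n=1)
    print(keep, prune)   # ['S_1'] ['S_0'] -> the incremental backup's base is pruned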
Note: The actual security implications are rather low, as an attacker wanting to leverage this would need to:
- be able to control or know the GC job schedule timing.
- be able to create backups or control the backup job schedule timing.
- be able to delete backups or control the prune job schedule timing.
- know a chunk hash that is used by another backup snapshot on the same datastore.
- get lucky with timing.
The combination of those requirements implies that the user is either a datastore administrator, or at least the owner of the backup group while also having the DatastorePowerUser role, or similar. Knowing a chunk hash also means that the attacker very probably has to know the content of the chunk they are going to attack, as otherwise guessing a SHA-256 hash is, in practice, impossible.
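As a rough illustration of that last point (a simplified assumption; the real chunk encoding also involves compression and, for encrypted chunks, a keyed digest): for plain chunks the digest is derived from the chunk content itself, so an attacker can only name a digest whose content they can fully reproduce:

    import hashlib

    # Simplified illustration: a chunk digest derived from known chunk content.
    chunk_data = b"exact bytes of the targeted chunk"  # hypothetical content
    digest = hashlib.sha256(chunk_data).hexdigest()
    print(digest)  # only computable because chunk_data is fully known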
Fixed: The likelihood of the issue triggering was greatly reduced with proxmox-backup-server 3.4.0-1, and the issue was fully fixed in 3.4.1-1. Proxmox Backup Server 4 was never affected by this issue.
EDIT 2025-10-29: clarify (relatively low) security implication.