"Locked backup group" error when syncing with an off-site PBS

Darkbotic

Member
Jul 10, 2024
71
2
8
Hello everyone.
I'm getting this error when running a sync job with an off-site PBS.
Any idea what could be happening? Both PBS are running the latest version.

Code:
Encountered errors: while creating locked backup group 'BackupGroup { store: "Local", ns: BackupNamespace { inner: ["MrBackup"], len: 6 }, group: BackupGroup { ty: Vm, id: "100" } }'
 
Hi,
Hello everyone.
I'm getting this error when running a sync job with an off-site PBS.
Any idea what could be happening? Both PBS are running the latest version.

Code:
Encountered errors: while creating locked backup group 'BackupGroup { store: "Local", ns: BackupNamespace { inner: ["MrBackup"], len: 6 }, group: BackupGroup { ty: Vm, id: "100" } }'
PBS uses locks to guarantee consistency in case of concurrent operations on the same backup group and/or snapshot. So it seems that the sync fails because another operation is already holding an exclusive lock for the backup group mentioned in the backup. Do you have other task (e.g. backup, sync jobs, ...) running at the same time?
 
What is the output of stat /run/proxmox-backup/old-locking, executed on the sync target PBS instance? What do you get for fuser /run/proxmox-backup/lock/Local/MrBackup/vm-100, again on the sync target?
 
Seeing something similar here almost every week (for a weekly sync job).
As there are no other tasks running at the same time (checked task history for multiple occurrences of this issue), it looks as if it the pushing PBS instance locks out itself from the receiving datastore by not releasing the previously held lock on that group fast enough.

For us, It occurs about 3 out of 4 times and affects one non-deterministically “chosen” backup group.

Verify New Snapshots is disabled on both source and target datastores.
Happened with an older PBS version from spring as well as it does with the up-to-date one.

Code:
2025-07-05T10:15:13+02:00: Percentage done: 83.64% (9/11 groups, 1/5 snapshots in group #10)
2025-07-05T10:15:13+02:00: Percentage done: 85.45% (9/11 groups, 2/5 snapshots in group #10)
2025-07-05T10:15:13+02:00: Encountered errors: while creating locked backup group 'BackupGroup { store: "hddz01", ns: BackupNamespace { inner: [], len: 0 }, group: BackupGroup { ty: Vm, id: "202" } }'
2025-07-05T10:15:13+02:00: Failed to push group vm/202 to remote!
2025-07-05T10:15:13+02:00: skipped: 14 snapshot(s) (2025-05-16T21:02:37Z .. 2025-06-27T21:03:20Z) - older than the newest snapshot present on sync target
2025-07-05T10:15:18+02:00: Percentage done: 92.73% (10/11 groups, 1/5 snapshots in group #11)

Code:
2025-07-12T08:08:40+02:00: Percentage done: 45.45% (5/11 groups)
2025-07-12T08:08:40+02:00: skipped: 12 snapshot(s) (2025-06-07T08:19:42Z .. 2025-07-04T21:00:47Z) - older than the newest snapshot present on sync target
2025-07-12T08:52:42+02:00: Percentage done: 47.27% (5/11 groups, 1/5 snapshots in group #6)
2025-07-12T09:00:13+02:00: Percentage done: 49.09% (5/11 groups, 2/5 snapshots in group #6)
2025-07-12T09:00:13+02:00: Percentage done: 50.91% (5/11 groups, 3/5 snapshots in group #6)
2025-07-12T09:00:13+02:00: Encountered errors: while creating locked backup group 'BackupGroup { store: "hddz01", ns: BackupNamespace { inner: [], len: 0 }, group: BackupGroup { ty: Vm, id: "105" } }'
2025-07-12T09:00:13+02:00: Failed to push group vm/105 to remote!
2025-07-12T09:00:13+02:00: skipped: 1 snapshot(s) (2025-06-07T07:55:42Z) - older than the newest snapshot present on sync target
2025-07-12T09:00:13+02:00: skipped: 1 snapshot(s) (2025-06-07T08:15:21Z) - older than the newest snapshot present on sync target
2025-07-12T09:00:13+02:00: skipped: 15 snapshot(s) (2025-05-16T21:00:39Z .. 2025-07-04T21:01:46Z) - older than the newest snapshot present on sync target
2025-07-12T09:00:18+02:00: Percentage done: 74.55% (8/11 groups, 1/5 snapshots in group #9)

Code:
2025-07-19T07:29:48+02:00: Percentage done: 25.45% (2/11 groups, 4/5 snapshots in group #3)
2025-07-19T07:29:53+02:00: Percentage done: 27.27% (3/11 groups)
2025-07-19T07:29:53+02:00: skipped: 17 snapshot(s) (2025-05-16T21:00:16Z .. 2025-07-11T21:00:47Z) - older than the newest snapshot present on sync target
2025-07-19T07:30:44+02:00: Percentage done: 29.09% (3/11 groups, 1/5 snapshots in group #4)
2025-07-19T07:30:44+02:00: Percentage done: 30.91% (3/11 groups, 2/5 snapshots in group #4)
2025-07-19T07:30:44+02:00: Encountered errors: while creating locked backup group 'BackupGroup { store: "hddz01", ns: BackupNamespace { inner: [], len: 0 }, group: BackupGroup { ty: Vm, id: "103" } }'
2025-07-19T07:30:44+02:00: Failed to push group vm/103 to remote!
2025-07-19T07:30:44+02:00: skipped: 17 snapshot(s) (2025-05-16T21:00:22Z .. 2025-07-11T21:00:53Z) - older than the newest snapshot present on sync target
2025-07-19T07:38:30+02:00: Percentage done: 38.18% (4/11 groups, 1/5 snapshots in group #5)

Code:
2025-06-21T07:00:00+02:00: Found 11 groups to sync (out of 11 total)
2025-06-21T07:00:01+02:00: skipped: 15 snapshot(s) (2025-05-16T21:00:01Z .. 2025-06-07T07:21:20Z) - older than the newest snapshot present on sync target
2025-06-21T07:00:01+02:00: skipped: 13 snapshot(s) (2025-05-16T21:00:03Z .. 2025-06-13T21:00:04Z) - older than the newest snapshot present on sync target
2025-06-21T07:08:25+02:00: Percentage done: 10.91% (1/11 groups, 1/5 snapshots in group #2)
2025-06-21T07:16:01+02:00: Percentage done: 12.73% (1/11 groups, 2/5 snapshots in group #2)
2025-06-21T07:16:01+02:00: Percentage done: 14.55% (1/11 groups, 3/5 snapshots in group #2)
2025-06-21T07:16:01+02:00: Encountered errors: unable to acquire lock on backup group directory "/mnt/datastore/hddz01/vm/101" - another backup is already running
2025-06-21T07:16:01+02:00: Failed to push group vm/101 to remote!
2025-06-21T07:16:01+02:00: skipped: 13 snapshot(s) (2025-05-16T21:00:12Z .. 2025-06-13T21:00:14Z) - older than the newest snapshot present on sync target
2025-06-21T07:16:06+02:00: Percentage done: 20.45% (2/11 groups, 1/4 snapshots in group #3)

Code:
2025-06-07T07:03:23+02:00: Percentage done: 10.42% (0/8 groups, 5/6 snapshots in group #1)
2025-06-07T07:04:50+02:00: Percentage done: 12.50% (1/8 groups)
2025-06-07T07:04:50+02:00: skipped: 11 snapshot(s) (2025-05-16T21:00:03Z .. 2025-05-30T21:00:07Z) - older than the newest snapshot present on sync target
2025-06-07T07:10:28+02:00: Percentage done: 15.00% (1/8 groups, 1/5 snapshots in group #2)
2025-06-07T07:16:11+02:00: Percentage done: 17.50% (1/8 groups, 2/5 snapshots in group #2)
2025-06-07T07:16:11+02:00: Percentage done: 20.00% (1/8 groups, 3/5 snapshots in group #2)
2025-06-07T07:16:11+02:00: Encountered errors: unable to acquire lock on backup group directory "/mnt/datastore/hddz01/vm/101" - another backup is already running
2025-06-07T07:16:11+02:00: Failed to push group vm/101 to remote!
2025-06-07T07:16:11+02:00: skipped: 11 snapshot(s) (2025-05-16T21:00:12Z .. 2025-05-30T21:00:18Z) - older than the newest snapshot present on sync target
2025-06-07T07:16:16+02:00: Percentage done: 27.50% (2/8 groups, 1/5 snapshots in group #3)

The workaround is obvious: we re-run the syncjob after the partially failed one has finished.
(Needless to say, the “Sync remote 'xyz' datastore 'hddz01' failed” error notification leaves a bit of a bad impression though. Clients prefer no errors over “if the Saturday sync job fails, the Sunday one picks up the failed backups anyway”.)
 
Okay, I agree that given your description of the issue this is most likely the previous backup run still not having dropped the backup group lock, while the source PBS instance already tries to create the next snapshot, which again tries to acquire the lock. Please open an issue at https://bugzilla.proxmox.com also linking to this forum thread. Thanks for providing the details!