unable to acquire lock on snapshot directory - locked by another operation

Hi,

We have three PBS servers for our PVE, two on-site and one off-site.

All PVE hosts back up to the same on-site PBS each night: about 250 VMs and a 60 TiB datastore.

Then a remote sync job is launched on the second on-site PBS and, a bit later, on the off-site one.

So we're following the recommended architecture described in this thread:

https://forum.proxmox.com/threads/process-a-second-backup-or-sync-the-first.139132/

However, about once per month one of the remote syncs fails with:

Code:
...
sync group vm/363 failed - unable to acquire lock on snapshot directory "/mnt/datastore/datastore1/vm/363/2024-02-17T01:43:28Z" - locked by another operation
...
TASK ERROR: sync failed with some errors.

Additional information from our analysis of these failures:
- Only one VM failed to remote sync.
- No other jobs are running at this time on the PBS.

We assume we're unlucky and both remote sync jobs end up trying to sync the same VM snapshot at the same time.

Any suggestion on how to deal with this issue?

Maybe PBS could:
- retry, at the end of the remote sync, the snapshots which failed due to "locked by another operation"
- use "multiple reader" locks for remote syncs: if my understanding is correct, they are read-only on the remote repo and so should not conflict.
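The first suggestion can be sketched as below. This is only an illustration of the idea, not how PBS implements sync: `SnapshotLockedError`, `sync_one` and `sync_with_deferred_retry` are hypothetical names, and the real sync job is not written in Python.

```python
from typing import Callable, Iterable, List


class SnapshotLockedError(Exception):
    """Hypothetical error: the snapshot is locked by another operation."""


def sync_with_deferred_retry(
    snapshots: Iterable[str],
    sync_one: Callable[[str], None],
) -> List[str]:
    """Sync every snapshot once, then retry the locked ones at the end.

    Returns the snapshots that were still locked on the second attempt.
    """
    deferred = []
    for snap in snapshots:
        try:
            sync_one(snap)
        except SnapshotLockedError:
            # Do not fail the whole job yet; remember it for a second pass.
            deferred.append(snap)

    still_locked = []
    for snap in deferred:
        try:
            sync_one(snap)
        except SnapshotLockedError:
            # Still locked after the rest of the job ran: report it.
            still_locked.append(snap)
    return still_locked
```

With 250 VMs to go through, the second pass would typically run long after the first attempt, which gives the conflicting operation time to finish.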

If needed, we can open a Bugzilla entry or a support ticket (we have basic support on two of the three PBS involved).

Thanks!
 
Hi!
Could you post the full syslog from both the sending and the pulling side?
Theoretically we do use shared locks for reading, so two sync jobs reading from the same datastore at the same time should work. Maybe check again to be sure there was no backup or other job going on on both datastores.
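To illustrate the shared versus exclusive lock behaviour described above, here is a generic sketch using POSIX `flock` on a temporary directory. It assumes Linux or another Unix, and it does not claim to reproduce the exact locking calls PBS makes; `try_lock` and `demo` are names made up for this example.

```python
import fcntl
import os
import tempfile


def try_lock(path: str, mode: int):
    """Open `path` and try a non-blocking flock; return the fd or None."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        # Someone else holds a conflicting lock.
        os.close(fd)
        return None


def demo() -> dict:
    with tempfile.TemporaryDirectory() as snapshot_dir:
        # Two readers (think: two sync jobs) take shared locks.
        reader_a = try_lock(snapshot_dir, fcntl.LOCK_SH)
        reader_b = try_lock(snapshot_dir, fcntl.LOCK_SH)
        # A writer asks for an exclusive lock while the readers hold theirs.
        writer = try_lock(snapshot_dir, fcntl.LOCK_EX)
        result = {
            "two_readers": reader_a is not None and reader_b is not None,
            "writer_while_reading": writer is not None,
        }
        for fd in (reader_a, reader_b, writer):
            if fd is not None:
                os.close(fd)  # closing the fd releases its lock

        # Now the writer goes first and a reader arrives while it is busy.
        writer = try_lock(snapshot_dir, fcntl.LOCK_EX)
        reader_c = try_lock(snapshot_dir, fcntl.LOCK_SH)
        result["reader_while_writing"] = reader_c is not None
        for fd in (writer, reader_c):
            if fd is not None:
                os.close(fd)
        return result
```

In this sketch two readers coexist, but a reader and a writer never do, whichever arrives first. That is why it is worth checking for a backup or other writing job rather than a second sync.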
 

Hi,

Indeed, after looking at the logs of a failed sync, I noticed that a backup job to the first on-site PBS was taking much longer than usual and so was interfering with the sync job from the other PBS.

So in this case, only a retry of the specific snapshot that failed, either later or at the end of the job, would make the sync job successful.
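To know which snapshots would need such a retry, the failed ones can be pulled out of the sync task log. The sketch below assumes the error line always looks like the one quoted earlier in this thread; the pattern is derived from that single example, so other PBS versions may word it differently. `locked_snapshots` is a made-up helper name.

```python
import re

# Pattern based on the single log line quoted in this thread.
LOCK_ERROR = re.compile(
    r'sync group (?P<group>\S+) failed - unable to acquire lock on '
    r'snapshot directory "(?P<path>[^"]+)" - locked by another operation'
)


def locked_snapshots(log_text: str) -> list:
    """Return (group, snapshot directory) pairs that failed on a lock."""
    return [(m["group"], m["path"]) for m in LOCK_ERROR.finditer(log_text)]
```

The returned groups are the candidates for a manual or scheduled re-run once the long backup has finished.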

Thanks!
 
