Hi,
We have three PBS servers for our PVE, two on-site and one off-site.
Every PVE backs up to the same on-site PBS each night: about 250 VMs and a 60 TiB datastore.
Then a remote sync job is launched on the second on-site PBS and, a bit later, on the off-site one.
So we're following the recommended architecture described in this thread:
https://forum.proxmox.com/threads/process-a-second-backup-or-sync-the-first.139132/
However, about once per month one of the remote sync jobs fails with:
Code:
...
sync group vm/363 failed - unable to acquire lock on snapshot directory "/mnt/datastore/datastore1/vm/363/2024-02-17T01:43:28Z" - locked by another operation
...
TASK ERROR: sync failed with some errors.
Additional information from our analysis of these failures:
- Only one VM failed to remote sync.
- No other jobs are running at this time on the PBS.
We assume we're unlucky and both remote sync jobs end up trying to sync the same VM snapshot at the same time.
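To illustrate the collision we suspect, here is a minimal sketch using advisory file locks. It is only an analogy: we do not know how PBS actually locks snapshot directories, and the lock file, `try_lock` and `demo` are made-up names for this example. It shows that a second non-blocking exclusive lock attempt fails while the first is held, whereas two shared locks can be held at once.

```python
import fcntl
import os
import tempfile


def try_lock(path, shared):
    """Try a non-blocking flock on path; return the fd, or None if it is locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    mode = fcntl.LOCK_SH if shared else fcntl.LOCK_EX
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        # Someone else holds a conflicting lock: "locked by another operation".
        os.close(fd)
        return None


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def demo():
    """Return (exclusive_collides, shared_coexist) for two simulated sync jobs."""
    workdir = tempfile.mkdtemp()
    lock_path = os.path.join(workdir, "snapshot.lock")  # hypothetical stand-in

    # Both jobs ask for an exclusive lock: the second attempt is refused.
    job_a = try_lock(lock_path, shared=False)
    job_b = try_lock(lock_path, shared=False)
    exclusive_collides = job_a is not None and job_b is None
    release(job_a)

    # Both jobs ask for a shared (reader) lock: both succeed.
    job_a = try_lock(lock_path, shared=True)
    job_b = try_lock(lock_path, shared=True)
    shared_coexist = job_a is not None and job_b is not None
    release(job_a)
    release(job_b)

    os.remove(lock_path)
    os.rmdir(workdir)
    return exclusive_collides, shared_coexist
```

Calling `demo()` on a Linux host runs both scenarios against a temporary lock file and cleans up afterwards.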
Any suggestion on how to deal with this issue?
Maybe PBS could:
- retry, at the end of the remote sync, the snapshots that failed due to "locked by another operation"
- use "multiple reader" locks: if my understanding is correct, remote syncs are read-only on the remote repo and so should not conflict with each other.
If needed we can open a Bugzilla entry or a support ticket (we have basic support on two of the three PBS involved).
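The first idea could look roughly like the sketch below. This is not PBS code: `SnapshotLocked`, `sync_one` and `sync_with_deferred_retry` are hypothetical names we made up to show the control flow, where snapshots that hit a lock error are set aside and retried once the main pass is done.

```python
class SnapshotLocked(Exception):
    """Hypothetical error standing in for "locked by another operation"."""


def sync_with_deferred_retry(snapshots, sync_one, retry_passes=1):
    """Sync every snapshot, retrying lock failures after the main pass.

    sync_one is a caller-supplied function that syncs a single snapshot and
    raises SnapshotLocked when the snapshot is locked.
    Returns the snapshots that were still locked after all retry passes.
    """
    pending = list(snapshots)
    for _ in range(retry_passes + 1):
        still_locked = []
        for snap in pending:
            try:
                sync_one(snap)
            except SnapshotLocked:
                # Do not fail the whole job yet; come back to it later.
                still_locked.append(snap)
        pending = still_locked
        if not pending:
            break
    return pending
```

With the default `retry_passes=1`, a snapshot that is locked only during the main pass gets synced in the retry pass, and the job would report an error only for whatever is left in the returned list. A real implementation would probably also wait a little before the retry pass.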
Thanks!