Backup failed: Can't acquire lock

proxwolfe

Hi,

I am transitioning from backing up to a NAS to backing up to a PBS.

Since yesterday, I have two backup jobs scheduled every night. Both back up the same VMs/LXCs - but start at different times.

- Starting at 1am to the NAS
- Starting at 2am to the PBS

The backup job to the NAS went through without issues (ended close to 9am).

The backup job to the PBS failed with the following log entries:

Code:
INFO: trying to get global lock - waiting...
ERROR: can't acquire lock '/var/run/vzdump.lock' - got timeout
TASK ERROR: got unexpected control message:

Can only one backup job run at a time? Or is something else going on?

Thanks!
 
Hi,
Can only one backup job run at a time? Or is something else going on?
Yes, there is a global lock for vzdump. The default wait time for the lock is 180 minutes, but it can be controlled with --lockwait.
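For example, when calling vzdump manually from the shell, the wait time (in minutes) can be passed directly; the VMID 100 and storage name "pbs1" below are just placeholders:

Code:
# wait up to 10 hours for the global lock instead of the default 3 hours
vzdump 100 --storage pbs1 --mode snapshot --lockwait 600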

 
Yes, there is a global lock for vzdump. The default wait time for the lock is 180 minutes, but it can be controlled with --lockwait
That explains it.

So, is this necessary? I mean, is there a technical reason why you shouldn't run two backup jobs at the same time? What is to be protected by the lock? The VM, the PVE host and/or the backup target?

Does it matter if the backup jobs back up different VMs/LXCs? (In my case they don't, I'm just trying to understand the mechanics.)

Does it matter if the backup jobs back up the same VMs/LXCs but at overlapping times? (Like in my case with a one hour offset.)

Does it matter if the backup jobs back up to different backup targets? (Like in my case to a NAS and a PBS.)

Is it safe to reduce the lock time in my case? Can I use "--lockwait" from the GUI or do I need to modify some config file from the terminal?

Thanks!
 
That explains it.

So, is this necessary? I mean, is there a technical reason why you shouldn't run two backup jobs at the same time? What is to be protected by the lock? The VM, the PVE host and/or the backup target?
I think it's because a backup can put quite a bit of load on the host/network.

Does it matter if the backup jobs back up different VMs/LXCs? (In my case they don't, I'm just trying to understand the mechanics.)

Does it matter if the backup jobs back up the same VMs/LXCs but at overlapping times? (Like in my case with a one hour offset.)

Does it matter if the backup jobs back up to different backup targets? (Like in my case to a NAS and a PBS.)
No, it's at most one active vzdump process at a time.

Is it safe to reduce the lock time in my case?
Then the backup will fail earlier. If you want to be sure the second backup runs as well, you'd need to increase the lock wait, so that it is still waiting when the first backup finishes.

Can I use "--lockwait" from the GUI or do I need to modify some config file from the terminal?
This is not possible via the GUI AFAICT. The file with the vzdump jobs is /etc/pve/vzdump.cron.
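A job entry in that file is a regular cron line calling vzdump, so --lockwait can be appended to it. Roughly like this (the schedule, VMIDs and storage name are only placeholders):

Code:
# /etc/pve/vzdump.cron (excerpt)
PATH="/usr/sbin:/usr/bin:/sbin:/bin"

# backup of VMs 100 and 101 at 2am, waiting up to 10 hours for the global lock
0 2 * * *           root vzdump 100 101 --quiet 1 --mode snapshot --storage pbs1 --lockwait 600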

 
No, it's at most one active vzdump process at a time.
I'm not sure I fully understand this yet: Is this another limitation, i.e. technically only one vzdump process can be active at a time? Or is it because of the locking mechanism which prevents more than one vzdump process from becoming active? In other words: If there were no locking, could there be two vzdump processes running at the same time?


Then the backup will fail earlier.
So then it would not fail because of the lock but because of some other mechanism (like only one vzdump process can be active at a time), right?


because a backup can put quite a bit of load on the host/network.
During the night, reduced performance/responsiveness would be acceptable to me (but of course that depends on the use case). Can the backup be sped up by allocating more resources (CPU) to the VM?


Maybe the easiest solution would be to just deactivate the old backup job and see if the new one works alright. I would just be more comfortable keeping both for a transition period. Or maybe I can schedule them one after the other, provided that, from the second run onwards, it doesn't take so long anymore.
 
I'm not sure I fully understand this yet: Is this another limitation, i.e. technically only one vzdump process can be active at a time? Or is it because of the locking mechanism which prevents more than one vzdump process from becoming active? In other words: If there were no locking, could there be two vzdump processes running at the same time?
Yes, it's the locking mechanism. I don't think there's anything in general that would prevent it (but I can't guarantee it either; there might be some corner case/assumption I'm missing). But if both jobs reached the same machine at the same time, one of them would fail, which is also not ideal.

So then it would not fail because of the lock but because of some other mechanism (like only one vzdump process can be active at a time), right?
It would fail because the time for waiting for the lock has run out. When vzdump starts, it tries to acquire the lock and waits for the configured lockwait time. If it can acquire the lock within the time, it will execute. If it cannot acquire the lock within the time, it aborts. If there is no other instance, it will get the lock immediately and start executing immediately.
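Just to illustrate the wait-then-give-up behaviour (this is only a rough sketch with flock on a dummy file, not how vzdump is implemented internally):

Code:
# conceptual illustration only - /tmp/demo.lock is a dummy file, not vzdump's real lock
touch /tmp/demo.lock
# first "job": takes the lock and holds it for 30 seconds in the background
flock --exclusive /tmp/demo.lock sleep 30 &
sleep 1
# second "job": waits up to 10 seconds for the lock, then gives up
# (vzdump waits up to the configured lockwait, 180 minutes by default)
flock --exclusive --wait 10 /tmp/demo.lock echo "got the lock - backup would run now" \
    || echo "got timeout - could not acquire the lock"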

During the night, reduced performance/responsiveness would be acceptable to me (but of course that depends on the use case). Can the backup be sped up by allocating more resources (CPU) to the VM?
I don't think this will make a difference. While it's true that the VM is started for the backup, it's started into a paused state and reading the data happens in a separate thread, which is not bound by the CPU limit configured for the VM AFAIK.

Maybe the easiest solution would be to just deactivate the old backup job and see if the new one works alright. I would just be more comfortable keeping both for a transition period. Or maybe I can schedule them one after the other, provided that, from the second run onwards, it doesn't take so long anymore.
If you can't finish both backups within the night, that might be better. If you want to make sure both backups run regardless, you should increase lockwait, so that the second backup will not abort after 180 minutes, but wait longer until the first one is finished.
 
If you can't finish both backups within the night, that might be better. If you want to make sure both backups run regardless, you should increase lockwait, so that the second backup will not abort after 180 minutes, but wait longer until the first one is finished.
Understood - thank you!
 
Yes, it's the locking mechanism. I don't think there's anything in general that would prevent it (but I can't guarantee it either; there might be some corner case/assumption I'm missing). But if both jobs reached the same machine at the same time, one of them would fail, which is also not ideal.
One more question please:

We established above that it is not advisable (and, therefore, prevented) that two vzdump processes run at the same time on the same PVE host.

Are there any limitations as to how many backup jobs (from different PVE hosts or different proxmox backup clients) can target one PBS at the same time?

Thanks!
 
One more question please:

We established above that it is not advisable (and, therefore, prevented) that two vzdump processes run at the same time on the same PVE host.

Are there any limitations as to how many backup jobs (from different PVE hosts or different proxmox backup clients) can target one PBS at the same time?
I'm not aware of such a limit, as long as the IDs are different (you should use one datastore for each cluster/standalone node anyways).
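As a rough example (host, user and datastore names are placeholders), two different nodes/clusters can each back up into their own datastore on the same PBS at the same time:

Code:
# from cluster/node A
proxmox-backup-client backup etc.pxar:/etc --repository backup@pbs@192.168.1.50:cluster-a
# from cluster/node B
proxmox-backup-client backup etc.pxar:/etc --repository backup@pbs@192.168.1.50:node-b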
 
Hi guys, to solve this problem just kill all vzdump processes and after that delete the file /var/run/vzdump.lock. When a new run starts, it will create the file again and everything will be solved.
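Roughly, the steps would be something like this (be careful: killing vzdump aborts any backup that is still running, so check first that nothing legitimate is active):

Code:
# list running vzdump tasks
ps aux | grep [v]zdump
# stop the stuck process(es) - this aborts any running backup
kill <PID>
# remove the stale lock file; it will be recreated on the next run
rm /var/run/vzdump.lock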
 
