Problems with vzdump, qm suspend/resume lock timeout

e100 | Renowned Member | Nov 6, 2010 | Columbus, Ohio | ulbuilder.wordpress.com
On the pve-user mailing list someone else is having a similar issue with no resolution:
http://pve.proxmox.com/pipermail/pve-user/2012-October/004897.html

When vzdump is running, qm resume or qm suspend will sometimes fail at random.
I am still running PVE 2.1, but I doubt this is a problem that was fixed in 2.2.

Code:
138: Oct 31 00:22:57 INFO: Starting Backup of VM 138 (qemu)
138: Oct 31 00:22:57 INFO: status = running
138: Oct 31 00:22:57 INFO: backup mode: snapshot
138: Oct 31 00:22:57 INFO: ionice priority: 7
138: Oct 31 00:22:57 INFO: suspend vm to make snapshot
138: Oct 31 00:22:57 INFO: trying to aquire lock... failed
138: Oct 31 00:22:57 INFO: can't lock file '/var/log/pve/tasks/.active.lock' - can't aquire lock - Interrupted system call
[B]138: Oct 31 00:22:58 ERROR: Backup of VM 138 failed - command 'qm suspend 138 --skiplock' failed: exit code 4[/B]

Code:
115: Oct 31 00:12:02 INFO: Starting Backup of VM 115 (qemu)
115: Oct 31 00:12:02 INFO: status = running
115: Oct 31 00:12:02 INFO: backup mode: snapshot
115: Oct 31 00:12:02 INFO: ionice priority: 7
115: Oct 31 00:12:02 INFO: suspend vm to make snapshot
115: Oct 31 00:12:03 INFO:   Logical volume "vzsnap-vm1-0" created
115: Oct 31 00:12:03 INFO:   Logical volume "vzsnap-vm1-1" created
115: Oct 31 00:12:05 INFO:   Logical volume "vzsnap-vm1-2" created
115: Oct 31 00:12:05 INFO: resume vm
115: Oct 31 00:12:05 INFO: trying to aquire lock... failed
115: Oct 31 00:12:05 INFO: can't lock file '/var/log/pve/tasks/.active.lock' - can't aquire lock - Interrupted system call
[B]115: Oct 31 00:12:06 ERROR: Backup of VM 115 failed - command 'qm resume 115 --skiplock' failed: exit code 4[/B]
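The "can't aquire lock - Interrupted system call" lines in both logs point at a blocking flock() being interrupted by a signal (EINTR) before the lock is obtained, e.g. an alarm used as a lock timeout. A minimal Python sketch of that failure mode (PVE's real lock code is Perl; the names here are invented for illustration):

```python
import fcntl
import os
import signal

# Hypothetical sketch, not PVE's actual implementation.
# A blocking flock() guarded by an alarm-based timeout: if SIGALRM fires
# while flock() is still waiting, the syscall is interrupted (EINTR),
# which surfaces as "can't acquire lock - Interrupted system call".

def _on_alarm(signum, frame):
    # Raising here makes the interrupted flock() fail instead of retrying.
    raise TimeoutError("lock wait interrupted by SIGALRM")

def acquire_lock(path, timeout_seconds):
    """Try to take an exclusive lock on path; give up after timeout_seconds."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_seconds)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until locked or interrupted
        return fd                        # caller must os.close(fd) to release
    except TimeoutError:                 # EINTR path: lock was held too long
        os.close(fd)
        return None
    finally:
        signal.alarm(0)                  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Under this model the failure is not the lock file itself but contention: another process (here, the long-running vzdump task) holds the lock past the timeout, the alarm fires, and qm suspend/resume exits with an error.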

We have been upgrading CPUs/motherboards in our cluster; today we upgraded the second server.
In both upgrades a 6-core CPU was replaced with an 8-core CPU with hyperthreading, so each of those two servers went from 6 cores to 16 logical cores.
The errors above happened on two different nodes, and neither occurred on a node that had been upgraded.

There are a total of 17 nodes in this cluster.
Before the upgrades there were 108 cores in total; after the upgrades there are 128.
Could the additional CPU cores be causing this problem?

Any suggestions?