VM locked after failed backup, can't unlock

mmenaz

Hi, from the web interface, with a stopped KVM VM, I issued a backup that failed due to lack of free space.
I removed the previous backups and clicked "Backup now" again.
Then I got this error:
Code:
INFO: starting new backup job: vzdump 102 --remove 0 --mode snapshot --compress lzo --storage backup --node prox2
INFO: Starting Backup of VM 102 (qemu)
INFO: status = stopped
INFO: VM is locked (backup)
ERROR: Backup of VM 102 failed - command 'qm set 102 --lock backup' failed: exit code 25
INFO: Backup job finished with errors
TASK ERROR: job errors
ps ax | grep vz
or
ps ax | grep dum
showed no pending dump process, so I decided to unlock manually.
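(As a side note, a slightly more thorough sketch of that check, so that grep does not match itself:)
Code:
# the bracket trick keeps the grep process itself out of the output
ps aux | grep '[v]zdump'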
The qm manual says about lock:
-lock (backup | migrate) Lock/unlock the VM.
so I thought that issuing the command again would unlock it:
Code:
# qm set 102 --lock backup
VM is locked (backup)
Hmm, no luck... what does the man page mean?
Then I tried
Code:
# qm unlock 102
unable to open file '/etc/pve/nodes/prox2/qemu-server/102.conf.tmp.6630' - Input/output error
in fact:
Code:
# ls -l /etc/pve/nodes/prox2/qemu-server/
total 2
-rw-r----- 1 root www-data 325 Dec 25 20:26 100.conf
-rw-r----- 1 root www-data 234 Dec 22 01:01 101.conf
-rw-r----- 1 root www-data 271 Apr  7 18:13 102.conf
-rw-r----- 1 root www-data   0 Apr  7 18:21 102.conf.tmp.4765
#
Code:
# rm /etc/pve/nodes/prox2/qemu-server/102.conf.tmp.4765
rm: cannot remove `/etc/pve/nodes/prox2/qemu-server/102.conf.tmp.4765': Input/output error

even though
# lsof | grep 4765
shows nothing.
Sigh!
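Since /etc/pve is a fuse mount (see the df output below), I guess the real problem is the cluster filesystem daemon rather than the disk; a rough check I could try (just a sketch, assuming the standard pve-cluster init script) would be:
Code:
# is the cluster filesystem daemon (pmxcfs) still running?
ps aux | grep '[p]mxcfs'
# if /etc/pve is stuck, restarting the service may make it writable again
/etc/init.d/pve-cluster restart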
Code:
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/pve-root   37G   20G   16G  55% /
tmpfs                 3.9G     0  3.9G   0% /lib/init/rw
udev                  3.9G  316K  3.9G   1% /dev
tmpfs                 3.9G   19M  3.9G   1% /dev/shm
/dev/mapper/pve-data   99G   48G   52G  49% /var/lib/vz
/dev/sdc1             495M   78M  393M  17% /boot
/dev/fuse              30M   12K   30M   1% /etc/pve

and finally

Code:
# pveversion -v
pve-manager: 2.0-57 (pve-manager/2.0/ff6cd700)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-65
pve-kernel-2.6.32-11-pve: 2.6.32-65
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-36
pve-firmware: 1.0-15
libpve-common-perl: 1.0-25
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
I'm going to reboot Proxmox, but that would not be a good solution in production.
Thanks
 
I don't think so, since it is a normal bare-metal installation with plenty of free space in "data". The only space that filled up was "root", since that is where I keep all my backups (/backup).
After the reboot I tried the backup again, but it failed. This time, however, qm unlock 102 worked, and then I was able to do the backup. I apologise for not having kept the first error message, but it looked like such a simple situation (I knew I was short on backup space) that I did not pay much attention to it. Maybe tomorrow I can run backups until the space fills up again and report back the error message plus vgs and lvs output, just for completeness.
Thanks a lot
 
Hi,
free space in root and data is nice, but it does not help for snapshots. When vzdump creates an LVM snapshot of the LV, all changes are stored in a new LV, which lives in the free space of the VG (not in data or root).
And, very important, your backup destination must be outside of the LV being backed up. Otherwise you fill the snapshot with backup data.
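You can see how much unallocated space is available for the snapshot with something like this (assuming the default volume group name pve):
Code:
# VFree is the unallocated space in the VG that LVM snapshots can use
vgs pve
# and the logical volumes inside it
lvs pve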

Udo
 
Yes, thanks, I know, that's why I said it is a "normal" installation, so 4 GB of free VG space is present, and no VM was running, so there was no way to fill it (it is a test PC).
Of course my backup destination is outside the LV; in fact it is /backup, and that's why I filled up the root space.
Isn't it strange that
qm unlock 102
unable to open file '/etc/pve/nodes/prox2/qemu-server/102.conf.tmp.6630' - Input/output error
complained about that temp file when only 102.conf.tmp.4765 existed?
It seems like a bug somewhere, but I have to do some tests and try to reproduce it (it is a dual-boot machine, and my children are using it at the moment, so...).
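(For the record, once /etc/pve is writable again, leftover temp files from aborted config writes can be listed with something like this, just a sketch:)
Code:
# list stale temporary config files left behind by aborted updates
find /etc/pve/nodes -name '*.conf.tmp.*' -ls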
 
hineshummer27
That definitely worked for me. The hung backup processes were removed with no headache using that method.
 
For me, I had to go to the shell on the host and run "qm unlock (vmid)", then force stop the VM, then force start it. Now it's working. I did not have to restart the cluster; that would be a silly requirement for fixing a single VM.
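Roughly this sequence, with 102 only as an example VMID:
Code:
# clear the stale lock, then force the VM through a stop/start cycle
qm unlock 102
qm stop 102
qm start 102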
 
I have this same problem in a High Availability cluster with Proxmox VE 4.3. Some virtual machines got locked after a backup failure caused by a network communication problem with the NFS storage where the backups are kept.

How can the process of unlocking and starting a locked virtual machine be automated?
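I have not found a built-in way to do this, so for now I am thinking of a small script run from cron on each node. This is only a rough sketch (the node path via hostname, the "lock: backup" config line and the status parsing are my assumptions, use at your own risk):
Code:
#!/bin/sh
# rough sketch: unlock and start every VM on this node that is still
# marked with a backup lock and is not running
for conf in /etc/pve/nodes/"$(hostname)"/qemu-server/*.conf; do
    vmid=$(basename "$conf" .conf)
    if grep -q '^lock: backup' "$conf"; then
        qm unlock "$vmid"
        qm status "$vmid" | grep -q 'status: stopped' && qm start "$vmid"
    fi
done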
 
How do I unlock a VM? Where is this found? Stop and shutdown commands do not work.
 
Help. I just stopped a restore soon after starting it, and it locked the VM. As the restore had already erased everything on the VM, I have no access to it, so how can I unlock it?