ERROR: Backup .. failed - command ''qm shutdown 10159 --skiplock && qm wait

bread-baker

Two backups in a row failed on a node that has not had a failed backup since beta1.

Detailed backup logs:

Code:
vzdump 10159 --quiet 1 --mailto fbcadmin@fantinibakery.com --mode stop --node fbc1 --compress 1 --maxfiles 1 --storage fbc1-storage  

10159: Dec 14 19:00:02 INFO: Starting Backup of VM 10159 (qemu)
10159: Dec 14 19:00:02 INFO: status = running
10159: Dec 14 19:00:02 INFO: backup mode: stop
10159: Dec 14 19:00:02 INFO: ionice priority: 7
10159: Dec 14 19:00:02 INFO: stopping vm
10159: Dec 14 19:01:03 INFO: shutdown failed - got timeout
10159: Dec 14 19:01:03 INFO: waiting until VM 10159 stopps (PID 392541)
10159: Dec 14 19:01:03 ERROR: Backup of VM 10159 failed - command ''qm shutdown 10159 --skiplock && qm wait 10159 --timeout 600'' failed: exit code 255


Dec 14 17:04:20 INFO: Starting Backup of VM 17014 (qemu)
Dec 14 17:04:20 INFO: status = running
Dec 14 17:04:20 INFO: backup mode: stop
Dec 14 17:04:20 INFO: ionice priority: 7
Dec 14 17:04:20 INFO: stopping vm
Dec 14 17:05:21 INFO: shutdown failed - got timeout
Dec 14 17:05:21 INFO: waiting until VM 17014 stopps (PID 403222)
Dec 14 17:05:21 ERROR: Backup of VM 17014 failed - command ''qm shutdown 17014 --skiplock && qm wait 17014 --timeout 600'' failed: exit code 255



pveversion -v
pve-manager: 2.0-14 (pve-manager/2.0/6a150142)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 2.0-54
pve-kernel-2.6.32-6-pve: 2.6.32-54
lvm2: 2.02.86-1pve2
clvm: 2.02.86-1pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-1
libqb: 0.6.0-1
redhat-cluster-pve: 3.1.7-1
pve-cluster: 1.0-12
qemu-server: 2.0-11
pve-firmware: 1.0-13
libpve-common-perl: 1.0-10
libpve-access-control: 1.0-3
libpve-storage-perl: 2.0-9
vncterm: 1.0-2
vzctl: 3.0.29-3pve7
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-1
ksm-control-daemon: 1.1-1
 
I was able to get the backups to work from the command line by doing this:

1- stopped the VM:
qm stop 10159

2- copied and pasted the line from /etc/cron.d/vzdump [ except the quiet part ]:

vzdump 10159 --mode stop --mailto xxxx --node fbc1 --compress 1 --maxfiles 1 --storage fbc1-storage

3- checked that it was running: ps -ef|grep vzdump
Code:
root        1398       1  0 Dec11 ?        00:00:00 /usr/sbin/vzeventd
root        1641       2  0 Dec11 ?        00:00:00 [vzmond]
root      418018  417735  0 19:50 pts/2    00:00:00 /usr/bin/perl -w -T /usr/bin/vzdump 10159 --mode stop --mailto fbcadmin@fantinibakery.com --node fbc1 --compress 1 --maxfiles 1 --storage fbc1-storage
root      418019  418018  0 19:50 ?        00:00:00 task UPID:fbc1:000660E3:01AC2BBD:4EE9443E:vzdump::root@pam:
root      418026  418019  0 19:50 ?        00:00:00 sh -c /usr/lib/qemu-server/vmtar '/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_14-19_50_06.tmp/qemu-server.conf' 'qemu-server.conf' '/var/lib/vz/images/10159/vm-10159-disk-1.raw' 'vm-disk-ide0.raw'|gzip >/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_14-19_50_06.dat
root      418027  418026  7 19:50 ?        00:00:36 /usr/lib/qemu-server/vmtar /data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_14-19_50_06.tmp/qemu-server.conf qemu-server.conf /var/lib/vz/images/10159/vm-10159-disk-1.raw vm-disk-ide0.raw

I'll post the log later if wanted.
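
The manual workaround above (stop the guest by hand, then run vzdump with the options from the cron line) can be sketched as a small shell function. This is only an illustration: `backup_after_manual_stop` is a hypothetical helper name, the `--mailto` and `--node` options are left out, and the final `qm start` is an assumption that the guest should be running again afterwards.

```shell
# Sketch: stop the guest by hand, back it up, then start it again.
# backup_after_manual_stop is a hypothetical helper, not a Proxmox tool.
backup_after_manual_stop() {
    vmid="$1"
    storage="$2"

    # Hard stop, because the ACPI shutdown timed out in the logs above.
    qm stop "$vmid" || return 1

    # Same options as the cron line, minus --quiet, --mailto and --node.
    vzdump "$vmid" --mode stop --compress 1 --maxfiles 1 --storage "$storage"
    rc=$?

    # Assumption: the guest should come back up even if the backup failed.
    qm start "$vmid" || return 1
    return "$rc"
}
```

Usage would be something like `backup_after_manual_stop 10159 fbc1-storage`. Note that `qm stop` is a hard stop, so the guest does not get a chance to shut down cleanly.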
 
Log for the successful CLI run:
Code:
Detailed backup logs:


vzdump 10159 --mailto fbcadmin@fantinibakery.com --mode stop --compress 1 --maxfiles 1 --storage fbc1-storage --node fbc1


10159: Dec 14 19:50:06 INFO: Starting Backup of VM 10159 (qemu)
10159: Dec 14 19:50:06 INFO: status = stopped
10159: Dec 14 19:50:07 INFO: backup mode: stop
10159: Dec 14 19:50:07 INFO: ionice priority: 7
10159: Dec 14 19:50:07 INFO: creating archive '/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_14-19_50_06.tgz'
10159: Dec 14 19:50:07 INFO: adding '/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_14-19_50_06.tmp/qemu-server.conf' to archive ('qemu-server.conf')
10159: Dec 14 19:50:07 INFO: adding '/var/lib/vz/images/10159/vm-10159-disk-1.raw' to archive ('vm-disk-ide0.raw')
10159: Dec 14 20:17:57 INFO: Total bytes written: 21480195584 (12.27 MiB/s)
10159: Dec 14 20:17:57 INFO: archive file size: 9.72GB
10159: Dec 14 20:17:57 INFO: delete old backup '/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_13-07_16_09.tgz'
10159: Dec 14 20:17:59 INFO: Finished Backup of VM 10159 (00:27:53)
 
I'll set the failed backup to run again tomorrow at 7am Boston time, as I'll be on site in case there is a freeze.
 
Code:
Dec 14 17:05:21 ERROR: Backup of VM 17014 failed - command ''qm shutdown 17014 --skiplock && qm wait 17014 --timeout 600'' failed: exit code 255

The VM does not shut down? Does the shutdown work when you press the button on the GUI?
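
To reproduce the failure outside of vzdump, the two commands from the error message can also be run by hand. A minimal sketch, assuming the same 600 second wait as in the log and leaving out `--skiplock`; `try_acpi_shutdown` is a hypothetical helper name:

```shell
# try_acpi_shutdown is a hypothetical helper, not part of Proxmox.
try_acpi_shutdown() {
    vmid="$1"
    # The same two commands vzdump ran in the failed backup, minus --skiplock.
    if qm shutdown "$vmid" && qm wait "$vmid" --timeout 600; then
        echo "VM $vmid shut down cleanly"
    else
        echo "VM $vmid did not shut down; check ACPI handling in the guest" >&2
        return 1
    fi
}
```

For example, `try_acpi_shutdown 17014`. If this fails in the same way, the problem is probably in how the guest handles the shutdown request rather than in the backup job itself.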
 
You need to configure the VM guest to allow anonymous ACPI shutdown. What guest OS do you run?
 
These are KVM guests which have been backed up successfully for weeks. They are Win7 and a special LTSP terminal tester [ it has no HDD, boots from the network ].

A clone of the Win7 KVM [ restored from the same Prox 1.9 Win7 backup ] is on another node, and I just shut it down successfully from the GUI.
 
Windows has many local and domain policies that prevent anonymous shutdown, in several places, and they differ for standalone systems, servers and domain members. Dig deeper there.

And check this one: https://bugzilla.proxmox.com/show_bug.cgi?id=59
 
OK, I just did 2 tests, a suspend mode and a stop mode backup. See the results:

vzdump 17014 --quiet 1 --mailto xxxxx --mode suspend --node fbc1 --compress 1 --maxfiles 1 --storage fbc1-storage

17014: Dec 15 06:30:02 INFO: Starting Backup of VM 17014 (qemu)
17014: Dec 15 06:30:02 INFO: status = running
17014: Dec 15 06:30:02 INFO: backup mode: suspend
17014: Dec 15 06:30:02 INFO: ionice priority: 7
17014: Dec 15 06:30:02 INFO: suspend vm
17014: Dec 15 06:30:03 INFO: creating archive '/data/fbc1-storage/dump/vzdump-qemu-17014-2011_12_15-06_30_02.tgz'
17014: Dec 15 06:30:03 INFO: adding '/data/fbc1-storage/dump/vzdump-qemu-17014-2011_12_15-06_30_02.tmp/qemu-server.conf' to archive ('qemu-server.conf')
17014: Dec 15 06:30:03 INFO: Total bytes written: 2048 (0.00 MiB/s)
17014: Dec 15 06:30:03 INFO: archive file size: 0KB
17014: Dec 15 06:30:03 INFO: resume vm
17014: Dec 15 06:30:03 INFO: vm is online again after 1 seconds
17014: Dec 15 06:30:03 INFO: Finished Backup of VM 17014 (00:00:01)



vzdump 17014 --quiet 1 --mailto xxxxxx --mode stop --node fbc1 --compress 1 --maxfiles 1 --storage fbc1-storage

17014: Dec 15 06:45:01 INFO: Starting Backup of VM 17014 (qemu)
17014: Dec 15 06:45:01 INFO: status = running
17014: Dec 15 06:45:02 INFO: backup mode: stop
17014: Dec 15 06:45:02 INFO: ionice priority: 7
17014: Dec 15 06:45:02 INFO: stopping vm
17014: Dec 15 06:46:02 INFO: shutdown failed - got timeout
17014: Dec 15 06:46:02 INFO: waiting until VM 17014 stopps (PID 404324)
17014: Dec 15 06:46:03 ERROR: Backup of VM 17014 failed - command ''qm shutdown 17014 --skiplock && qm wait 17014 --timeout 600'' failed: exit code 255
 
17014: Dec 15 06:46:03 ERROR: Backup of VM 17014 failed - command ''qm shutdown 17014 --skiplock && qm wait 17014 --timeout 600'' failed: exit code 255

Seems you are using an old version of vzdump - what is the output of

# dpkg -l vzdump

(that package should not be installed)
 
On this node suspend backup works. I checked earlier logs and we were using suspend until yesterday.

Here is the Win7 backup in suspend mode:
Detailed backup logs:

vzdump 10159 --quiet 1 --mailto fbcadmin@fantinibakery.com --mode suspend --node fbc1 --compress 1 --maxfiles 1 --storage fbc1-storage

10159: Dec 15 08:15:01 INFO: Starting Backup of VM 10159 (qemu)
10159: Dec 15 08:15:01 INFO: status = running
10159: Dec 15 08:15:01 INFO: backup mode: suspend
10159: Dec 15 08:15:01 INFO: ionice priority: 7
10159: Dec 15 08:15:02 INFO: suspend vm
10159: Dec 15 08:15:02 INFO: creating archive '/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_15-08_15_01.tgz'
10159: Dec 15 08:15:02 INFO: adding '/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_15-08_15_01.tmp/qemu-server.conf' to archive ('qemu-server.conf')
10159: Dec 15 08:15:02 INFO: adding '/var/lib/vz/images/10159/vm-10159-disk-1.raw' to archive ('vm-disk-ide0.raw')
10159: Dec 15 08:42:56 INFO: Total bytes written: 21499640832 (12.25 MiB/s)
10159: Dec 15 08:42:56 INFO: archive file size: 9.66GB
10159: Dec 15 08:42:56 INFO: delete old backup '/data/fbc1-storage/dump/vzdump-qemu-10159-2011_12_14-19_50_06.tgz'
10159: Dec 15 08:42:57 INFO: unable to write config for VM 10159
10159: Dec 15 08:42:57 INFO: resume vm
10159: Dec 15 08:42:58 INFO: vm is online again after 1676 seconds
10159: Dec 15 08:42:58 INFO: Finished Backup of VM 10159 (00:27:57)
 
I had a strange quorum at that time, a residual effect of the managed switch issue, which is why I reinstalled Prox 2.0 on all 4 nodes on 12/17.

3 of the nodes showed a total of 4 nodes in the cluster.

One of the nodes showed those 4 plus 2 nodes which had been removed 2 weeks earlier.

I now do nightly backups of all CTs and KVMs which are normally on, and use rsnapshot to put copies in each node's /bkup.


So I have been getting a lot of practice at backup and migrate.
 
This is also strange. Do you have quorum on your cluster?
AFAIR at the time of that backup we had 2 nodes live.

One of the nodes was the one which had 2 deleted nodes showing up in 'pvecm nodes'.

So I thought I had a quorum, but did not.

I did not realize at the time that there was an issue with quorum. I am still learning about clusters and quorum, as I did not use any in 1.9.
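
For anyone else hitting this: with one vote per node, a cluster needs a majority of the expected votes, so 4 expected nodes need 3 live and 2 live nodes are not enough. Without quorum the cluster configuration is likely read-only, which would fit the "unable to write config" line in the suspend log. A rough way to check from the shell, assuming `pvecm status` prints `Total votes:` and `Quorum:` lines as the cman-based 2.0 releases do (that output format is an assumption, and `has_quorum` is a hypothetical helper name):

```shell
# Reads `pvecm status` output on stdin.
# Succeeds only if the votes present reach the quorum threshold.
# Assumes "Total votes: N" and "Quorum: N" lines (cman-style output).
has_quorum() {
    awk -F': *' '
        $1 == "Total votes" { votes = $2 + 0 }
        $1 == "Quorum"      { quorum = $2 + 0 }  # "3 Activity blocked" parses as 3
        END { exit !(quorum > 0 && votes >= quorum) }
    '
}
```

It could be used before a backup job, for example `pvecm status | has_quorum || echo "no quorum on this node"`.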