Windows XP vzdump failures

Apr 26, 2018
I have a legacy Windows XP KVM guest running on Proxmox. It has been running for a long time, but recently its vzdump backup started failing. The cron job uses stop mode. As this is a legacy system, we do not change anything inside the guest.

Have there been any Proxmox updates that could trigger this behavior?

Output snippet:

Code:
123: 2018-05-16 00:32:03 INFO: Starting Backup of VM 123 (qemu)
123: 2018-05-16 00:32:03 INFO: status = running
123: 2018-05-16 00:32:04 INFO: update VM 123: -lock backup
123: 2018-05-16 00:32:04 INFO: backup mode: stop
123: 2018-05-16 00:32:04 INFO: ionice priority: 7
123: 2018-05-16 00:32:04 INFO: VM Name: cnr
123: 2018-05-16 00:32:04 INFO: include disk 'ide0' 'local-lvm:vm-123-disk-1' 60G
123: 2018-05-16 00:32:04 INFO: stopping vm
123: 2018-05-16 00:32:30 INFO: creating archive '/mnt/pve/localnas/dump/vzdump-qemu-123-2018_05_16-00_32_03.vma.lzo'
123: 2018-05-16 00:32:30 INFO: starting kvm to execute backup task
123: 2018-05-16 00:32:32 INFO: started backup task 'd70288ee-a91b-4212-bba4-2997c09538cc'
123: 2018-05-16 00:32:32 INFO: resume VM
123: 2018-05-16 00:32:35 INFO: status: 0% (281280512/64424509440), sparse 0% (1953792), duration 3, read/write 93/93 MB/s
123: 2018-05-16 00:32:41 INFO: status: 1% (666632192/64424509440), sparse 0% (8454144), duration 9, read/write 64/63 MB/s
123: 2018-05-16 00:32:48 INFO: status: 2% (1299513344/64424509440), sparse 0% (26632192), duration 16, read/write 90/87 MB/s
...
123: 2018-05-16 00:34:49 INFO: status: 52% (33561706496/64424509440), sparse 43% (28028383232), duration 137, read/write 95/94 MB/s
123: 2018-05-16 00:35:05 ERROR: VM 123 not running
123: 2018-05-16 00:35:05 INFO: aborting backup job
123: 2018-05-16 00:35:05 ERROR: VM 123 not running
123: 2018-05-16 00:35:06 INFO: restarting vm
123: 2018-05-16 00:35:08 ERROR: Backup of VM 123 failed - VM 123 not running

Cron job:

Code:
vzdump 123 --quiet 1 --mode stop --storage localnas

I found references recommending the QEMU guest agent and the VirtIO serial driver. Yet this XP system has been running fine without them, and the failures only started recently. Perhaps something changed in Proxmox recently that now requires them?

After reading the wiki, it seems to me that the QEMU guest agent and VirtIO drivers are needed for snapshot mode but not for stop mode. Perhaps I misunderstand.
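The three backup modes discussed in this thread differ only in vzdump's `--mode` value (`snapshot`, `suspend` or `stop`). A minimal bash sketch, assuming the VM ID and storage from the cron job above; `build_vzdump_cmd` is a hypothetical helper that only prints the command so it can be reviewed before it is run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a vzdump command line for one VM.
# Usage: build_vzdump_cmd <vmid> <mode> <storage>
build_vzdump_cmd() {
  local vmid=$1 mode=$2 storage=$3
  case "$mode" in
    snapshot|suspend|stop) ;;   # the three modes vzdump accepts
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "vzdump $vmid --quiet 1 --mode $mode --storage $storage"
}

# Examples: the stop-mode job from the cron entry, plus the two alternatives.
# build_vzdump_cmd 123 stop localnas
# build_vzdump_cmd 123 suspend localnas
# build_vzdump_cmd 123 snapshot localnas
```

Printing instead of executing keeps the helper safe to try on a production host.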

Thanks for any help. :)
 
Does the VM image use qcow2 format? If so, verify the image using "qemu-img check ...".
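The suggested check only makes sense for qcow2 images, so it can be guarded by the format that `qemu-img info` reports. A sketch, assuming bash; `image_format`, `check_if_qcow2` and the `QEMU_IMG` override variable are hypothetical names introduced here:

```shell
#!/usr/bin/env bash
# Sketch: run "qemu-img check" only when an image is actually qcow2.
# QEMU_IMG may name another command (hypothetical override); default is qemu-img.

# Print the format from the "file format: ..." line of "qemu-img info".
image_format() {
  "${QEMU_IMG:-qemu-img}" info "$1" | awk -F': ' '/^file format:/ {print $2}'
}

check_if_qcow2() {
  local img=$1 fmt
  fmt=$(image_format "$img")
  if [ "$fmt" = "qcow2" ]; then
    "${QEMU_IMG:-qemu-img}" check "$img"   # checks qcow2 metadata consistency
  else
    echo "skipping $img: format is '${fmt:-unknown}', not qcow2"
  fi
}
```

For a raw LVM volume the function just reports the format and skips the check.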
Thank you for the quick reply. :)

I did not install this specific system, and I feel inept about finding the actual file location. I found /dev/pve/vm-223-disk-1, but it has no file extension. The conf file indicates 60 GB is allocated, but I found no file of that size.
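On LVM-backed storage such as `local-lvm`, a VM disk is a logical volume exposed as a block device under /dev/pve/, not a file, which is why there is no extension and no 60 GB file to find. A sketch for listing a VM's disks and their host paths, assuming Proxmox VE's `qm config` and `pvesm path` commands; `vm_disk_volumes` and `vm_disk_paths` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: list a VM's disk volumes and resolve each one to a host path.

# Disk lines in "qm config" look like: ide0: local-lvm:vm-123-disk-1,size=60G
# CD-ROM entries are skipped; "scsihw:" does not match because no digit follows.
vm_disk_volumes() {
  qm config "$1" | grep -v 'media=cdrom' |
    sed -n -E 's/^(ide|sata|scsi|virtio)[0-9]+: ([^,]+).*/\2/p'
}

vm_disk_paths() {
  local vol
  for vol in $(vm_disk_volumes "$1"); do
    printf '%s -> %s\n' "$vol" "$(pvesm path "$vol")"
  done
}
```

For example, `vm_disk_paths 123` should map `local-lvm:vm-123-disk-1` to its device path, which `lvs` lists as well.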

I ran chkdsk inside the guest and rebooted. There was one "error" message saying that some free space was allocated. I repeated the exercise and got no further messages. I don't know whether that is a cause or an effect of the original vzdump symptoms.

I can start and stop the VM manually with the qm command without errors.
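The shutdown and restart that stop mode performs around the backup can also be reproduced by hand, which helps separate a guest that will not shut down cleanly from a problem in the backup itself. A sketch, assuming `qm status` prints `status: running` or `status: stopped`; `cycle_vm` and its 60-attempt default wait are hypothetical choices:

```shell
#!/usr/bin/env bash
# Sketch: orderly shutdown, wait until the VM reports "stopped", then start it.

# Print just the state word from "qm status" (e.g. running, stopped).
vm_status() {
  qm status "$1" | awk '{print $2}'
}

cycle_vm() {
  local vmid=$1 tries=${2:-60}    # roughly one status poll per second
  qm shutdown "$vmid"             # orderly (ACPI) shutdown request
  while [ "$(vm_status "$vmid")" != "stopped" ]; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "VM $vmid did not stop" >&2
      return 1
    fi
    sleep 1
  done
  qm start "$vmid"
  vm_status "$vmid"               # prints the state after the restart
}
```

If `cycle_vm 123` times out while the backup-free cycle normally succeeds, the shutdown step itself is suspect.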
 

Yes, this is a raw LVM device, so a qcow2 problem can be ruled out. Which exact version are you running?

# pveversion -v
 
Sorry, I forgot to post the version info:

Code:
pveversion -v                                      
proxmox-ve: 5.1-43 (running kernel: 4.15.17-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.13: 5.1-44
pve-kernel-4.15: 5.1-4
pve-kernel-4.15.17-1-pve: 4.15.17-8
pve-kernel-4.15.15-1-pve: 4.15.15-6
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-21
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-17
pve-cluster: 5.0-27
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9
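When comparing package states before and after an update, it can help to pull out only the packages plausibly involved in qemu backups. A sketch; the selection below is an assumption, not an official list, and `backup_versions` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: show only the packages assumed to matter for qemu backups
# (pve-manager ships vzdump, qemu-server its qemu plugin, pve-qemu-kvm QEMU,
# plus the storage library mentioned in this thread).
backup_versions() {
  pveversion -v | grep -E '^(pve-manager|qemu-server|pve-qemu-kvm|libpve-storage-perl):'
}
```

Saving the output before each update (`backup_versions > before.txt`) makes a later `diff` against the new state straightforward.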

The XP system hung again last night during the vzdump; the progress status only reached 4%. The failure at a lower percentage is explained by last night's backup server, which is a different one and is not directly connected at 1 Gbps.
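How far each run got before failing can be read from the progress lines in the vzdump log, which makes the link-speed explanation easy to check across backup servers. A sketch, assuming the log format shown at the top of this thread; `last_progress` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: report the last "INFO: status: NN% (done/total) ... duration SSS"
# progress line of a vzdump log as percentage, bytes and elapsed seconds.
last_progress() {
  grep 'INFO: status: [0-9]*%' "$1" | tail -n 1 |
    sed -E 's/.*status: ([0-9]+)% \(([0-9]+)\/([0-9]+)\).*duration ([0-9]+).*/\1% (\2 of \3 bytes) after \4 s/'
}
```

For the log excerpt above, this would report 52% of 64424509440 bytes after 137 seconds.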

I ran chkdsk again this morning with no errors reported.
 
I think this depends on the XP installation, drivers and software, because we also run several XP guests. Some of them back up fine in snapshot mode, but stop mode did not work on any of them. Suspend mode works fine here.
 
Also, could you please update to the latest version and test again?
Dietmar, thank you for replying! It's the weekend here; I will update on Monday.

Seems there are actually two issues.

The first issue is the KVM guest not restarting when the backup fails and aborts. Since stop mode is being used, the VM should be restarted regardless of the vzdump exit code. At the moment this is the more important issue for us. Do the updates address it?
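Until the restart behavior is confirmed, a wrapper can enforce it from cron. This is a workaround sketch, not how Proxmox itself handles the failure: it assumes the VM should always be running after the job, and `backup_and_ensure_running` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Workaround sketch: run the stop-mode backup and, whatever its exit code,
# make sure the VM is running afterwards.
backup_and_ensure_running() {
  local vmid=$1 storage=$2 rc
  vzdump "$vmid" --quiet 1 --mode stop --storage "$storage"
  rc=$?
  if [ "$(qm status "$vmid" | awk '{print $2}')" != "running" ]; then
    echo "VM $vmid not running after vzdump (exit $rc); starting it" >&2
    qm start "$vmid"
  fi
  return "$rc"    # keep vzdump's exit code so cron can still report failures
}
```

Because it starts the VM unconditionally, this wrapper is only appropriate for guests that are never meant to be left stopped.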

The second issue is the vzdump failure itself, which started very recently. There were no Proxmox or Debian updates immediately before it, although the previous Proxmox update, about a week earlier, included the libpve-storage-perl package. That might be related, or it might not. The vzdump failure could have a variety of causes; I plan to check for hardware issues.
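To check when a package such as libpve-storage-perl was actually upgraded, dpkg's log records each upgrade with its old and new version. A sketch, assuming the default /var/log/dpkg.log location (rotated logs such as dpkg.log.1 would have to be passed in separately); `pkg_upgrades` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: list upgrades of one package from dpkg's log. Upgrade lines have the form
#   YYYY-MM-DD HH:MM:SS upgrade <package>:<arch> <old-version> <new-version>
pkg_upgrades() {
  local pkg=$1 log=${2:-/var/log/dpkg.log}
  awk -v p="$pkg" '$3 == "upgrade" {
    split($4, a, ":")                       # strip the :arch suffix
    if (a[1] == p) print $1, $2, $5, "->", $6
  }' "$log"
}
```

Running `pkg_upgrades libpve-storage-perl` then shows the dates to line up against the first failed backup.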

P.S. I convinced the owner to buy a subscription. Are bank transfer and PayPal the only supported payment methods? Also, we pay in US dollars rather than euros. Does the subscription process convert the amount automatically, and if so, how is the conversion calculated?
 
A quick update: I installed the Proxmox updates yesterday. Last night's backups succeeded without incident, and the VM restarted as well. We rotate between several backup servers with different connection speeds, so I won't be able to give a full report for several days. So far, so good. :)
 
I have been digging into this all week. A recent discovery is that disabling LZO compression seems to allow the backups to complete. I do not yet have enough backups to say with confidence that compression is part of the problem, but it is curious nonetheless.
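To test the compression observation on a schedule, the same stop-mode job can be run with an explicit `--compress` value. A minimal sketch for A/B comparison, assuming PVE 5.x where `--compress` accepts `0`, `1`, `gzip` or `lzo`; it does not by itself show that LZO is the cause, and `backup_with_compression` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: the cron job's stop-mode backup with an explicit compression setting.
# 0 writes an uncompressed .vma archive; lzo matches the .vma.lzo seen in the log.
backup_with_compression() {
  local vmid=$1 storage=$2 comp=$3
  case "$comp" in
    0|1|gzip|lzo) ;;
    *) echo "unsupported --compress value: $comp" >&2; return 1 ;;
  esac
  vzdump "$vmid" --quiet 1 --mode stop --storage "$storage" --compress "$comp"
}

# Example: alternate between an uncompressed and an LZO run.
# backup_with_compression 123 localnas 0
# backup_with_compression 123 localnas lzo
```

Alternating the two settings on the same backup server keeps link speed out of the comparison.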
 
