VM hard freezes on Backup

albindy

Hi there!

Since one of the recent upgrades (I think it first happened after the upgrade to 6.2.x) we have a strange problem.
During a backup job, the VM being backed up hard-freezes if there is no space left on the backup device.
Before this release, running into 'no space left on device' caused no problem.
For our setup this is crucial.
We use a NAS and RDX for backup. The RDX target is a RAM drive set to 30 MB. The RDX is decrypted and mounted in this RAM drive 2 hours before the backup job starts. When no RDX is present, the backup stops after writing the 30 MB with 'no space left on device', generates an error email, and is done.
The point of the RAM drive is to keep the backup from writing to the root partition until it is full, which would kill the PVE host.

Now the situation looks like this:
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
lzop: No space left on device: <stdout>
ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task '01fe6627-d31a-4097-94e6-a270fc022416'
INFO: resuming VM again
ERROR: VM 100 qmp command 'cont' failed - got timeout
INFO: aborting backup job

Further access to the VM results in:
qm status 'ID'
running

qm suspend 'ID'
unable to connect to VM 100 qmp socket

VM 'ID' qmp command 'change' failed - unable to connect to VM 'ID' qmp socket - timeout after 600 retries
TASK ERROR: Failed to run vncproxy.

No action except stopping the VM clears the Freeze.

I raised the RAM drive to 2 GB because I thought the 'no space left' error occurred too early in the process, while the VM was still preparing the snapshot.
That did not help.

I also tried backing up the VM while it was turned off, but got an even stranger result. The backup obviously failed.

INFO: starting new backup job: vzdump 100 --quiet 1 --node pve --mailnotification always --storage Quickstore_RDX --all 0 --compress lzo --mode snapshot --mailto our Mailadress
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2021-01-03 12:05:15
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: WIN-SRV
INFO: include disk 'virtio0' 'local-lvm:vm-100-disk-0' 150G
INFO: include disk 'virtio1' 'local-lvm:vm-100-disk-1' 500G
INFO: exclude disk 'virtio2' 'WIN-Backup:vm-100-disk-0' (backup=no)
INFO: creating vzdump archive '/mnt/pve/quickstore/rdx/dump/vzdump-qemu-100-2021_01_03-12_05_15.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task 'c449718a-36b7-4805-87b0-7aba5f2c8654'
INFO: 0% (441.4 MiB of 650.0 GiB) in 3s, read: 147.1 MiB/s, write: 130.9 MiB/s
lzop: No space left on device: <stdout>
ERROR: VM 100 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 100 qmp command 'backup-cancel' failed - unable to connect to VM 100 qmp socket - timeout after 5987 retries
VM 100 qmp command 'query-status' failed - unable to connect to VM 100 qmp socket - timeout after 31 retries
ERROR: Backup of VM 100 failed - VM 100 qmp command 'query-backup' failed - got timeout
INFO: Failed at 2021-01-03 12:25:48
INFO: Backup job finished with errors
TASK ERROR: job errors

But after the backup the VM was in
status: running
but unresponsive. So I had to stop the VM, which had been turned off before the backup, to get it working again.

Any advice is welcome!
KR and happy new year!
Alex

The system is on the latest upgrade state:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 

Stefan_R

Proxmox Staff Member
Hm, the only issue I could reproduce with a full storage is the one fixed by the linked patch... It should be in all repositories by now, but just in case, can you post your 'pveversion -v'? Keep in mind the new version is only used once the VM in question has been restarted completely at least once (i.e. from PVE, not from within the guest).

Potentially also post some more detail on your setup, like your /etc/pve/storage.cfg and the VM config ('qm config <vmid>').
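For convenience, the requested details could be gathered in one go. The following is only a sketch: the VM ID 100 is taken from the logs in the first post, while the output file name `pve-report.txt` and the `collect` helper are hypothetical and just there for illustration.

```shell
#!/bin/sh
# Sketch: gather the requested details into one file for posting.
# VMID defaults to 100 (the VM from the logs in the first post).
# OUT is a hypothetical file name chosen for this example.
VMID="${VMID:-100}"
OUT="${OUT:-pve-report.txt}"

collect() {
    # Write a header line, then the command output (stdout and stderr).
    echo "== $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command '$1' not found on this host)"
    fi
}

{
    collect pveversion -v
    collect cat /etc/pve/storage.cfg
    collect qm config "$VMID"
} > "$OUT"

echo "wrote $OUT"
```

Review the file before posting it, since storage.cfg can contain server names or other details you may not want to share.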

And just as a shot in the dark: https://bugzilla.proxmox.com/show_bug.cgi?id=2723#c18 mentions an error only occurring when tested from the GUI. Maybe try starting a backup manually from the CLI (via 'vzdump') and see whether the error is still reproducible?
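A manual run could look roughly like the sketch below. The VM ID, storage name, compression and mode are copied from the job log in the first post; the mail options are left out, and the `RUN` switch and `backup_cmd` helper are assumptions added here so the command is only printed unless explicitly enabled.

```shell
#!/bin/sh
# Sketch: rerun the failing job by hand instead of from the GUI or scheduler.
# Values below are copied from the vzdump line in the first post's job log.
VMID="${VMID:-100}"
STORAGE="${STORAGE:-Quickstore_RDX}"

backup_cmd() {
    # Print the command line so it can be reviewed before it runs.
    echo "vzdump $VMID --storage $STORAGE --compress lzo --mode snapshot"
}

if [ "${RUN:-0}" = "1" ]; then
    # Word splitting of the printed command is intentional here.
    $(backup_cmd)
else
    echo "dry run, set RUN=1 to execute: $(backup_cmd)"
fi
```

Running it on the PVE node with `RUN=1` starts the backup in the foreground, so any error shows up directly in the terminal rather than in the task log.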
 
