VM frozen after backup

Matwolf

Hello,
I have a backup job running every day backing up my VMs using snapshot mode.
Everything runs fine, but sometimes one specific VM freezes (I have 4 VMs running, but it happens only with this one).
At the exact time of the backup the VM stops responding (even from the console) and I can see a constant 50% CPU usage on the summary graph.
The backup terminates successfully and the VM is still marked as running, but it is actually completely frozen. The console shows the login prompt, but no keyboard input is processed. I can only reset it (reboot and shutdown don't work).
The VM is running Ubuntu server 22.04.
I'm running PVE v. 8.4.1.
Any ideas?
Thanks in advance
 
Hi,
make sure you are using IO Thread for the VM disks, and if you are using network storage as the backup target, it might help to enable fleecing: https://pve.proxmox.com/pve-docs/chapter-vzdump.html#_vm_backup_fleecing
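For a one-off test run, fleecing can also be enabled directly on the command line; a minimal sketch (the VM ID and the fleecing storage are only examples, adjust them to your setup):
Code:
# snapshot-mode backup with a fleecing image placed on the given storage
vzdump 100 --mode snapshot --fleecing enabled=1,storage=local-lvm
For the scheduled job itself, the same setting is available when editing the backup job under Datacenter → Backup → Advanced.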

Please share the system logs/journal from around the time the issue happened, the backup task log and the output of the following:
Code:
pveversion -v
qm config <ID>
replacing <ID> with the actual ID of the VM.
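For the journal, something like this should capture the relevant window (the timestamps are placeholders for the time around the backup):
Code:
journalctl --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM" > journal-backup-window.txt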

You could also install the debug packages with apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym, and the next time a VM gets stuck, run:
Code:
qm status <ID> --verbose
gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/<ID>.pid)
again replacing <ID> with the actual VM ID both times (the gdb command prints a backtrace of all QEMU threads).
 
Hi Fiona,
unfortunately the issue is not easily reproducible, so I'll have to test the configuration changes over a longer period of time.

Today the VM got stuck again.

The backup job didn't have fleecing enabled, so I enabled it.
The disks also weren't attached to a VirtIO controller, so IO Thread wasn't enabled either.
I changed the SCSI controller from the default (LSI 53C895A) to VirtIO SCSI single and enabled IO Thread.
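For reference, the equivalent changes via qm would look roughly like this (a sketch using this VM's ID and disk volumes; they only take effect after a full VM stop and start):
Code:
# switch the SCSI controller to VirtIO SCSI single
qm set 100 --scsihw virtio-scsi-single
# re-attach each disk with the IO Thread option enabled
qm set 100 --scsi0 local-lvm:vm-100-disk-0,iothread=1
qm set 100 --scsi1 local-lvm:vm-100-disk-1,iothread=1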

Let's see if it solves the issue for good...

Below is the output of the commands you requested (after the changes described above):
Code:
# pveversion
pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-9-pve)
root@proxwolf:~# pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-9
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
pve-kernel-5.4: 6.4-20
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.0-1
proxmox-backup-file-restore: 3.4.0-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.3
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2

Code:
# qm config 100
agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0
cores: 2
memory: 4096
name: ALPHAWOLF
net0: virtio=0A:A5:7D:0A:3C:E9,bridge=vmbr0,queues=2
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=16G
scsi1: local-lvm:vm-100-disk-1,iothread=1,size=16G
scsihw: virtio-scsi-single
smbios1: uuid=04bdb7b2-8ea5-4a36-aa10-6ea29bcc3161
sockets: 1
startup: order=2
tags: network
vmgenid: e12c47b4-3abd-43a3-859d-415dd9f50b38

I'll let you know if the issue persists...

Thanks!
 