VMs hang on shutdown during backup job

Mar 6, 2024
I run Ubuntu 22.04 on all my VMs and some of them just seem to hang after they have finished their shutdown tasks, with backup option STOP.
Is there anything I can do to fix this problem or work around it?

Log error I frequently see:

Code:
qmeventd[839]: error parsing vmid for 358021: no matching qemu.slice cgroup entry
qmeventd[839]: could not get vmid from pid 358021

VMs hang at:

Code:
systemd[1]: Finished System Power Off.
systemd[1]: Reached target System Power Off.
systemd[1]: Shutting down.
systemd-shutdown[1]: Syncing filesystems and block devices.
systemd-journald[371]: Journal stopped
 
Hi,
please share the output of pveversion -v, qm config <ID> for an affected VM and cat /proc/cmdline.
 

pveversion:

Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-3
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
amd64-microcode: 3.20230808.1.1~deb12u1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx8
intel-microcode: 3.20231114.1~deb12u1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: not correctly installed
pve-firewall: 5.0.7
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0

qm config:

Code:
agent: 1
balloon: 0
boot: order=scsi0
cores: 1
cpu: host
hotplug: memory,cpu
memory: 2048
meta: creation-qemu=8.1.5,ctime=1710971638
name: storage
net0: virtio=BC:24:11:7F:4F:37,bridge=vmbr4,firewall=1,queues=1
numa: 1
onboot: 1
ostype: l26
parent: v2
scsi0: local:100402/vm-100402-disk-0.qcow2,discard=on,iothread=1,size=100G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f6dccc4f-3870-486c-85a2-6e7e9260c4c4
sockets: 1
startup: order=1
tags: internal
vmgenid: 9be97f43-0859-4f3d-adb7-4491dbd5b9bc

cat /proc/cmdline:

Code:
BOOT_IMAGE=/vmlinuz-6.8.4-2-pve root=/dev/mapper/vg0-root ro consoleblank=0 systemd.show_status=true consoleblank=0

One more thing to note: this does not happen every time, so I can't reliably reproduce it.
This VM suffers from the issue sporadically, but so do others; they rely on this VM for the fuse automount.
Perhaps this is related? I could not find anything in any of the VMs' logs, though.
 
It is better to back up after you have manually shut down the guest. Trying suspend mode (AFAIK a "hibernated" state) is only to check whether the hang persists.
 
The VM restarts directly after the backup starts, so the VM boots during the backup job. If the destination is slow, the VM can slow down to the point of hanging or crashing.
Try "suspend" mode.
(EDIT: incomplete+irrelevant: The VM is only started in prelaunch mode and doesn't do any execution or IO during stop mode backup. Except if you manually resume it.) For VMs, suspend mode is deprecated: https://pve.proxmox.com/pve-docs/chapter-vzdump.html#_backup_modes

@NullBy7e please share the output of cat /proc/$(cat /var/run/qemu-server/XYZ.pid)/cgroup replacing XYZ with the ID of a VM that was/is affected by the issue. You need to run it before the backup while the VM is still running. Please also share a larger excerpt of the journal from around the time the issue is happening.
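Since the issue is sporadic, it may be easier to record the cgroup entry of every running VM before the backup job starts rather than for a single ID. A minimal sketch, assuming the pidfiles live in /var/run/qemu-server as in the command above; `dump_vm_cgroups` is a hypothetical helper name, and the second argument exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: print "<vmid> <pid> <cgroup entry>" for every pidfile.
# $1: pidfile directory (default /var/run/qemu-server)
# $2: proc root (default /proc); only overridden for testing
dump_vm_cgroups() {
    piddir=${1:-/var/run/qemu-server}
    procroot=${2:-/proc}
    for pidfile in "$piddir"/*.pid; do
        [ -r "$pidfile" ] || continue   # no pidfiles: the glob stays literal
        vmid=$(basename "$pidfile" .pid)
        pid=$(cat "$pidfile")
        # A stale pidfile may point at a process that no longer exists.
        cgroup=$(cat "$procroot/$pid/cgroup" 2>/dev/null) || cgroup="(no such process)"
        printf '%s %s %s\n' "$vmid" "$pid" "$cgroup"
    done
}
```

Running `dump_vm_cgroups` shortly before the scheduled backup and keeping the output gives something to compare against once a VM hangs.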
 
The VM is only started in prelaunch mode and doesn't do any execution or IO during stop mode backup.
https://pve.proxmox.com/pve-docs/chapter-vzdump.html#_backup_modes said:
Backup modes for VMs:
stop mode
This mode provides the highest consistency of the backup, at the cost of a short downtime in the VM operation.
It works by executing an orderly shutdown of the VM, and then runs a background QEMU process to backup the VM data.
After the backup is started, the VM goes to full operation mode if it was previously running.
Consistency is guaranteed by using the live backup feature.
That seems different, doesn't it?
 


Hi, I faced the same problem. The VM freezes when backing up in STOP mode. Is there any solution to fix this?
 
root@pve:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-3
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.2
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.3-1
proxmox-backup-file-restore: 3.2.3-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.7
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
 
root@pve:~# qm config 112
agent: 1
boot: order=scsi0;net0
cores: 8
cpu: x86-64-v2-AES
description:
lock: backup
memory: 8048
meta: creation-qemu=8.1.5,ctime=1717658352
name: Office2-RC2
net0: virtio=BC:24:11:C4:39:B0,bridge=vmbr2,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: Storage2_nvme:112/vm-112-disk-0.qcow2,iothread=1,size=120G
scsihw: virtio-scsi-single
smbios1: uuid=66e3076c-9524-4a22-8541-93fdabacce05
sockets: 1
vmgenid: 800c6f7d-b071-4441-8901-7695c73814b5
 
root@pve:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.4-3-pve root=/dev/mapper/pve-root ro quiet
 
Hi,
@sar please share the output of cat /proc/$(cat /var/run/qemu-server/XYZ.pid)/cgroup replacing XYZ with the ID of a VM that was/is affected by the issue and also for some other VM. You need to run it before the backup while the VM is still running.
 
@fiona
This is while the VM is frozen:
root@pve:~# cat /proc/$(cat /var/run/qemu-server/112.pid)/cgroup
0::/system.slice/pvescheduler.service


After restarting it, while the VM is running:
root@pve:~# cat /proc/$(cat /var/run/qemu-server/112.pid)/cgroup
0::/qemu.slice/112.scope
 
The first kind of cgroup entry cannot give us the VM ID, but I'm not able to reproduce the issue (yet). Can you please also share the full backup task log and excerpt from the journal around the time the issue happens (at least from the beginning to end of the relevant backup job)?
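To illustrate why only the second kind of entry is usable, here is a minimal shell sketch that pulls the VM ID out of a `/qemu.slice/<vmid>.scope` path. The function name `vmid_from_cgroup` is made up for this example, and the parsing is only an assumption based on the two entries shown above, not a description of how qmeventd does it.

```shell
#!/bin/sh
# Hypothetical helper: extract the VM ID from one cgroup entry line.
# Prints the ID and returns 0 for ".../qemu.slice/<digits>.scope";
# prints nothing and returns 1 for anything else.
vmid_from_cgroup() {
    entry=$1
    case "$entry" in
        */qemu.slice/*.scope) ;;
        *) return 1 ;;
    esac
    id=${entry##*/qemu.slice/}   # strip everything up to the slice name
    id=${id%.scope}              # strip the ".scope" suffix
    case "$id" in
        ''|*[!0-9]*) return 1 ;; # not a plain number
    esac
    printf '%s\n' "$id"
}
```

With the entries from the post above, `vmid_from_cgroup '0::/qemu.slice/112.scope'` yields the ID, while `0::/system.slice/pvescheduler.service` contains no ID to extract.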
 
@fiona

The log file is attached.

The log shows a different VM ID, but it is the same problem.

I sent you the log from the time the error started.
 

Please also share the full backup task log. You can find it in the node's Task History, filter by Task Type: vzdump.

Should you have another hanging VM, can you please check with cat /var/run/qemu-server/XYZ.pid if the PID matches the one from the qmeventd error messages, and then run
Code:
qm status XYZ --verbose
cat /proc/PID/cmdline
cat /proc/PID/cgroup

EDIT: does using qm resume XYZ get the VM running again?
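The PID comparison can also be scripted. A minimal sketch, assuming the journal line has the form shown in the first post (`... from pid <number>`); `pid_from_qmeventd_line` and `pid_matches_pidfile` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the PID from a qmeventd message such as
#   qmeventd[839]: could not get vmid from pid 358021
# Returns 1 if the line has no "pid <digits>" part.
pid_from_qmeventd_line() {
    printf '%s\n' "$1" | sed -n 's/.*pid \([0-9][0-9]*\).*/\1/p' | grep .
}

# Hypothetical helper: succeed if the PID in the log line equals the PID
# stored in the given pidfile (e.g. /var/run/qemu-server/XYZ.pid).
pid_matches_pidfile() {
    logged=$(pid_from_qmeventd_line "$1") || return 1
    [ -r "$2" ] || return 1
    [ "$logged" = "$(cat "$2")" ]
}
```

If `pid_matches_pidfile "<journal line>" /var/run/qemu-server/XYZ.pid` succeeds, the qmeventd error refers to that VM's QEMU process, and the three commands above can be run against that PID.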
 