VMs hang on shutdown during backup job

Again, replacing XYZ with the VM ID of a stuck VM, can you check what the following show?
Code:
systemctl status XYZ.scope
cat /proc/$(cat /var/run/qemu-server/XYZ.pid)/status | grep PPid
and then using the result of the last command
Code:
cat /proc/PPID/cmdline

Are there any systemd settings you modified?
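For convenience, here is a small shell sketch that chains these checks; XYZ is again just a placeholder for the stuck VM's ID:
Code:
vmid=XYZ                                        # replace with the stuck VM's ID
systemctl status "${vmid}.scope"                # does the systemd scope exist?
pid=$(cat "/var/run/qemu-server/${vmid}.pid")   # PID of the QEMU process
ppid=$(awk '/^PPid:/ {print $2}' "/proc/${pid}/status")
tr '\0' ' ' < "/proc/${ppid}/cmdline"; echo     # parent's command line, with NULs made readable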
 
I see this:

root@pve:~# systemctl status 134.scope
Unit 134.scope could not be found.
root@pve:~# cat /proc/$(cat /var/run/qemu-server/134.pid)/status | grep PPid
PPid: 1
root@pve:~# cat /proc/1/cmdline
/sbin/init
root@pve:~#


No, no systemd settings were modified.
 
@fiona

Do you have any ideas?
 
I do have a patch in the works that would go back to getting the VM ID from the process's commandline rather than the cgroup file, which would be a workaround.

But the real question is why does the cgroup file look like it does in your case. We always run the QEMU command in the qemu.slice/ID.scope systemd scope: https://git.proxmox.com/?p=qemu-ser...88629e1fa1b96f500ab902ccbaffb77;hb=HEAD#l5861
So there has to be a bug either in our code for setting this up or in systemd itself. But I wasn't able to reproduce the issue yet and am still investigating.
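For manual inspection, here is a rough sketch of the two places the VM ID can be read from for a running QEMU process (the actual parsing in qemu-server differs; XYZ is again the VM ID):
Code:
pid=$(cat /var/run/qemu-server/XYZ.pid)
# cgroup file: should normally end in .../qemu.slice/XYZ.scope
cat /proc/$pid/cgroup
# command line: the QEMU process started by qemu-server carries an "-id XYZ" argument
tr '\0' ' ' < /proc/$pid/cmdline | grep -oE -- '-id [0-9]+'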
 
@fiona

Can I install Proxmox VE 7.4?
Maybe it would help with this bug?
 
You can't downgrade Proxmox VE installation (or Debian) across major versions. As a workaround, you can use snapshot mode backup. You should install and enable the guest agent for VMs that don't have it yet, so filesystem consistency is not an issue.
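For reference, a rough sketch of both steps on the CLI; XYZ is the VM ID, "local" is only an example storage name, and scheduled backup jobs can instead be switched to snapshot mode in the job settings:
Code:
# enable the guest agent option for the VM (the qemu-guest-agent package must
# also be installed and running inside the guest)
qm set XYZ --agent enabled=1

# run a snapshot-mode backup instead of a stop-mode one
vzdump XYZ --mode snapshot --storage local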
 
@fiona
But could I back up my VMs to the backup server, then reinstall Proxmox VE 7.4 and restore my VMs?
 
Can you explain how to use this patch?

Maybe I need to use special commands, or edit some configs?
It's intended to be reviewed by other developers and if they deem it acceptable, it will be applied and rolled out in a future version. You could apply and build it yourself, but do so at your own risk.
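For completeness, building it yourself would look roughly like the sketch below; this is untested, the patch path is a placeholder, and the build targets and dependencies should be checked against the repository itself:
Code:
# rough outline only, at your own risk
git clone git://git.proxmox.com/git/qemu-server.git
cd qemu-server
git am /path/to/the-patch.patch       # apply the patch from the mailing list
make deb                              # build the package (needs the build dependencies installed)
apt install ./qemu-server_*.deb       # install the locally built package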

@fiona

I use stop mode backup because it is a full backup and it feels safer to me.
snapshot mode backup is also a full backup and as long as the guest agent is installed and enabled, the filesystem status will be consistent too. Of course, there can be special applications that do require even more than that, e.g. databases which can be handled with a hook script.
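To illustrate the hook script idea: vzdump calls a configured script with the backup phase as its first argument (and, for per-VM phases, the mode and VM ID as the following arguments). A minimal, hypothetical sketch, assuming a dump script already exists inside the guest:
Code:
#!/bin/bash
# hypothetical hook, e.g. /usr/local/bin/vzdump-db-hook.sh, enabled via
# "script: /usr/local/bin/vzdump-db-hook.sh" in /etc/vzdump.conf
phase="$1"; mode="$2"; vmid="$3"

case "$phase" in
    backup-start)
        # example only: ask the guest agent to run a consistent DB dump
        # before the backup starts (the guest-side script is an assumption)
        qm guest exec "$vmid" -- /usr/local/bin/dump-database.sh
        ;;
    backup-end|backup-abort)
        # clean up or resume whatever backup-start prepared, if needed
        ;;
esac
exit 0

A commented example hook script is shipped at /usr/share/doc/pve-manager/examples/vzdump-hook-script.pl.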
 
Hi,
did you spawn the VMs manually or via Proxmox VE UI/API/CLI? In the latter case, please share excerpts from the system logs/journal around the time the VM was started (you can search for qmstart:VMID) and from around the time the issue started occurring.
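For example, assuming the journal still covers the relevant period, something along these lines would pull the excerpts (times and VM ID are placeholders):
Code:
# around the time VM XYZ was started (task names contain "qmstart:XYZ")
journalctl --since "2025-04-06 00:00" --until "2025-04-07 00:00" | grep qmstart

# around the time the shutdown hung during the backup
journalctl --since "2025-04-07 02:15" --until "2025-04-07 02:35"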

EDIT: Do you have anything on the system that could modify/affect the systemd slices/scopes?

EDIT2: the qmstart task name is followed by the VM ID; the numbers mentioned in the earlier version of this message were not VM IDs, so this has been clarified.
 
The VMs were created manually through the web GUI. No changes were made to Proxmox. The STOP method is used for backups. The console shows that the system hangs at the shutdown stage, but if I press RESUME it shuts down correctly and then starts booting. In the GUI the icon of this VM is highlighted. In the task log:

TASK ERROR: VM quit/powerdown failed
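For reference, the same resume can also be triggered from the shell, with XYZ being the stuck VM's ID:
Code:
qm resume XYZ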
 
Outside of a backup, does the "Stop" button shut down the VM gracefully?
Is it a Windows VM?

edit: was the backup OK?
What about timing: the VM start and stop times, and the backup start and finish times?
 
...does it gracefully shut down the virtual machine?
Yes.
Is it a Windows VM?
Windows (with agent) and Ubuntu.
Was the backup OK?
No. When trying to back up, the VM hangs (not always) and I have to manually press RESUME.
INFO: Starting Backup of VM 106 (qemu)
INFO: Backup started at 2025-04-07 02:19:00
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: alkiv-ubuntu
INFO: include disk 'scsi0' 'local:106/vm-106-disk-0.qcow2' 10G
INFO: stopping virtual guest
INFO: VM quit/powerdown failed
ERROR: Backup of VM 106 failed - command 'qm shutdown 106 --skiplock --keepActive --timeout 600' failed: exit code 255
INFO: Failed at 2025-04-07 02:29:00
INFO: Backup job finished with errors
TASK ERROR: job errors

start and stop
start: 15 sec (to desktop)
shutdown: 7 sec
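To narrow down whether this is specific to backups, the same graceful shutdown that vzdump attempts could be tried manually outside of a backup job, for example:
Code:
qm shutdown 106 --timeout 600
qm status 106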
 