"Stop" mode backup hangs VMs with Nvidia GPU passthrough; host reboot needed to recover

Reguna

Apr 29, 2022
Hi, I have an Nvidia RTX 3090 passed through to a Debian 12 VM.
The first "stop" mode backup of this VM while it is running always completes successfully.
However, the VM may hang afterwards, and the noVNC console in the web UI shows only a black screen.
In this state, the VM no longer responds to QEMU guest agent pings, and I have to force-stop it.
Other VMs that use the same 3090 then also fail to boot, showing the same black screen in noVNC.
The only fix I have found is to reboot the host.

Any ideas on how to diagnose this? It does not happen when I reboot the VM manually; it only happens when I run a "stop" mode backup while the VM is running.
It is also not consistent: the VM can boot normally after the backup, but given enough backup attempts, it eventually fails.

Edit: I have checked and can confirm that the 3090 has reset capability (/sys/kernel/iommu_groups/11/devices/0000:01:00.0/reset), but the audio controller in the same IOMMU group does not. Not sure if this is relevant.
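To check every function in the GPU's IOMMU group at once rather than one path at a time, one could walk the group's devices directory in sysfs. This is a minimal sketch, assuming group 11 from the path above. It treats the presence of a reset file as "the kernel offers some reset for this function", and, where the kernel also exposes a reset_method file, lists the methods it names. The function name and its sysfs_root parameter are hypothetical; sysfs_root exists only so the function can be pointed at a fake tree for testing.

```python
from __future__ import annotations

from pathlib import Path


def reset_capabilities(group: int, sysfs_root: str = "/sys") -> dict[str, dict[str, object]]:
    """Report reset support for every PCI function in one IOMMU group.

    has_reset is True when the device's sysfs directory contains a "reset"
    file. If a "reset_method" file is present, its space-separated entries
    (for example "flr" or "bus") are returned; otherwise the list is empty.
    """
    devices_dir = Path(sysfs_root, "kernel", "iommu_groups", str(group), "devices")
    report: dict[str, dict[str, object]] = {}
    for dev in sorted(devices_dir.iterdir()):
        # On a real host each entry is a symlink to the PCI device directory,
        # so the paths below resolve into /sys/devices/...
        method_file = dev / "reset_method"
        methods = method_file.read_text().split() if method_file.is_file() else []
        report[dev.name] = {
            # "reset" is write-only, so only its existence is checked.
            "has_reset": (dev / "reset").exists(),
            "reset_methods": methods,
        }
    return report
```

On the host, reset_capabilities(11) would list every function in group 11, such as the GPU at 0000:01:00.0, alongside whatever reset methods the kernel reports for it.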

Edit 2: If I shut down the VM manually before performing the backup, the VM does not hang when it boots afterwards. Annoying, but at least it works.
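Until the root cause is found, the manual workaround could be scripted. This is only a sketch of one possible approach: it asks the guest to shut down with qm shutdown, backs up the now-stopped VM with vzdump, and boots it again with qm start. The VMID and storage name are placeholders, and the run parameter is a hypothetical hook so the sequence can be dry-run or tested without a Proxmox host.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Raise CalledProcessError on a non-zero exit so later steps are skipped.
    subprocess.run(list(cmd), check=True)


def backup_while_stopped(
    vmid: int,
    storage: str,
    run: Optional[Callable[[Sequence[str]], None]] = None,
) -> list[list[str]]:
    """Shut the VM down, back it up, then start it again.

    Returns the commands in the order they were issued.
    """
    runner = run if run is not None else _run_checked
    commands = [
        # Ask the guest to shut down cleanly; if qm reports an error,
        # the sequence stops here and no backup is attempted.
        ["qm", "shutdown", str(vmid)],
        # Back up the VM while it is off; since it was not running when the
        # backup started, vzdump is expected to leave it stopped.
        ["vzdump", str(vmid), "--mode", "stop", "--storage", storage],
        # Boot the guest again only after the backup has finished.
        ["qm", "start", str(vmid)],
    ]
    for cmd in commands:
        runner(cmd)
    return commands
```

Because every step runs with check=True by default, a failed backup leaves the VM off rather than restarting it into a possibly inconsistent state.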
 
Hi,
please share the output of pveversion -v and qm config <ID>, the full backup task log where the issue happens, and the excerpt of the system journal covering the same time.
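The requested command output could be gathered in one go on the host. This is a sketch under assumptions: the VMID and the time window are placeholders to be set around the failed backup, and the backup task log itself is easiest to copy from the task viewer in the web UI, so it is not fetched here. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from datetime import datetime


def diagnostic_commands(vmid: int, since: datetime, until: datetime) -> list[list[str]]:
    """Build the commands whose output was requested."""
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp format accepted by journalctl
    return [
        ["pveversion", "-v"],
        ["qm", "config", str(vmid)],
        ["journalctl", "--since", since.strftime(fmt), "--until", until.strftime(fmt)],
    ]


def collect(vmid: int, since: datetime, until: datetime) -> str:
    """Run each command and concatenate its output under a header line."""
    parts = []
    for cmd in diagnostic_commands(vmid, since, until):
        result = subprocess.run(cmd, capture_output=True, text=True)
        parts.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return "\n".join(parts)
```

Keeping the command list separate from the code that runs it makes it easy to review exactly what will be executed before calling collect on the host.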
 
