Environment:
- Proxmox VE 9.2.3, qemu-server 9.1.16
- Host CPU: AMD (600 Series chipset, PCIe switch confirmed in lspci)
- GPU: NVIDIA GeForce RTX 5070 Ti — GB203 — PCI ID 10de:2c05 (MSI variant 1462:5310)
- GPU Audio: 10de:22e9
- IOMMU Group 13 — GPU and audio isolated, no other devices sharing the group
- VM hostpci config:
    hostpci0: 0000:01:00,pcie=1,x-vga=1
- Single GPU in host (no iGPU, no secondary display adapter)
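For anyone comparing setups, the group isolation listed above can be re-checked straight from sysfs. This is only a minimal sketch: the function name `list_iommu_group` and its optional second argument (an alternate sysfs root, added so the function can be pointed at a copy of the tree) are my own illustration, not part of any Proxmox tooling.

```shell
# list_iommu_group GROUP [SYSFS_ROOT]
# Prints the PCI address of every device in the given IOMMU group, one per line.
# SYSFS_ROOT defaults to /sys; the parameter is a hypothetical convenience.
list_iommu_group() {
    iommu_dir="${2:-/sys}/kernel/iommu_groups/$1/devices"
    if [ ! -d "$iommu_dir" ]; then
        echo "IOMMU group $1 not found under ${2:-/sys}" >&2
        return 1
    fi
    for iommu_dev in "$iommu_dir"/*; do
        # Skip the unexpanded glob if the group directory is empty.
        [ -e "$iommu_dev" ] || continue
        basename "$iommu_dev"
    done
}
```

On the host described here, `list_iommu_group 13` would be expected to list only the two GPU functions (0000:01:00.0 and 0000:01:00.1) if the group is isolated as stated.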
After a Proxmox OS update followed by a clean OS shutdown and restart, starting a Windows VM with the RTX 5070 Ti in PCI passthrough causes the Proxmox host node to become completely unreachable. Web UI (port 8006), SSH, and Datacenter Manager all stop responding simultaneously. The host hardware remains powered on (IPMI/BMC accessible) but the OS is locked up. Removing the hostpci0 line from the VM config and restarting the VM immediately restores normal node operation. Re-adding the PCI device and restarting reproduces the hang every time.
This configuration had been running stably for approximately 6 months prior to the Proxmox update.
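If you want to reproduce the toggle from the CLI rather than by editing the VM config, something like the following should be equivalent. Treat it as a sketch under assumptions: the post does not give the VM's ID, so `100` in the usage below is a placeholder, and the function names and the `QM` override variable are hypothetical helpers of mine. `qm set` with `--delete` and `--hostpci0` are the standard qemu-server options.

```shell
# QM can be overridden (for example QM=echo) to preview the commands
# without touching any VM. It defaults to the real qm binary.
QM="${QM:-qm}"

# detach_gpu VMID: remove the passthrough entry from the VM config.
detach_gpu() {
    "$QM" set "$1" --delete hostpci0
}

# attach_gpu VMID: re-add the entry exactly as listed under Environment.
attach_gpu() {
    "$QM" set "$1" --hostpci0 '0000:01:00,pcie=1,x-vga=1'
}
```

Usage would be `detach_gpu 100` or `attach_gpu 100`, followed by starting the VM. Keep in mind that on an affected kernel, starting the VM after `attach_gpu` is exactly what hangs the host, so only do this with IPMI/BMC access at hand.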
Diagnosis:
VFIO binding was confirmed correct before investigating further — both GPU functions were already bound to vfio-pci at host boot:
    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GB203 [GeForce RTX 5070 Ti] [10de:2c05] (rev a1)
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau, nova_core
    01:00.1 Audio device [0403]: NVIDIA Corporation GB203 High Definition Audio [10de:22e9] (rev a1)
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
Checking installed kernels revealed the update had pulled in 7.0.6-2-pve (currently running) as well as the 6.17.x series, with 6.14.8-2-pve still installed.
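The driver binding reported by lspci can also be read directly from sysfs, which is handy for scripting the same check across kernels. A minimal sketch follows; `pci_driver` and its optional sysfs-root argument are illustrative names of mine, not an existing tool.

```shell
# pci_driver ADDRESS [SYSFS_ROOT]
# Prints the name of the driver currently bound to a PCI device, or "none".
# ADDRESS is the full form with domain, e.g. 0000:01:00.0.
pci_driver() {
    drv_link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$drv_link" ]; then
        # The driver entry is a symlink whose last component is the driver name.
        basename "$(readlink "$drv_link")"
    else
        echo "none"
    fi
}
```

For the setup in this post, `pci_driver 0000:01:00.0` and `pci_driver 0000:01:00.1` should both report vfio-pci. The lspci listing itself comes from something like `lspci -nnk -s 01:00`, and `proxmox-boot-tool kernel list` shows which kernels are installed and whether one is pinned.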
The key indicator is nova_core appearing in the kernel modules list on 7.0.6-2-pve. This module, the core of the new Rust-based Nova driver for NVIDIA GPUs that ships in newer kernels, is absent on 6.14.8-2-pve. The interaction between nova_core and the VFIO subsystem at VM start time appears to cause a PCIe bus hang that locks the host OS entirely.
Fix:
Rolling the boot kernel back to 6.14.8-2-pve fully resolves the issue:
    proxmox-boot-tool kernel pin 6.14.8-2-pve
    proxmox-boot-tool refresh
    reboot
After reboot, nova_core no longer appears in the GPU's kernel modules list, and GPU passthrough functions normally. The Proxmox management stack, QEMU, and all other packages remain on their updated versions; only the kernel changes.
    # Confirmed working state:
    uname -r → 6.14.8-2-pve
    lspci kernel modules for 10de:2c05 → nvidiafb, nouveau (no nova_core)
    Kernel driver in use → vfio-pci (unchanged, correct)
Request:
Can anyone confirm whether this regression is present on 6.17.x kernels as well, or only 7.0.x?
We have not tested the 6.17 series.
Also interested to know if this affects other Blackwell SKUs (RTX 5090, in particular) or only GB203.
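If you test this on 6.17.x or with another Blackwell card, it would help to post the same few facts for comparison. Note that the "Kernel modules" line in lspci lists modules that are able to claim the device, not modules that are actually loaded, so checking the loaded-module list separately is worthwhile. The sketch below does that; `module_loaded` and `passthrough_report` are hypothetical helper names, and the optional file argument exists only so the check can be run against a saved copy of /proc/modules.

```shell
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME is the first field of a line in a /proc/modules-style file.
# Built-in drivers never appear there, so "not loaded" can also mean built-in.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# passthrough_report [MODULES_FILE]
# Prints the running kernel and the load state of the relevant modules.
passthrough_report() {
    echo "kernel: $(uname -r)"
    for report_mod in nova_core nouveau vfio_pci; do
        if module_loaded "$report_mod" "${1:-/proc/modules}"; then
            echo "$report_mod: loaded"
        else
            echo "$report_mod: not loaded"
        fi
    done
}
```

Running `passthrough_report` once on the failing kernel (before starting the VM) and once on 6.14.8-2-pve, and pasting both outputs, should make it easier to tell whether nova_core is merely listed as a candidate or actually loaded when the hang occurs.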