VM GPU Passthrough Proxmox Crash on Shutdown

gtk1

New Member
May 10, 2025
I've successfully passed through the GPU to Ubuntu through a few logs, but I'm having issues when shutting down the VM. Rebooting works fine, whether I reboot Proxmox or Ubuntu, and it doesn't matter whether I reboot Ubuntu from Proxmox or from inside the VM. Only shutdown fails: about 5 seconds after the VM shuts down, the Proxmox UI hangs, then the connection is lost and isn't restored. I've looked and looked at all the configs. Most of what I see says that initcall_blacklist=sysfb_init is the most important setting and that the video=...:off options don't matter, and that works for most people, but I've tried everything I can find and think of. I'm assuming it's the GPU, but I'm new to Proxmox, so I'm not really sure tbh.

Specs:
Intel i7-14700K
ASUS TUF Z790-Plus WiFi
RTX 5070ti

Code:
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pci=realloc vfio-pci.ids=10de:2c05,10de:22e9 initcall_blacklist=sysfb_init video=vesafb:off video=efifb:off video=simplefb:off"
GRUB_CMDLINE_LINUX=""

# If your computer has multiple operating systems installed, then you
# probably want to run os-prober. However, if your computer is a host
# for guest OSes installed via LVM or raw disk devices, running
# os-prober can cause damage to those guest OSes as it mounts
# filesystems to look for things.
#GRUB_DISABLE_OS_PROBER=false

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

Code:
blacklist nouveau
blacklist nvidia
blacklist nvidiafb

Code:
options vfio-pci ids=10de:2c05,10de:22e9 disable_vga=1 disable_idle_d3=1
#options vfio-pci ids=8086:a780

Code:
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 22
cpu: host
efidisk0: fast-vm-pool:100/vm-100-disk-0.raw,efitype=4m,size=528K
hostpci0: 01:00.0,pcie=1,rombar=0
hostpci1: 01:00.1,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 81920
meta: creation-qemu=9.2.0,ctime=1745644353
name: clara-core
net0: virtio=BC:24:11:36:17:B5,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: docker-apps:vm-100-disk-0,backup=0,discard=on,iothread=1,size=1800G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=3b3d02f8-34ae-46dc-b5e1-77a937aa3c03
sockets: 1
unused0: local-zfs:vm-100-disk-0
unused1: local-zfs:vm-100-disk-1
virtio0: fast-vm-pool:100/vm-100-disk-1.qcow2,cache=writeback,discard=on,iothread=1,size=1500G
vmgenid: 742d1006-8e92-4db0-8924-d521581ce720
#hookscript: local:snippets/100-prestop-hook.sh
hookscript: local:snippets/100-poststop-hook.sh
args: -rtc base=localtime,clock=host -no-shutdown

Code:
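#!/bin/bash
# Proxmox hookscript: $1 is the VM ID, $2 is the phase.

if [[ "$2" == "post-stop" ]]; then
  echo " Cleaning up GPU passthrough remnants"

  # Drop both GPU functions from the PCI tree, then rescan to re-add them.
  echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove || true
  echo 1 > /sys/bus/pci/devices/0000:01:00.1/remove || true
  sleep 1
  echo 1 > /sys/bus/pci/rescan

  # Detach the virtual consoles from the framebuffer.
  echo 0 > /sys/class/vtconsole/vtcon0/bind || true
  echo 0 > /sys/class/vtconsole/vtcon1/bind || true

  # Unbind the simple framebuffer platform device, if present.
  echo simple-framebuffer.0 > /sys/bus/platform/drivers/simple-framebuffer/unbind || true
fi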

Code:
initrd=\EFI\proxmox\6.8.12-10-pve\initrd.img-6.8.12-10-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pci=realloc vfio-pci.ids=10de:2c05,10de:22e9 initcall_blacklist=sysfb_init video=vesafb:off video=efifb:off video=simplefb:off kvm.ignore_msrs=1 kvm.report_ignored_msrs=0 split_lock_detect=off

Code:
IOMMU group 0 00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] [8086:a780] (rev 04)
IOMMU group 10 00:17.0 SATA controller [0106]: Intel Corporation Raptor Lake SATA AHCI Controller [8086:7a62] (rev 11)
IOMMU group 11 00:1a.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a48] (rev 11)
IOMMU group 12 00:1b.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a40] (rev 11)
IOMMU group 13 00:1b.4 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a44] (rev 11)
IOMMU group 14 00:1c.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a38] (rev 11)
IOMMU group 15 00:1c.2 PCI bridge [0604]: Intel Corporation Raptor Point-S PCH - PCI Express Root Port 3 [8086:7a3a] (rev 11)
IOMMU group 16 00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:7a3c] (rev 11)
IOMMU group 17 00:1d.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a30] (rev 11)
IOMMU group 18 00:1f.0 ISA bridge [0601]: Intel Corporation Raptor Lake LPC/eSPI Controller [8086:7a04] (rev 11)
IOMMU group 18 00:1f.3 Audio device [0403]: Intel Corporation Raptor Lake High Definition Audio Controller [8086:7a50] (rev 11)
IOMMU group 18 00:1f.4 SMBus [0c05]: Intel Corporation Raptor Lake-S PCH SMBus Controller [8086:7a23] (rev 11)
IOMMU group 18 00:1f.5 Serial bus controller [0c80]: Intel Corporation Raptor Lake SPI (flash) Controller [8086:7a24] (rev 11)
IOMMU group 19 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2c05] (rev a1)
IOMMU group 19 01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22e9] (rev a1)
IOMMU group 1 00:00.0 Host bridge [0600]: Intel Corporation Raptor Lake-S 8+12 - Host Bridge/DRAM Controller [8086:a740] (rev 01)
IOMMU group 20 02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9C1a (DRAM-less) [144d:a80d]
IOMMU group 21 05:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation PS5027-E27T PCIe4 NVMe Controller (DRAM-less) [1987:5027] (rev 01)
IOMMU group 22 07:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 06)
IOMMU group 23 08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 2 00:01.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express 5.0 Graphics Port (PEG010) [8086:a70d] (rev 01)
IOMMU group 3 00:06.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCIe 4.0 Graphics Port [8086:a74d] (rev 01)
IOMMU group 4 00:0a.0 Signal processing controller [1180]: Intel Corporation Raptor Lake Crashlog and Telemetry [8086:a77d] (rev 01)
IOMMU group 5 00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation [8086:a77f]
IOMMU group 6 00:14.0 USB controller [0c03]: Intel Corporation Raptor Lake USB 3.2 Gen 2x2 (20 Gb/s) XHCI Host Controller [8086:7a60] (rev 11)
IOMMU group 6 00:14.2 RAM memory [0500]: Intel Corporation Raptor Lake-S PCH Shared SRAM [8086:7a27] (rev 11)
IOMMU group 7 00:14.3 Network controller [0280]: Intel Corporation Raptor Lake-S PCH CNVi WiFi [8086:7a70] (rev 11)
IOMMU group 8 00:15.0 Serial bus controller [0c80]: Intel Corporation Raptor Lake Serial IO I2C Host Controller [8086:7a4c] (rev 11)
IOMMU group 8 00:15.1 Serial bus controller [0c80]: Intel Corporation Raptor Lake Serial IO I2C Host Controller [8086:7a4d] (rev 11)
IOMMU group 8 00:15.2 Serial bus controller [0c80]: Intel Corporation Raptor Lake Serial IO I2C Host Controller [8086:7a4e] (rev 11)
IOMMU group 9 00:16.0 Communication controller [0780]: Intel Corporation Raptor Lake CSME HECI [8086:7a68] (rev 11)

Code:
00:00.0 Host bridge [0600]: Intel Corporation Raptor Lake-S 8+12 - Host Bridge/DRAM Controller [8086:a740] (rev 01)
00:01.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express 5.0 Graphics Port (PEG010) [8086:a70d] (rev 01)
00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] [8086:a780] (rev 04)
00:06.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCIe 4.0 Graphics Port [8086:a74d] (rev 01)
00:0a.0 Signal processing controller [1180]: Intel Corporation Raptor Lake Crashlog and Telemetry [8086:a77d] (rev 01)
00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation [8086:a77f]
00:14.0 USB controller [0c03]: Intel Corporation Raptor Lake USB 3.2 Gen 2x2 (20 Gb/s) XHCI Host Controller [8086:7a60] (rev 11)
00:14.2 RAM memory [0500]: Intel Corporation Raptor Lake-S PCH Shared SRAM [8086:7a27] (rev 11)
00:14.3 Network controller [0280]: Intel Corporation Raptor Lake-S PCH CNVi WiFi [8086:7a70] (rev 11)
00:15.0 Serial bus controller [0c80]: Intel Corporation Raptor Lake Serial IO I2C Host Controller [8086:7a4c] (rev 11)
00:15.1 Serial bus controller [0c80]: Intel Corporation Raptor Lake Serial IO I2C Host Controller [8086:7a4d] (rev 11)
00:15.2 Serial bus controller [0c80]: Intel Corporation Raptor Lake Serial IO I2C Host Controller [8086:7a4e] (rev 11)
00:16.0 Communication controller [0780]: Intel Corporation Raptor Lake CSME HECI [8086:7a68] (rev 11)
00:17.0 SATA controller [0106]: Intel Corporation Raptor Lake SATA AHCI Controller [8086:7a62] (rev 11)
00:1a.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a48] (rev 11)
00:1b.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a40] (rev 11)
00:1b.4 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a44] (rev 11)
00:1c.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a38] (rev 11)
00:1c.2 PCI bridge [0604]: Intel Corporation Raptor Point-S PCH - PCI Express Root Port 3 [8086:7a3a] (rev 11)
00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:7a3c] (rev 11)
00:1d.0 PCI bridge [0604]: Intel Corporation Raptor Lake PCI Express Root Port [8086:7a30] (rev 11)
00:1f.0 ISA bridge [0601]: Intel Corporation Raptor Lake LPC/eSPI Controller [8086:7a04] (rev 11)
00:1f.3 Audio device [0403]: Intel Corporation Raptor Lake High Definition Audio Controller [8086:7a50] (rev 11)
00:1f.4 SMBus [0c05]: Intel Corporation Raptor Lake-S PCH SMBus Controller [8086:7a23] (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Raptor Lake SPI (flash) Controller [8086:7a24] (rev 11)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2c05] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22e9] (rev a1)
02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9C1a (DRAM-less) [144d:a80d]
05:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation PS5027-E27T PCIe4 NVMe Controller (DRAM-less) [1987:5027] (rev 01)
07:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 06)
08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)

Edit:
I didn't know there was a difference between all these old configs; apparently I'm using UEFI with systemd-boot, not GRUB?

Code:
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0001
Boot0000* Linux Boot Manager    HD(2,GPT,b25eb29f-d367-44af-a17e-a44b30f4aeed,0x800,0x200000)/File(\EFI\systemd\systemd-bootx64.efi)
Boot0001* UEFI OS       HD(2,GPT,b25eb29f-d367-44af-a17e-a44b30f4aeed,0x800,0x200000)/File(\EFI\BOOT\BOOTX64.EFI)..BO

Code:
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
6E59-BE4B is configured with: uefi (versions: 6.8.12-10-pve, 6.8.12-9-pve)

Code:
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pci=realloc vfio-pci.ids=10de:2c05,10de:22e9 initcall_blacklist=sysfb_init video=vesafb:off video=efifb:off video=simplefb:off kvm.ignore_msrs=1 kvm.report_ignored_msrs=0 split_lock_detect=off
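
With systemd-boot, Proxmox takes the kernel command line from /etc/kernel/cmdline rather than /etc/default/grub, and changes are applied with `proxmox-boot-tool refresh`. A minimal sketch for checking that the expected parameters actually reached the running kernel follows; the function name `missing_params` is made up for illustration, and the parameter list is only an example.

```shell
#!/bin/bash
# Print every expected parameter that is absent from a kernel command line.
# Usage: missing_params "<cmdline string>" param1 param2 ...
# (missing_params is a hypothetical helper, not a Proxmox tool.)
missing_params() {
  local cmdline=" $1 "   # pad so whole-word matching works at both ends
  shift
  local p
  for p in "$@"; do
    case "$cmdline" in
      *" $p "*) ;;                 # parameter present, nothing to report
      *) printf '%s\n' "$p" ;;     # parameter missing
    esac
  done
}

# On the host, compare against the live command line, for example:
#   missing_params "$(cat /proc/cmdline)" intel_iommu=on iommu=pt initcall_blacklist=sysfb_init
```

No output means every listed parameter is present in the running kernel's command line.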
 
So, while this is waiting for admin approval, I'm probably just going to bail on Proxmox, but it would be interesting to know what the issue is. I ended up getting these errors as well:


Code:
vfio_container_dma_map(0x5acadf6f7280, 0x380000000000, 0x400000000, 0x6ff280000000) = -22 (Invalid argument)

There are a lot of them, like crazy. I don't have a huge need to run anything other than Ubuntu, so I'm just moving to a straight Ubuntu install.
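
For anyone reading along, the numbers in that message can be unpacked with a bit of arithmetic. This is only a sketch, and it assumes the arguments are printed in the order (container, IOVA, size, host virtual address), which is my reading of the QEMU message and is not confirmed anywhere in this thread. The helper name `bits_needed` is made up for illustration.

```shell
#!/bin/bash
# Number of address bits needed to reach the last byte of a mapping.
# Usage: bits_needed <iova> <size>   (hex such as 0x... is accepted)
# (bits_needed is a hypothetical helper for illustration.)
bits_needed() {
  local last=$(( $1 + $2 - 1 ))   # highest address touched by the mapping
  local bits=0
  while (( last > 0 )); do
    last=$(( last >> 1 ))
    bits=$(( bits + 1 ))
  done
  echo "$bits"
}

# Values copied from the error message (assumed to be iova and size):
iova=0x380000000000
size=0x400000000
echo "mapping size: $(( size / 1024 / 1024 / 1024 )) GiB"
echo "address bits needed: $(bits_needed "$iova" "$size")"
```

Under that assumption the failing mapping is 16 GiB and needs 46 address bits. If the host IOMMU supports fewer bits than that (the kernel log reports a "Host address width" for DMAR at boot), an "Invalid argument" on this mapping would be plausible, but I have not verified that this is what happens here.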

Again if anyone doesn't mind pointing out an issue if it's obvious, neato and ty!
 
From what you describe, my guess is that it's related to handing the GPU back to the host. When you reboot the VM, the GPU is never passed back to the host.

So can you try the following to narrow down the issue? Try passing the whole GPU to the VM as one device.

E.g., in your current VM config,

pass hostpci0: 01:00
instead of hostpci0: 01:00.0 and hostpci1: 01:00.1 separately.

I believe in the GUI you pick the GPU, tick PCI-Express, and tick All Functions.
If the VM works OK, try returning the GPU to the host as that one item and see if that works.
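
From the CLI, the same change could look roughly like the sketch below. It only prints the `qm set` commands rather than running them. VMID 100 is an assumption inferred from the disk names in the posted config, `rombar=0` is carried over from the original hostpci0 line, and `all_functions_cmds` is a made-up helper name.

```shell
#!/bin/bash
# Print (not run) the qm commands that switch a VM from two per-function
# hostpci entries to a single all-functions entry.
# Usage: all_functions_cmds <vmid> <pci-address-without-function>
# (all_functions_cmds is a hypothetical helper for illustration.)
all_functions_cmds() {
  local vmid=$1 addr=$2
  echo "qm set $vmid --hostpci0 $addr,pcie=1,rombar=0"
  echo "qm set $vmid --delete hostpci1"
}

# VMID 100 is assumed; address 01:00 is taken from the suggestion above.
all_functions_cmds 100 01:00
```

Review the printed commands and run them by hand with the VM stopped.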
 
args: ...... -no-shutdown
I don't know what this args value is or what it does.


EDIT: Found this on QEMU man page:
-no-shutdown
Don’t exit QEMU on guest shutdown, but instead only stop the emulation. This allows for instance switching to monitor to commit changes to the disk image.
I'm still not sure why you have that in your VM config or how it got there.
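
If you want to test without it, here is a small sketch that strips one flag from an args string while leaving the rest alone. Whether `-no-shutdown` has anything to do with the hang is not established in this thread, and `strip_flag` is a made-up helper name.

```shell
#!/bin/bash
# Remove one flag from a QEMU args string and tidy the leftover spaces.
# Usage: strip_flag "<args string>" <flag>
# (strip_flag is a hypothetical helper for illustration.)
strip_flag() {
  local out=" $1 "
  out=${out// "$2" / }        # drop the flag when it appears as a whole word
  out=${out# }                # trim the padding added above
  out=${out% }
  printf '%s\n' "$out"
}

# With the args line from the VM config posted earlier:
strip_flag "-rtc base=localtime,clock=host -no-shutdown" -no-shutdown
```

The result can then be written back with `qm set <vmid> --args '...'`, or by editing the args line in the VM's config file.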
 
Hi all, just following up. This may be an issue with the latest nvidia drivers.

I noticed that Proxmox did not crash if the drivers weren't installed in the VM or the card hadn't been used before shutdown.
These were the changes I made to the VM and it hasn't crashed so far. (24-48 hours)

Original:
Code:
ubuntu-24.04-live-server-amd64.iso (installer)
nvidia-open (driver, defaults to 575 with dkms)
cuda-toolkit-12-8 (package: required for container toolkit)
nvidia-container-toolkit (package: nvidia container support)

Current:
Code:
ubuntu-24.04-server-cloudimg-amd64.img (installer)
nvidia-headless-no-dkms-570-server-open (driver)
nvidia-utils-570-server
libnvidia-decode-570-server (package: hardware accelerated video decoding)
libnvidia-encode-570-server (package: hardware accelerated video encoding)
cuda-toolkit-12-9 (package: required for container toolkit)
nvidia-container-toolkit  (package: nvidia container support)

These are my Proxmox (8.2.3) host changes:

/etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_port_pm=off pcie_aspm.policy=performance intel_iommu=on iommu=pt split_lock_detect=off"

/etc/modprobe.d/blacklist.conf
Code:
blacklist nouveau
blacklist nvidia*

/etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=REDACTED,REDACTED
options vfio-pci disable_idle_d3=1
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidiafb pre: vfio-pci
softdep nvidia_drm pre: vfio-pci
softdep drm pre: vfio-pci

/etc/pve/qemu-server/103.conf
Code:
bios: ovmf
cpu: x86-64-v2-AES
efidisk0: storage01:103/vm-103-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:01:00,pcie=1,rombar=0
machine: q35
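
The softdep lines are meant to get vfio-pci loaded before the NVIDIA and nouveau drivers. After changing files in /etc/modprobe.d, the initramfs usually needs rebuilding (`update-initramfs -u -k all`) followed by a reboot. A sketch for confirming which driver ended up bound to the card is below; `drivers_in_use` is a made-up helper that just parses `lspci -nnk` output.

```shell
#!/bin/bash
# Read `lspci -nnk` output on stdin and print the bound driver of each device
# block, one per line, or "none" when a device has no driver line.
# (drivers_in_use is a hypothetical helper for illustration.)
drivers_in_use() {
  awk '
    /^[0-9a-f]/ { if (seen && !drv) print "none"; seen = 1; drv = 0 }
    /Kernel driver in use:/ { print $NF; drv = 1 }
    END { if (seen && !drv) print "none" }
  '
}

# On the host:  lspci -nnk -s 01:00 | drivers_in_use
```

For passthrough you would expect `vfio-pci` for both the VGA and the audio function while the VM is stopped.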
 
Hello all,
I had hoped to create an account to say thanks to Jadorno, but those thanks are on hold for the moment while I pull out what hair I have left.

With the help of ChatGPT, I copied nearly everything you posted, and nvidia-smi still returns "no devices found".

I even tried the ROM trick: I downloaded the same ROM from TechPowerUp and gave it to the VM. Nothing.

I have tried several versions/arguments of this line of the VM config, as well as the host GRUB settings.


The VM keeps returning this ..

kyle@ubuntullm:~$ sudo cat /proc/driver/nvidia/gpus/0000:01:00.0/information
[sudo] password for kyle:
Model: Unknown
IRQ: 16
GPU UUID: GPU-????????-????-????-????-????????????
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 52 bits
DMA Mask: 0xfffffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
GPU Firmware: N/A
GPU Excluded: No
kyle@ubuntullm:~$

My hardware/software if it matters for those who like puzzles.

Host
Xeon(R) CPU E5-2695 v4
PNY 5070
Asus Rampage V Extreme X99 platform ( https://www.anandtech.com/show/9278/the-asus-x99-rampage-v-extreme-rog-review )
Proxmox Version - pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-10-pve)


VM
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy


The Proxmox system log has some errors. ChatGPT says this about them:

PCIe AER (Advanced Error Reporting) is throwing “Correctable Timeout” errors for the GPU's VFIO interface, specifically:
vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
...
[12] Timeout

and..

Summary: This Log Confirms


✅ You're hitting a firmware-level or microcontroller-level failure during PCIe linkup in passthrough mode.
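
For a more concrete view than the summary, here is a sketch that counts the correctable AER reports per device in the kernel log. It matches the message format quoted above and also accepts the "Corrected" wording, which is my assumption about other kernel versions; `aer_correctable_counts` is a made-up helper name.

```shell
#!/bin/bash
# Count correctable AER reports per PCI device in kernel log text on stdin.
# Output: "<count> <pci-address>" per device, most frequent first.
# (aer_correctable_counts is a hypothetical helper for illustration.)
aer_correctable_counts() {
  grep -E 'PCIe Bus Error: severity=Correct(ed|able)' \
    | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# On the host:  journalctl -k -b | aer_correctable_counts
```

Correctable errors are recovered by the hardware, so the count mainly shows whether they are confined to the GPU at 0000:01:00.0 or spread across other devices.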


Any ideas at all would be appreciated.
 
Follow-up: working, I think.

I did not have 4G Decoding enabled in the BIOS, and my add-in 4-port NIC was apparently in a slot that conflicted with either the NVMe drive or the GPU, depending on which slot the GPU was in. I ended up moving the 5070 to the second x16 slot and the NIC to another slot that would not have an address conflict. We shall see if it continues to show up with:

kyle@ubuntullm:~$ nvidia-smi
Thu May 22 19:57:31 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 570.86.16      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA Graphics Device         Off |   00000000:01:00.0 Off |                  N/A |
| 40%   31C    P0             19W /  250W |       1MiB /  12227MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+