[HELP] GPU Pass-through Broken After 7.2.4 Update

May 21, 2022
Hello everyone,

I have been using GPU pass-through without issue for months, ever since I first set up this system. I use it for my 3D modeling tasks since my personal laptop can't really handle them. Also gaming. ;)

I just did my first update since installing a few months ago, and my GPU pass-through is no longer working. The card still seems to be passed through, but Windows now shows the dreaded Code 43 error, and I have no idea what to do or where to even begin figuring out what changed.

Here is what I checked so far:
  • I checked all of my config changes (I noted everything I did when I set this up), and they all still seem to be in place.
  • I ran update-grub and update-initramfs -u again, just in case.
  • I removed and re-added the PCI device.
  • Since there had been some Windows updates, I spun up a fresh VM from my known-working template and verified it wasn't a Windows change that caused this.
  • I tried different Display settings to see if that made any difference compared to before.

I am honestly at a loss and have no idea where to go from here. I know the batch of updates included a new kernel, and I'm not sure whether this kernel works differently or requires something additional. If anyone could give me a hand or set me in the right direction, I'd really appreciate it.

Thanks,

Nate
 
First thing you can try is pinning the old kernel to see if that resolves the issue: proxmox-boot-tool kernel pin 5.13.19-6-pve
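If you're not sure of the exact version strings, proxmox-boot-tool can list what's installed. A rough sequence, assuming a UEFI system managed by proxmox-boot-tool (the version below is just the previous stable kernel on this system; use whatever `kernel list` shows on yours):

```shell
# List the kernels proxmox-boot-tool knows about (and any current pin).
proxmox-boot-tool kernel list

# Pin the previous kernel so it stays the boot default across reboots.
proxmox-boot-tool kernel pin 5.13.19-6-pve

# Later, once a fix lands, remove the pin again.
proxmox-boot-tool kernel unpin
```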

Can you dump a few things:
  • /etc/default/grub
  • /etc/modules
  • /etc/modprobe.d/blacklist.conf
  • /etc/modprobe.d/vfio.conf
  • the configuration file for your VM
  • the output of the script below
Code:
#!/bin/bash
# Print each PCI device together with the IOMMU group it belongs to.
for d in $(find /sys/kernel/iommu_groups/ -type l | sort -n -k5 -t/); do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
 
Firstly thank you very much for taking the time to try and help me with this!

/etc/default/grub:
Bash:
root@proxmox:~# cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
# GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=vesafb:off video=efifb:off"
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

/etc/modules:
Bash:
root@proxmox:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

/etc/modprobe.d/blacklist.conf:
Bash:
root@proxmox:~# cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia

/etc/modprobe.d/vfio.conf:
Bash:
root@proxmox:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1f03,10de:10f9 disable_vga=1

VM CONF Example:
Bash:
root@proxmox:/etc/pve/qemu-server# cat 100.conf
agent: 1
audio0: device=ich9-intel-hda,driver=none
bios: ovmf
boot: order=scsi0;net0
cores: 4
cpu: host
efidisk0: local-lvm-fast:vm-100-disk-1,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:06:00,pcie=1,rombar=0,x-vga=1
machine: pc-q35-6.1
memory: 8096
meta: creation-qemu=6.1.0,ctime=1646892218
name: windows-gaming
net0: virtio=4A:CD:C4:B6:AC:A6,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: local-lvm-fast:vm-100-disk-0,cache=writeback,size=120G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=28af771c-ed4d-46d8-948e-c2c49099508c
sockets: 1
tpmstate0: local-lvm-fast:vm-100-disk-2,size=4M,version=v2.0
usb0: host=1-10.4,usb3=1
usb1: host=04b4:4009,usb3=1
vmgenid: c54b94fd-c299-4ec7-b000-f0acabfd179e

What I will say about the VM config: I thought I remembered having to add a long line to it, but I don't see it anymore. And of course I can't find it in my notes, so maybe I'm misremembering from back when NVIDIA didn't support pass-through. Or maybe the long line was for my OSX VM.

Script Output:

Bash:
root@proxmox:~# sh awesome_script.sh
IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU Group 2 00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU Group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 4 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 5 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU Group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 7 00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 8 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 9 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU Group 10 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU Group 11 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU Group 12 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
IOMMU Group 12 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 13 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0 [1022:1440]
IOMMU Group 13 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1 [1022:1441]
IOMMU Group 13 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2 [1022:1442]
IOMMU Group 13 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3 [1022:1443]
IOMMU Group 13 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4 [1022:1444]
IOMMU Group 13 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5 [1022:1445]
IOMMU Group 13 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6 [1022:1446]
IOMMU Group 13 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7 [1022:1447]
IOMMU Group 14 01:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation Device [1987:5018] (rev 01)
IOMMU Group 15 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ee]
IOMMU Group 15 02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43eb]
IOMMU Group 15 02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43e9]
IOMMU Group 15 03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 15 03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 15 04:00.0 Non-Volatile memory controller [0108]: SK hynix Device [1c5c:174a]
IOMMU Group 15 05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU Group 16 06:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1f03] (rev a1)
IOMMU Group 16 06:00.1 Audio device [0403]: NVIDIA Corporation TU106 High Definition Audio Controller [10de:10f9] (rev a1)
IOMMU Group 17 07:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU Group 18 08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU Group 19 08:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
IOMMU Group 20 08:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
IOMMU Group 21 08:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
 
OK, everything looks good to me on the configuration side. I'm pretty sure this is just the new kernel's simple framebuffer staying resident in memory. It's been tripping up a lot of folks.

(Sorry about the proxmox-boot-tool kernel pin 5.13.19-6-pve suggestion above; it only works on UEFI systems.)

We can try this one of two ways:
  • There should be an entry in your grub menu at boot to choose the old 5.13.19 kernel. Try that.
  • Disable autostart on your VMs/Containers and capture the output from cat /proc/iomem
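To make the /proc/iomem output easier to read, something like this should show whether a boot framebuffer is still parked on the card's memory region (the names vary by setup: BOOTFB, simplefb, efifb). The 06:00.0 address is the NVIDIA card from your dump above; adjust it to match your system:

```shell
# Look for a leftover boot framebuffer in the physical memory map.
grep -i -B 1 'bootfb\|simplefb\|efifb' /proc/iomem

# Cross-check against the memory ranges the GPU itself claims.
lspci -vs 06:00.0 | grep -i 'memory at'
```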
 
@nick.kopas

Booting into the older kernel did in fact let everything work again. Thank you for that!

Can this new simple framebuffer that stays in memory be configured via kernel parameters to change its behavior, or are we sort of stuck at this point?
 
This is the workaround the community came up with:
https://forum.proxmox.com/threads/gpu-passthrough-issues-after-upgrade-to-7-2.109051/post-469855

NOTE: You have to modify the script to match the address of your NVIDIA card.
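For anyone who doesn't want to click through: the linked workaround is a Proxmox VM hookscript along these lines. It removes the GPU from the PCI bus before the VM starts and then rescans, so the leftover framebuffer loses its claim on the card. The 0000:06:00.0 address matches the card in this thread; change it to match yours:

```shell
#!/bin/sh
# /var/lib/vz/snippets/gpu-hookscript.sh
# Proxmox invokes hookscripts as: <script> <vmid> <phase>
if [ "$2" = "pre-start" ]; then
    echo "gpu-hookscript: resetting GPU for VM $1"
    # Drop the device (and whatever driver holds it) from the PCI bus...
    echo 1 > /sys/bus/pci/devices/0000:06:00.0/remove
    # ...then rescan so vfio-pci can claim it cleanly.
    echo 1 > /sys/bus/pci/rescan
fi
```

Make it executable, then attach it with something like qm set 100 --hookscript local:snippets/gpu-hookscript.sh.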
 
Nope, just the main one.
But if your problem is solved by pinning the old kernel... I would recommend just doing that.
(I've run into some other issues with networking and Windows guests that require me to pin the old kernel anyway.)
 
@nick.kopas

OK, I have pinned it in GRUB for now, but we really need a longer-term solution, especially if the new kernel is bringing on other problems.
I also set up the script snippet just in case I need it in the future. I don't really see it causing any issues itself; it's pretty basic.
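For anyone else on legacy boot: pinning in GRUB boils down to pointing GRUB_DEFAULT in /etc/default/grub at the older menu entry, then running update-grub. Mine is roughly like this, but the exact titles come from your own /boot/grub/grub.cfg:

```shell
# /etc/default/grub -- boot the 5.13 kernel from the "Advanced options" submenu.
GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.13.19-6-pve"
```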
 
