PCIe Graphics Passthrough AMD Process - RTX Card, always driver error code 43

HAIceDragon

New Member
Jul 19, 2023
Hello all,
I have found a lot of information, but it all seems to be about Intel CPUs for PCIe passthrough. I have done my best to decipher and cross-reference articles to make things line up, and I got it "working" in that the card passes through and shows up in the VM, but the driver never starts cleanly. I always get error code 43. I would love to know what I am doing wrong and get this resolved.

IOMMU is enabled.
SR-IOV is enabled.
Everything BIOS-wise that I could find in the ultimate guide and the Proxmox walkthrough related to passthrough has been enabled, but if there is more to check, let me know and I will gladly look.

Proxmox is on 8.0.3
MB: MSI B550M-VC Pro wifi
CPU: Ryzen 9 5950x
GPU: EVGA RTX 2060 KO
I am willing to add a secondary GPU if that resolves all this, as long as I can put it in one of the slower PCIe slots.
- vGPU didn't work for me: it only allowed up to 512 MB of memory, and the card did not show up in Device Manager that way, so I don't think the software was able to use it to decode / re-encode. I am trying to use this for BlueIris to reduce CPU load and eventually enable object detection.

ProxBoxStore:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

# Proxmox Passthrough settings
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

# Chip drivers
nct6775


ProxBoxStore:/etc/modprobe.d# cat blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia


ProxBoxStore:/etc/modprobe.d# cat iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

ProxBoxStore:/etc/modprobe.d# cat pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE

# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
#blacklist nvidia
#blacklist nouveau
#blacklist radeon

ProxBoxStore:/etc/modprobe.d# cat /etc/pve/qemu-server/103.conf
agent: 1
balloon: 0
bios: ovmf
boot: order=sata0;net0;sata1
cores: 24
cpu: host,flags=+ibpb;+virt-ssbd;+amd-ssbd;+pdpe1gb;+aes
efidisk0: NVMeThin:vm-103-disk-1,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:2b:00,pcie=1,romfile=EVGA.RTX2060.6144.200101.rom
hotplug: disk,network,usb,memory,cpu
machine: pc-q35-7.1
memory: 40960
net0: virtio=92:F2:AA:65:D7:38,bridge=vmbr1001
net1: virtio=2A:FF:8F:00:D9:B2,bridge=vmbr9999
numa: 1
onboot: 1
ostype: win11
sata0: NVMeThin:vm-103-disk-0,discard=on,size=128G,ssd=1
sata1: none,media=cdrom
scsi10: SmallStore:vm-103-disk-10,discard=on,size=6T,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=ad54428d-8e7d-4e6e-a793-3804d380054b
sockets: 1
spice_enhancements: foldersharing=1,videostreaming=all
startup: order=2,up=60
tpmstate0: NVMeThin:vm-103-disk-2,size=4M,version=v2.0
vmgenid: e3315a97-efbc-4b3d-92a7-a4469cd5ceee


I confirmed the PCI information following the guides, and the VM recognizes the card as an RTX 2060 on its own, but it doesn't want to start the driver. In older versions, I just had a startup task that disabled the driver and then re-enabled it, and that seemed to work. That no longer works.

ProxBoxStore:/etc/modprobe.d# dmesg | grep "remapping"
[ 0.731265] AMD-Vi: Interrupt remapping enabled

ProxBoxStore:/etc/modprobe.d# dmesg | grep -e IOMMU
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.730432] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.731262] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.740729] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

I also did the /etc/modules piece and ran update-initramfs -u -k all

root@ProxBoxStore:/etc/modprobe.d# dmesg | grep -i vfio
[ 9.756603] VFIO - User Level meta-driver version: 0.3
[ 9.763737] vfio-pci 0000:2b:00.0: vgaarb: deactivate vga console
[ 9.763739] vfio-pci 0000:2b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 9.763852] vfio_pci: add [10de:1e89[ffffffff:ffffffff]] class 0x000000/00000000
[ 9.812269] vfio_pci: add [10de:10f8[ffffffff:ffffffff]] class 0x000000/00000000
[ 9.812282] vfio_pci: add [10de:1ad8[ffffffff:ffffffff]] class 0x000000/00000000
[ 9.812287] vfio_pci: add [10de:1ad9[ffffffff:ffffffff]] class 0x000000/00000000
[ 83.877753] vfio-pci 0000:2b:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 83.877783] vfio-pci 0000:2b:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 83.901589] vfio-pci 0000:2b:00.1: enabling device (0000 -> 0002)
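
The four "vfio_pci: add" lines above show that the vendor:device IDs of all four functions on the card were registered with vfio-pci. For anyone reproducing this, here is a minimal sketch of how that ID list could be derived from lspci output instead of being typed by hand. The function name vfio_ids_line is made up for illustration, and the target file name in the usage comment is an assumption, not something taken from this host.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nn` output on stdin and print a
# modprobe options line listing every [vendor:device] ID found.
vfio_ids_line() {
    # Match only bracketed vvvv:dddd pairs; class codes such as [0300]
    # have no colon and are skipped.
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | sed 's/[][]//g' \
        | paste -sd, - \
        | sed 's/^/options vfio-pci ids=/'
}

# Possible usage on the host (file name is an assumption):
#   lspci -nn -s 2b:00 | vfio_ids_line > /etc/modprobe.d/vfio.conf
#   update-initramfs -u -k all
```

With the four functions listed later in this post, this should produce a single line containing 10de:1e89, 10de:10f8, 10de:1ad8 and 10de:1ad9.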

ProxBoxStore:/etc/modprobe.d# cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nox2apic intremap=no_x2apic_optout pcie_acs_override=downstream,multifunction video=vesafb:off video=efifb:off"
#GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

# If your computer has multiple operating systems installed, then you
# probably want to run os-prober. However, if your computer is a host
# for guest OSes installed via LVM or raw disk devices, running
# os-prober can cause damage to those guest OSes as it mounts
# filesystems to look for things.
#GRUB_DISABLE_OS_PROBER=false

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
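
Editing /etc/default/grub only takes effect after update-grub and a reboot, and only if the host actually boots through GRUB (installs that boot with systemd-boot read /etc/kernel/cmdline instead). A rough sketch for checking that the running kernel really received the parameters is below. The function name missing_params is invented for this example, and the parameters checked are just two of the ones from the file above.

```shell
#!/bin/sh
# Hypothetical helper: print every expected parameter that is absent
# from a kernel command line.
#   $1    the command line as one string (e.g. contents of /proc/cmdline)
#   $2..  parameters that should be present
missing_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        # Pad with spaces so only whole, space-separated words match.
        case " $cmdline " in
            *" $p "*) ;;
            *) printf '%s\n' "$p" ;;
        esac
    done
}

# Possible usage on the host:
#   missing_params "$(cat /proc/cmdline)" amd_iommu=on iommu=pt
```

No output means every listed parameter is on the running command line.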


ProxBoxStore:/etc/modprobe.d# dmesg | grep -E "DMAR|IOMMU"
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.730432] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.731262] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.740729] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
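
The dmesg lines confirm that the IOMMU is present, but they do not show how devices are grouped. A small sketch for listing IOMMU groups from sysfs follows. The function name list_iommu_groups and its optional directory argument are made up here (the argument only exists so the function can be pointed at a test directory). Keep in mind that with pcie_acs_override active, as the warning above says, the groups may look more finely split than the hardware really isolates them.

```shell
#!/bin/sh
# Hypothetical helper: print "group N: <pci address>" for every device
# that the kernel has placed in an IOMMU group.
#   $1  optional sysfs directory (defaults to /sys/kernel/iommu_groups)
list_iommu_groups() {
    base=${1:-/sys/kernel/iommu_groups}
    for dev in "$base"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$dev" ] || continue
        group=${dev#"$base"/}   # strip the leading directory
        group=${group%%/*}      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Possible usage on the host:
#   list_iommu_groups | grep 2b:00
```

Ideally the four 2b:00.x functions share one group that contains nothing else.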


ProxBoxStore:/etc/modprobe.d# lspci -nnk | grep 'NVIDIA'
2b:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2060] [10de:1e89] (rev a1)
2b:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
2b:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1)
2b:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9] (rev a1)
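
Note that piping lspci -nnk through grep 'NVIDIA' drops the indented "Kernel driver in use:" lines, so the output above does not show which driver each function is bound to. Here is a sketch of a filter that reads the full lspci -nnk output and reports any function not bound to vfio-pci. The name unbound_functions is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nnk` output on stdin and print
# "<slot> <driver>" for each device whose driver is not vfio-pci.
# Devices with no driver bound are reported as "none".
unbound_functions() {
    awk '
        function report() {
            if (slot != "" && drv != "vfio-pci")
                print slot, (drv == "" ? "none" : drv)
        }
        # A device header line starts with the hex slot address.
        /^[0-9a-f]/ { report(); slot = $1; drv = "" }
        # Indented detail line naming the bound driver.
        /Kernel driver in use:/ { drv = $NF }
        END { report() }
    '
}

# Possible usage on the host:
#   lspci -nnk -s 2b:00 | unbound_functions
```

Empty output would mean all functions of the card are held by vfio-pci before the VM starts.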

Any more info that may help, please let me know and I will gladly grab it. Like I said, I just want to resolve this.
 
I've done pass-through with Nvidia, Intel and AMD, but mostly with Nvidia, because that's what I use for CUDA.
And generally, error 43 is mostly an indication that you've reached the point where things should be working, because you've overcome the technical challenges....

...and now Nvidia decides that you should not use your hardware in that manner.

However, I was under the impression that they had given up on blocking deployment of their drivers when they discover it's running inside a VM.
I did have that error with one system recently, I believe a Phantom Canyon Intel NUC11 with an RTX2060 in it.

There the way to avoid the issue was to use a different driver. It could have been using the desktop variant instead of the mobile, or it could have been using the one from the CUDA package...

There are ways to play tricks on the driver using the KVM configuration, but Proxmox might generate that on the fly so those tricks won't work (was the case in oVirt). But these days it should no longer be necessary (of course Nvidia could change its mind any time).
 
If I uninstall the device in Device Manager and then let it reinstall after each boot, it starts right up fine.
So it looks like @abufrejoval is on point with it, but I am not aware of CUDA drivers for a Windows machine and an RTX 2060. I found them for Linux, though.
 
