Another GPU Passthrough Challenge...

voyager529

Apr 5, 2024
Hello friends!

So, I know that GPU passthrough can be done successfully - I'm writing this post on a VM with an Intel Arc A580 passed through.

I'm attempting to build an additional PVE node running Proxmox 9, which includes an NVIDIA RTX 3060. It is on an ASRock motherboard (a Phantom B550, with an AMD Ryzen 5 4500), with SR-IOV and IOMMU enabled. The GPU is intended to be passed through to a VM running Debian 12 (DietPi) in order to handle compute tasks for Immich.

When I attempt to start the Immich VM, the host locks up - I can't even ping the host until I hit the reset button.

Things I've tried already...

Code:
root@router:~# cat /etc/modprobe.d/blacklist-nvidia.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist rivafb
blacklist nvidia_uvm
blacklist nvidia_drm
blacklist snd_hda_intel
options nouveau modeset=0
root@router:~#

Code:
root@router:~# lspci -nnk -d 10de:
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] [10de:2504] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:397d]
        Kernel modules: nvidiafb, nouveau
04:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:397d]
        Kernel modules: snd_hda_intel

I've tried EVERY combination of "All Functions", "ROM-Bar", "PCI-Express", and "Primary GPU". I even bought a replacement motherboard, and it does the exact same thing.

Now, the GPU *does* work; I ran Immich on bare metal before virtualizing it. I also know that PCIe passthrough works on this host in some capacity. The eagle-eyed readers here will note that the host is named 'router', because its primary purpose is to run NethSecurity as a router. That VM performs its I/O through a quad-port Broadcom NIC, which is passed through to it, and it starts up just fine.

So...I'm appreciative of any assistance in helping me figure out why passing the GPU through to the Immich VM causes the host to lock up. I'm agnostic about whether the VM does an actual video output on the card, as long as it works.

Thank you for reading!
 
Passing the host GPU as "Primary GPU" will not work. My RTX 3060 doesn't need modeset=0 on Intel 12th gen. It did require "All Functions" and the one to the right, BUT NOT "Primary GPU". Did you bind both 10de:2504 and 10de:228e to vfio-pci in vfio.conf?

What does this show?

lspci -nnk | grep -A 3 'VGA'

Is it "Kernel driver in use: vfio-pci" for both?
 
Did you check your IOMMU groups (https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_isolation)? A B550 motherboard can only pass through devices in the first x16 PCIe slot (and the first M.2 slot). All other devices are in one big chipset group, and passing through any device from that group makes the Proxmox host lose all of them, which matches the "host locks up" symptom.
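
For anyone following along, here is a minimal sketch of that check, assuming the standard Linux sysfs layout under /sys/kernel/iommu_groups. The function name and its optional root argument are hypothetical (the argument exists only so the function can be pointed at a test directory). It prints one "group address" pair per line, so devices that share a group number cannot be passed through independently.

```shell
#!/bin/sh
# List every PCI device by IOMMU group, one "group address" pair per line.
# The optional first argument (hypothetical, for testing) overrides the sysfs root.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing: no groups exposed
        group="${dev%/devices/*}"          # strip "/devices/<address>"
        printf '%s %s\n' "${group##*/}" "${dev##*/}"
    done | sort -n                         # numeric order, so group 2 sorts before 10
}

list_iommu_groups
```

If this prints nothing at all, the kernel is not exposing any IOMMU groups, which usually points at the IOMMU being disabled in firmware or on the kernel command line. Any address in the output can be fed to `lspci -nns <address>` for a readable device name.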

This tracks perfectly; the NIC is in the first slot, which is probably why it passes through fine while the GPU doesn't. I'd like to pass through both PCIe slots, but I think you may be on to something here: the GPU needs to be passed through directly, whereas I can play games with NIC bridges to allow the router to route without doing PCIe passthrough. I may go down that road. Thank you for this.


Code:
root@router:~# lspci -nnk | grep -A 3 'VGA'
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] [10de:2504] (rev a1)
    Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:397d]
    Kernel modules: nvidiafb, nouveau
04:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e] (rev a1)

That part I found interesting - it stopped LOADING the nouveau modules, but it never loaded the vfio-pci modules instead. Weird...

Code:
root@router:~# cat /etc/modprobe.d/vfio.conf
# options vfio-pci ids=10de:2504,10de:228e
options vfio-pci ids=10de:2504,10de:228e disable_vga=1 disable_idle_d3=1

There were two attempts here, as can be seen...but yes, I did disable both of them, ultimately.
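
A note on reading that lspci output: `lspci -k` only prints a "Kernel driver in use:" line when a driver is actually bound, and "Kernel modules:" merely lists modules that could claim the device. A minimal sketch for checking the binding directly through sysfs follows. The function name and its optional root argument are hypothetical, and the two addresses are the ones from the output above.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI function, or "none".
# The optional second argument (hypothetical, for testing) overrides the sysfs root.
pci_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink ending in the driver's name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Both functions of the card (VGA and HDMI audio) as reported by lspci above.
for fn in 0000:04:00.0 0000:04:00.1; do
    printf '%s -> %s\n' "$fn" "$(pci_driver "$fn")"
done
```

For a device that vfio-pci has claimed, this should print `vfio-pci` for each function; `none` means no driver is bound at that moment.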
 
So, just to tie a bow on this thread for the next traveler who may come across it...

I had pretty much everything set up the way it was supposed to be - the NVIDIA drivers were blocked from loading, and the passthrough settings were correct - but the particular motherboard I was working with is laid out so that the only PCIe slot that isn't behind the chipset is the 'GPU slot'. Now, as a rule, I actually appreciate this - it's been a long-standing headache when motherboards come with slots that get automatically disabled based on what else is plugged into *other* slots - but in this case, Proxmox is understandably a bit more picky.

So, I took the provided advice and moved the GPU to the dedicated slot, then moved my NIC into the shared slot, and that got the Immich VM to load up, complete with the GPU processing.
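
For the next traveler who wants to confirm this before starting a VM, here is a rough sketch that reports what else shares an IOMMU group with a given device, again assuming the standard sysfs layout. The function name and its optional root argument are hypothetical. Other functions of the same slot (such as the GPU's own audio function) are skipped, since they are expected to travel together.

```shell
#!/bin/sh
# Print the devices that share an IOMMU group with the given PCI address,
# skipping other functions of the same slot.
# The optional second argument (hypothetical, for testing) overrides the sysfs root.
iommu_group_mates() {
    addr="$1"
    root="${2:-/sys/kernel/iommu_groups}"
    slot="${addr%.*}"                      # 0000:04:00.0 -> 0000:04:00
    for dev in "$root"/*/devices/"$addr"; do
        [ -e "$dev" ] || continue          # address not found in any group
        for mate in "${dev%/*}"/*; do
            case "${mate##*/}" in
                "$slot".*) ;;              # same slot: not a foreign device
                *) echo "${mate##*/}" ;;
            esac
        done
    done
}
```

Calling `iommu_group_mates` with whatever address lspci reports for the card after the move should print nothing if the slot is isolated. Any addresses it does print are devices that would be pulled away from the host along with the card.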

When I then tried doing PCIe passthrough on the NIC in the shared slot, the Proxmox host locked up in the same way. I decided that it'd be viable to do my router tasks using a series of bridges rather than PCIe passthrough, so I can still get everything working the way I want.

Thank you everyone for your help!!