[SOLVED] GPU Passthrough issue

localcc

Member
May 5, 2022
I've been having a GPU passthrough issue with an HP ProLiant DL380 G9. The GPU is passed through to the VM, but after installing drivers the guest either bootloops or freezes. Both q35 and i440fx chipsets were tried. The CPU type is set to host. The PSUs provide sufficient power; the server only draws 250 W of the 1000 W available. EFI ROM dumps from TechPowerUp as well as dumps taken from the GPU itself were edited to cut out the extra part, but neither worked. The GPU has a display connected to it at all times, and the VM's display is set to none. All Functions, ROM-Bar, Primary GPU and PCI-Express (when using q35) were checked too.
The BIOS has VT-d and IOMMU enabled. The GPU is the only discrete GPU in the server.
Passthrough was tried on Proxmox VE 7.1-7 and 7.2-3.

/etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on video=efifb:off video=efifb:off"
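For reference, changes to /etc/default/grub only take effect after regenerating the bootloader config and rebooting; on a GRUB-booted host that is usually just:
Bash:
update-grub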

/etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=10de:1380,10de:0fbc disable_vga=1

/etc/modprobe.d/iommu_unsafe_interrupts.conf
Code:
options vfio_iommu_type1 allow_unsafe_interrupts=1

/etc/modprobe.d/pve-blacklist.conf
Code:
blacklist nvidiafb
blacklist nvidia
blacklist nouveau
blacklist radeon

/etc/modules
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
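For completeness: after editing anything under /etc/modprobe.d/ or /etc/modules I also refreshed the initramfs and rebooted, which as far as I know is needed for the vfio options to apply at boot:
Bash:
update-initramfs -u -k all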

Code:
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] [10de:1380] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. [MSI] GM107 [GeForce GTX 750 Ti] [1462:8a9b]
        Physical Slot: 5
        Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 1, IOMMU group 74
        Memory at c8000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3bfe0000000 (64-bit, prefetchable) [size=256M]
        Memory at 3bff0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at a000 [size=128]
        Expansion ROM at c9080000 [virtual] [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau

84:00.1 Audio device [0403]: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] [10de:0fbc] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] GM107 High Definition Audio Controller [GeForce 940MX] [1462:8a9b]
        Physical Slot: 5
        Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 1, IOMMU group 74
        Memory at c9000000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
 
Removing the efifb flags didn't create an efi-framebuffer device. Also, the boot GPU is set to the server's iGPU, so it shouldn't interfere with the passthrough.
If the boot GPU is the integrated graphics, then video=efifb:off video=efifb:off should not be necessary (and the duplicated second entry is redundant anyway). Do you see BOOTFB in cat /proc/iomem? Do you have "BAR ... cannot reserve memory" errors in your syslog when starting the VM?
Do you see the boot messages and the Proxmox host console on the integrated graphics? Sometimes a monitor needs to be plugged in for the integrated graphics to actually be used during boot.
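For anyone who wants to run the same checks, something along these lines should do it (the exact wording of the kernel message can vary a bit):
Bash:
grep -i bootfb /proc/iomem
dmesg | grep -i "can't reserve"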
 
If the boot GPU is the integrated graphics, then video=efifb:off video=efifb:off should not be necessary (and the duplicated second entry is redundant anyway). Do you see BOOTFB in cat /proc/iomem? Do you have "BAR ... cannot reserve memory" errors in your syslog when starting the VM?
Do you see the boot messages and the Proxmox host console on the integrated graphics? Sometimes a monitor needs to be plugged in for the integrated graphics to actually be used during boot.
There are no "BAR ... cannot reserve memory" errors in dmesg. I can see the boot messages and the Proxmox host console on the iGPU. And /proc/iomem doesn't have any entries with BOOTFB.
 
It booted with i440fx and the driver got installed, but I got code 43. I tried enabling Message Signaled Interrupts, still code 43. After that I switched to q35, and after the driver loaded the guest froze.
 
Pinning an older kernel fixed the issue for me:
Bash:
proxmox-boot-tool kernel pin 5.13.19-6-pve
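If someone wants to try the same, the installed kernel versions (and the current pin) can be checked first; the version string has to match one of the listed kernels exactly. A rough sequence, assuming 5.13.19-6-pve is still installed:
Bash:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.13.19-6-pve
reboot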
 
Let me get this straight. Passthrough works in principle because the VM sees the GPU? Does it work (output on a physical monitor) without installing any device-specific drivers? When you install drivers, only the VM has issues (no error logs on the host)?
Is the GPU used by the Proxmox host BIOS/UEFI boot/POST? Maybe the device does not reset into a state that the drivers can handle. Can you try booting the host using another GPU? What OS is used inside the VM? Who provides the drivers?
Can you try booting the VM from an Ubuntu 22.04 installer and see if that boots and displays output on a physical monitor?
 
Let me get this straight. Passthrough works in principle because the VM sees the GPU? Does it work (output on a physical monitor) without installing any device-specific drivers? When you install drivers, only the VM has issues (no error logs on the host)?
Is the GPU used by the Proxmox host BIOS/UEFI boot/POST? Maybe the device does not reset into a state that the drivers can handle. Can you try booting the host using another GPU? What OS is used inside the VM? Who provides the drivers?
Can you try booting the VM from an Ubuntu 22.04 installer and see if that boots and displays output on a physical monitor?
The VM sees the GPU, and there is output on the physical monitor until the driver is installed; after that the screen either goes blank or displays artifacts. When the drivers are installed there are no error logs on the host. The GPU should not be used by Proxmox or any other UEFI stage, as the server is set to use the integrated GPU in the BIOS. The OS inside the VM is Win10 21H2 Build 19044.1288. Drivers are 512.59 from the official NVIDIA website. In the Ubuntu 22.04 installer the GPU displays an image and I can install the OS successfully. In Ubuntu 22.04 itself the GPU has display output with the nouveau driver, checked with lspci -nnv. After installing the NVIDIA drivers and rebooting, the GPU uses the nvidia driver and the image is displayed correctly.
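For reference, this is roughly how I checked which kernel driver the guest GPU was bound to (the PCI address inside the VM differs from the host one, so just filter on VGA):
Bash:
lspci -nnk | grep -A3 VGA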
 
Sounds like Proxmox PCI passthrough works fine in all configurations but one. I have no idea how to fix or work around NVIDIA driver issues (with passthrough) on Windows 10, sorry. Hopefully someone else knows some tricks for this.

EDIT: Have you tried various (older) machine versions of q35, especially 6.1 and 5.2? They change the virtual PCIe layout, and maybe the driver expects a certain layout.
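In case the switch itself is unclear: the machine version can be changed in the VM's Hardware tab, or from the CLI with something like the line below (VM ID 100 is just an example; the guest needs a full stop/start afterwards):
Bash:
qm set 100 --machine pc-q35-6.1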
 
I too just updated one of my machines to 7.2 and was using GPU passthrough to a 1060. Since the update I cannot get the GPU working; I even pinned Proxmox to the older kernel and still get an Error 43 in Windows 10 when trying to use the card. How would I go about downgrading Proxmox to the latest 7.1 packages without having to reinstall and then manually pull them in?
 
Sounds like Proxmox PCI passthrough works fine in all configurations but one. I have no idea how to fix or work around NVIDIA driver issues (with passthrough) on Windows 10, sorry. Hopefully someone else knows some tricks for this.

EDIT: Have you tried various (older) machine versions of q35, especially 6.1 and 5.2? They change the virtual PCIe layout, and maybe the driver expects a certain layout.
I have tried 6.2, 6.1, 5.2 and 5.1, and all of them bootloop with a black screen.
 
I had some weird display issues with my single-GPU passthrough initially. I switched over to the emulated display and ran Display Driver Uninstaller in safe mode. After that was done I put the GPU back as primary and installed the latest drivers from Nvidia. All of my weird display issues went away.
 
I had some weird display issues with my single-GPU passthrough initially. I switched over to the emulated display and ran Display Driver Uninstaller in safe mode. After that was done I put the GPU back as primary and installed the latest drivers from Nvidia. All of my weird display issues went away.
Unfortunately, while installing the drivers the guest freezes and CPU usage spikes to 100%.
 
I found out that the guest BSODs with VIDEO_TDR_FAILURE by connecting a VirtIO-GPU. The resource usage symptoms look similar to when the VirtIO-GPU is not attached, so I think that is the root cause rather than something caused by attaching it. Also, the GPU on the host looks a bit odd to me: most people have it at something like /sys/devices/pci0000:00/0000:00:02.0, but I have /sys/devices/pci0000:80/0000:80:02.0/0000:84:00.0/. Could this be an issue?
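In case anyone wants to look at the same thing: the longer sysfs path just reflects the card sitting behind a PCIe root port on the second root complex (NUMA node 1, matching the lspci output above), which I believe is normal for a dual-socket box like the DL380. The topology can be inspected with:
Bash:
lspci -tv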
 
Update: looks like there are some known issues with GPU passthrough in this version:
https://pve.proxmox.com/wiki/Roadmap#7.2-known-issues
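For anyone not following the link: if I read the known-issues entry correctly, the suggested workaround for the kernel 5.15 framebuffer change is adding initcall_blacklist=sysfb_init to the kernel command line, roughly like below, then running update-grub and rebooting:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on initcall_blacklist=sysfb_init"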

--

Hi all,

I don't want to mix things into this thread, but after upgrading to 7.2 my Windows 11 VM with GPU passthrough, which had been working for more than 2 years, just stopped. Now the log shows an endless stream of "device busy" errors.
Maybe it's not working for @localcc either because of this.

Thanks,
 
If the boot GPU is the integrated graphics, then video=efifb:off video=efifb:off should not be necessary (and the duplicated second entry is redundant anyway). Do you see BOOTFB in cat /proc/iomem? Do you have "BAR ... cannot reserve memory" errors in your syslog when starting the VM?
Do you see the boot messages and the Proxmox host console on the integrated graphics? Sometimes a monitor needs to be plugged in for the integrated graphics to actually be used during boot.
I came across this post because I also started having problems on a running system with "BAR 0: can't reserve" after upgrading to 7.2/5.15. I have set video=efifb:off video=vesafb:off video=simplefb:off, and now I get "BAR 1: can't reserve" and see that the memory is in fact taken by BOOTFB. What would you recommend in that case? (Hijacking the post since it's a high Google result when searching for this issue.)
 
I came across this post because I also started having problems on a running system with "BAR 0: can't reserve" after upgrading to 7.2/5.15. I have set video=efifb:off video=vesafb:off video=simplefb:off, and now I get "BAR 1: can't reserve" and see that the memory is in fact taken by BOOTFB. What would you recommend in that case? (Hijacking the post since it's a high Google result when searching for this issue.)
If nothing else works, I read in some threads that unplugging the GPU from the bus and doing a rescan of the PCI devices worked for several people.
Personally, with an RX570 (used for the Proxmox boot and console before starting the VM), I don't use any video=... parameters and let amdgpu load for that GPU, which fixes this for me (with the help of vendor-reset).
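A minimal sketch of that remove-and-rescan approach, assuming the 84:00.0 address from the lspci output earlier in the thread (adjust to your own card, and do it while the VM is stopped):
Bash:
# virtually unplug the GPU, then let the kernel rediscover it
echo 1 > /sys/bus/pci/devices/0000:84:00.0/remove
echo 1 > /sys/bus/pci/rescan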
 
