GPU passthrough to Windows VM problem

gerrrald

New Member
Nov 18, 2023
Hi all

I'm struggling to pass a GPU through to a Windows VM for video out/light gaming. I believe I've done all the BIOS/kernel/VM config required - hoping there's enough info here for someone to spot what I've missed.

I have a Dell PowerEdge R720 with dual Intel E5-2650 CPUs. The GPU is an Nvidia GeForce GTX 1050 Ti. I'm running a fresh install of Proxmox VE 8.0.4 on Debian 12 (bookworm) with kernel 6.2.16-15-pve.

Windows recognises the GPU to some degree: it shows up with its proper name ("NVIDIA GeForce GTX 1050 Ti") under Display Adapters in Device Manager. However, it has a yellow triangle with an exclamation mark, implying something's wrong - I'm not sure how to dig out more detail than that. I have installed the latest GeForce drivers for Windows, but still no luck. With a screen connected, nothing is ever shown on it - not during POST, not during the Windows boot process, and not once the VM is fully up; the screen just keeps looking for a signal. Hopefully some of the details below will shed light on what I've missed!
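I also had a look at the host kernel log while starting the VM, since vfio-pci logs problems (BAR mappings, device resets) there that never show up inside Windows. This is roughly what I ran - the grep pattern is just my guess at the relevant keywords (10de is Nvidia's PCI vendor ID):

```shell
#!/bin/sh
# Scan the host kernel log for passthrough-related messages. Run on the
# Proxmox host right after starting the VM; dmesg needs root on most
# systems, hence the guard and fallback message.
dmesg 2>/dev/null | grep -iE 'vfio|vga|10de' \
    || echo "no vfio-related messages in the kernel log"
```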

I've read and followed these:
https://pve.proxmox.com/wiki/PCI_Passthrough and https://pve.proxmox.com/wiki/PCI(e)_Passthrough

I've also taken a few suggestions from various other forum posts. Since I'm not sure where the gap is, I'll try to cover all the relevant config below in the hope that it's obvious - any help would be greatly appreciated :)

First, I've ensured hardware virtualisation is enabled in the BIOS (BIOS version 2.2.2, firmware 2.65.65.65).

Here's all the config I think is relevant:

Code:
root@node1:~# grep GRUB_CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init intremap=no_x2apic_optout"

root@node1:~# cat /etc/kernel/cmdline 
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on
initcall_blacklist=sysfb_init
intremap=no_x2apic_optout
root@node1:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.2.16-15-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init intremap=no_x2apic_optout

root@node1:~# cat /etc/modules | grep -vE "^#"
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

root@node1:~# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

root@node1:~# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1

root@node1:~# cat /etc/modprobe.d/blacklist.conf 
blacklist nvidia*
blacklist nouveau
blacklist snd_hda_intel

root@node1:~# dmesg | grep -E "(DMAR|IOMMU)"
[    0.012556] ACPI: DMAR 0x00000000BD3346F4 000158 (v01 DELL   PE_SC3   00000001 DELL 00000001)
[    0.012606] ACPI: Reserving DMAR table memory at [mem 0xbd3346f4-0xbd33484b]
[    0.953579] DMAR: IOMMU enabled
[    2.147736] DMAR: Host address width 46
[    2.147738] DMAR: DRHD base: 0x000000c8100000 flags: 0x0
[    2.147745] DMAR: dmar0: reg_base_addr c8100000 ver 1:0 cap d2078c106f0466 ecap f020de
[    2.147748] DMAR: DRHD base: 0x000000dc100000 flags: 0x1
[    2.147753] DMAR: dmar1: reg_base_addr dc100000 ver 1:0 cap d2078c106f0466 ecap f020de
[    2.147756] DMAR: RMRR base: 0x000000bf458000 end: 0x000000bf46ffff
[    2.147758] DMAR: RMRR base: 0x000000bf450000 end: 0x000000bf450fff
[    2.147760] DMAR: RMRR base: 0x000000bf452000 end: 0x000000bf452fff
[    2.147761] DMAR: ATSR flags: 0x0
[    2.147765] DMAR-IR: IOAPIC id 2 under DRHD base  0xc8100000 IOMMU 0
[    2.147767] DMAR-IR: IOAPIC id 0 under DRHD base  0xdc100000 IOMMU 1
[    2.147768] DMAR-IR: IOAPIC id 1 under DRHD base  0xdc100000 IOMMU 1
[    2.147770] DMAR-IR: HPET id 0 under DRHD base 0xdc100000
[    2.147771] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    2.148528] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    2.958343] DMAR: No SATC found
[    2.958347] DMAR: dmar0: Using Queued invalidation
[    2.958356] DMAR: dmar1: Using Queued invalidation
[    2.962422] DMAR: Intel(R) Virtualization Technology for Directed I/O

root@node1:~# cat /etc/pve/nodes/node1/qemu-server/107.conf 
agent: 1
bios: ovmf
boot: order=ide0;ide2;net0
cores: 4
cpu: x86-64-v2-AES
efidisk0: VM_Storage1:vm-107-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:42:00,device-id=0x1c82,pcie=1,vendor-id=0x10de,x-vga=1
hostpci1: 0000:43:00,pcie=1
ide0: VM_Storage1:vm-107-disk-1,cache=writeback,discard=on,size=64G
ide1: VM_Storage1:vm-107-disk-3,discard=on,size=320G
ide2: local:iso/Win11_22H2_English_x64v2.iso,media=cdrom,size=5705260K
machine: pc-q35-8.0
memory: 32768
meta: creation-qemu=8.0.2,ctime=1700232586
name: windows1
net0: e1000=46:D6:8C:CB:F2:2B,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=90534a36-70c3-4bf0-a2f9-b2d448c3434e
sockets: 2
tpmstate0: VM_Storage1:vm-107-disk-2,size=4M,version=v2.0
usb0: host=0bda:8771,usb3=1
usb1: host=046d:c34b,usb3=1
usb2: host=0624:0249,usb3=1
vga: std
vmgenid: 6fb5466d-d28e-41de-9882-ffc9caac7193

root@node1:~# lspci -nnk | grep -i nvid -A5
42:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
    Subsystem: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:0939]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
42:00.1 Audio device [0403]: NVIDIA Corporation GF106 High Definition Audio Controller [10de:0be9] (rev a1)
    Subsystem: NVIDIA Corporation GF106 High Definition Audio Controller [10de:0939]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
43:00.0 Audio device [0403]: Creative Labs CA0132 Sound Core3D [Sound Blaster Recon3D / Z-Series / Sound BlasterX AE-5 Plus] [1102:0012] (rev 01)
    Subsystem: Creative Labs SB1570 SB Audigy Fx [1102:0010]
    Kernel driver in use: vfio-pci
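For reference, here's roughly how I listed the IOMMU grouping from sysfs - ideally 42:00.0 and 42:00.1 shouldn't share a group with unrelated devices. (A small sketch using the standard sysfs layout; the else branch covers kernels where the IOMMU isn't active.)

```shell
#!/bin/sh
# Print every IOMMU group and the PCI devices it contains. Devices in
# the same group must all be passed through (or unused) together.
if [ -d /sys/kernel/iommu_groups ]; then
    for dev in /sys/kernel/iommu_groups/*/devices/*; do
        group=${dev#/sys/kernel/iommu_groups/}   # strip sysfs prefix
        group=${group%%/*}                        # keep the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
else
    echo "no IOMMU groups in sysfs - IOMMU not enabled in this kernel?"
fi
```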

Notes about the above: I'm not sure whether /etc/kernel/cmdline or /etc/default/grub is the one actually used, so I've modified both.

Likewise, I'm not sure whether `proxmox-boot-tool refresh` and `update-initramfs -u -k all` do different things, so I ran both :s
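My understanding (an assumption on my part - happy to be corrected) is that on a ZFS root, Proxmox boots via proxmox-boot-tool, in which case /etc/kernel/cmdline is the file actually read and `refresh` is the right command. Something like this should say which applies:

```shell
#!/bin/sh
# Ask Proxmox which bootloader is managing the ESPs. If proxmox-boot-tool
# is in use, /etc/kernel/cmdline + 'proxmox-boot-tool refresh' apply;
# a plain GRUB install reads /etc/default/grub + 'update-grub' instead.
if command -v proxmox-boot-tool >/dev/null 2>&1; then
    proxmox-boot-tool status
else
    echo "proxmox-boot-tool not found - likely a plain GRUB install"
fi
```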

I originally had this output from dmesg:

Code:
[    2.146790] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.

so I added "intremap=no_x2apic_optout" to the kernel command line.

Does anyone have any ideas? Please let me know if there's anything I've missed! And thank you for reading!
 
Not yet.

I'm wondering whether it's a hardware fault at this point - or, more likely if I'm honest, that I've just missed something. I've got a couple of things to try: I need to rule out a hardware error by booting from a live CD or similar, and I may try an OS other than Windows. Either way, if I make any progress I'll update here.

What have you tried? What hardware etc?
 
R720, Nvidia Tesla M40

I went through multiple guides multiple times, only to find out that I hadn't connected the power cables properly. Doh!

This particular card requires a special Y cable that connects to the card, as well as a Dell Y cable (9H6FV) that connects to the riser. It turns out that both ends of the GPU's Y cable need power. Initially I only connected the two 8-pin connectors, as I'd read that you only needed the two cables. That left a 6-pin connector free on the riser side and an 8-pin free on the GPU side. Why would you connect a 6-pin to an 8-pin?? To get full power to your card, I reckon. So I plugged the remaining 6-pin connector from the 9H6FV into the leftover 8-pin connector on the GPU cable, and verified the pinout voltage with a multimeter. Looking good...

If you're reading this and trying to wire up your own card, make sure the riser can handle the GPU's maximum wattage. This one can, but some can't. If not, you'd need an additional cable to draw power for the other connector from a different source.

To add to the confusion, the card was detected in Proxmox as well as in my VM. I installed a driver but kept getting a Code 10 error. I feel pretty dumb right now, but I wanted to share in case someone else hits this. Some of the instructions for this hardware are as clear as mud.
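In hindsight, one host-side check that might have flagged this sooner: an underpowered card can enumerate but train its PCIe link at a degraded speed or width, or drop off the bus entirely. A sketch (substitute your own PCI address; needs lspci from pciutils):

```shell
#!/bin/sh
# Compare the GPU's PCIe link capability against its trained link
# status. A big mismatch (e.g. capable of x16 but running x1) can
# hint at a power or seating problem.
ADDR="42:00.0"   # example address - substitute yours from lspci
if command -v lspci >/dev/null 2>&1; then
    lspci -vvs "$ADDR" 2>/dev/null | grep -E 'LnkCap:|LnkSta:' \
        || echo "device $ADDR not found or no link info (try as root)"
else
    echo "lspci not installed (apt install pciutils)"
fi
```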
 
