Win10 GPU passthrough issues: Code 43 or Code 35 GTX 1080 Ti on HPE DL380 Gen9

njafs

New Member
Aug 16, 2023
Hello everyone. I didn't want to bother you with this, but after going through this whole forum (with its many successful solutions), dozens of different guides and videos, and building, configuring, destroying, rebuilding, and reconfiguring only to destroy again, I'm now officially desperate. I bought an HPE DL380 Gen9 server for my homelab and chose Proxmox VE so I could use ZFS RAID for VM storage, so I'm quite new to it.
One of these VMs has to be a remote gaming machine, so I took out the flawlessly working Quadro K620, got a Zotac GTX 1080 Ti blower from a friend, and started configuring.
No matter which guide I follow (old or new), I'm at my 100th attempt and what I always get is the infamous Code 43 in the Win10 VM.

Proxmox boots via GRUB from LVM on a hardware RAID-10 SSD array; ZFS RAID is used for VM disks only. Some changes to the boot parameters led to the current situation, where the server console freezes on "loading ramdisk....". I read about this on this forum and I'm confident it won't be a problem in the end, since the server and Proxmox will share the integrated GPU while the 1080 Ti will be isolated and passed exclusively to the VM. The server actually boots and the other VMs work fine!

My current configuration is admittedly a mess, pieced together from material that is 1 to 6 years old. It used to be simpler when I had the K620 and the Proxmox login screen was displayed on the server console.

Code:
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.15.108-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init nomodeset video=vesafb:off video=efifb:off video=simplefb:off
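
To confirm which of these parameters the running kernel actually received, the command line can be checked token by token. This is only a minimal sketch: the helper name `has_param` is made up for illustration, and it reads the command line from standard input so it can be fed `/proc/cmdline` or a pasted string.

```shell
# has_param PARAM: succeed if PARAM appears as a whole token on the
# kernel command line read from stdin. (Helper name is illustrative.)
has_param() {
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Report a few of the parameters from the post against the live
# command line, if this system exposes one.
if [ -r /proc/cmdline ]; then
    for p in intel_iommu=on iommu=pt nomodeset; do
        if has_param "$p" < /proc/cmdline; then
            echo "present: $p"
        else
            echo "missing: $p"
        fi
    done
fi
```

Matching whole tokens avoids false positives such as `iommu=on` matching inside `intel_iommu=on`.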

IOMMU group (some guides say the devices must be in a single group, others say they must be split... what's the answer?)
Code:
IOMMU Group 94:
        84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
        84:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
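
On the single-versus-split question: as far as I understand VFIO, what matters is that every device sharing a group with the GPU is handed to the guest (or at least kept away from host drivers), so a group that holds only the GPU and its own HDMI audio function, as above, is normally the favorable case. A listing like the one above can be produced with a short loop over sysfs. This is a sketch; the function name and its optional root argument are illustrative, and the argument exists only so the loop can be pointed at a test tree.

```shell
# List every IOMMU group and the PCI addresses it contains.
# $1 (optional) overrides the sysfs location; leave it unset on a real host.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # no groups: IOMMU disabled or path missing
        group="${dev%/devices/*}"          # strip "/devices/<address>"
        group="${group##*/}"               # keep only the group number
        printf 'IOMMU Group %s: %s\n' "$group" "${dev##*/}"
    done
}

list_iommu_groups
```

If this prints nothing on the host, the IOMMU is not active and the rest of the passthrough setup cannot work.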

Code:
lspci -nnk
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. GP102 [GeForce GTX 1080 Ti] [19da:1470]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
84:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. GP102 HDMI Audio Controller [19da:1470]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
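
Both functions showing `vfio-pci` as the driver in use is the state one would expect before starting the VM. To condense longer `lspci -nnk` output to just that fact, a small filter can help. This is a sketch; the function name `drivers_in_use` is an illustrative choice, not an existing tool.

```shell
# Read `lspci -nnk` output on stdin and print "ADDRESS DRIVER" per device.
# Devices without a bound driver are printed with "(none)".
drivers_in_use() {
    awk '
        /^[0-9a-f]/ { if (addr != "") print addr, drv; addr = $1; drv = "(none)" }
        /Kernel driver in use:/ { drv = $NF }
        END { if (addr != "") print addr, drv }
    '
}

# On the host, for the card in this thread:
#   lspci -nnk -s 84:00 | drivers_in_use
```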

Code:
cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Code:
cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia

Code:
cat /etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb

Is there any difference between blacklisting a driver in blacklist.conf and in pve-blacklist.conf?
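
As far as `modprobe.d(5)` describes it, every file ending in `.conf` under `/etc/modprobe.d` is read, so the file name itself should not change how a `blacklist` line is treated. To see the combined result across all files, something like the following can be used. It is a sketch; the function name and the optional directory argument are illustrative.

```shell
# Print every "blacklist MODULE" entry found in the *.conf files of a
# modprobe.d directory, as "FILE: MODULE".
# $1 (optional) overrides the directory; leave it unset on a real host.
list_blacklisted() {
    dir="${1:-/etc/modprobe.d}"
    for f in "$dir"/*.conf; do
        [ -f "$f" ] || continue
        awk -v name="${f##*/}" \
            '$1 == "blacklist" && NF >= 2 { print name ": " $2 }' "$f"
    done
}

list_blacklisted
```

Changes to these files usually only affect early-loaded modules after the initramfs has been regenerated (`update-initramfs -u -k all`) and the host rebooted.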

Code:
cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0

Code:
cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

Code:
cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1b06,10de:10ef disable_vga=1

Code:
dmesg | grep -i vfio
[    8.232097] VFIO - User Level meta-driver version: 0.3
[    8.241496] vfio-pci 0000:84:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    8.261387] vfio_pci: add [10de:1b06[ffffffff:ffffffff]] class 0x000000/00000000
[    8.281316] vfio_pci: add [10de:10ef[ffffffff:ffffffff]] class 0x000000/00000000
[  200.066232] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[  200.086845] vfio-pci 0000:84:00.1: enabling device (0140 -> 0142)
[ 1456.386052] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1557.679559] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1709.051338] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1929.574380] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 3494.627535] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 3826.294611] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 4024.540187] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 4380.460742] vfio-pci 0000:84:00.0: vfio_ecap_init: hiding ecap 0x19@0x900

Code:
dmesg | grep -e DMAR -e IOMMU
[    0.023851] ACPI: DMAR 0x000000007B7E7000 000300 (v01 HP     ProLiant 00000001 HP   00000001)
[    0.023941] ACPI: Reserving DMAR table memory at [mem 0x7b7e7000-0x7b7e72ff]
[    1.277537] DMAR: IOMMU enabled
[    2.877110] DMAR: Host address width 46
[    2.877112] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[    2.877123] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020de
[    2.877128] DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
[    2.877135] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020de
[    2.877138] DMAR: RMRR base: 0x00000079174000 end: 0x00000079176fff
[    2.877142] DMAR: RMRR base: 0x000000791f4000 end: 0x000000791f7fff
[    2.877148] DMAR: RMRR base: 0x000000791de000 end: 0x000000791f3fff
[    2.877150] DMAR: RMRR base: 0x000000791cb000 end: 0x000000791dbfff
[    2.877153] DMAR: RMRR base: 0x000000791dc000 end: 0x000000791ddfff
[    2.877156] DMAR: ATSR flags: 0x0
[    2.877159] DMAR: ATSR flags: 0x0
[    2.877164] DMAR-IR: IOAPIC id 10 under DRHD base  0xfbffc000 IOMMU 0
[    2.877168] DMAR-IR: IOAPIC id 8 under DRHD base  0xc7ffc000 IOMMU 1
[    2.877171] DMAR-IR: IOAPIC id 9 under DRHD base  0xc7ffc000 IOMMU 1
[    2.877174] DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
[    2.877177] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    2.878722] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    3.444494] DMAR: No SATC found
[    3.444497] DMAR: dmar0: Using Queued invalidation
[    3.444508] DMAR: dmar1: Using Queued invalidation
[    3.458158] DMAR: Intel(R) Virtualization Technology for Directed I/O
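
Two lines in this log carry most of the information: `DMAR: IOMMU enabled` and `DMAR-IR: Enabled IRQ remapping in x2apic mode`. Since interrupt remapping is reported as enabled, the `allow_unsafe_interrupts=1` option is probably not needed on this host; to my knowledge it is a workaround for platforms that lack interrupt remapping. A sketch of a check that looks for both messages (the function name is illustrative, and the patterns assume the Intel wording shown above):

```shell
# Read kernel log text on stdin and report whether the Intel IOMMU and
# interrupt remapping messages are present. AMD hosts and other kernel
# versions word these lines differently.
check_iommu_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'DMAR: IOMMU enabled'; then
        echo "iommu: enabled"
    else
        echo "iommu: not found"
    fi
    if printf '%s\n' "$log" | grep -q 'DMAR-IR: Enabled IRQ remapping'; then
        echo "irq-remapping: enabled"
    else
        echo "irq-remapping: not found"
    fi
}

# On the host:
#   dmesg | check_iommu_log
```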


The VM is a fresh Win10 Pro install with RDP enabled and the Nvidia drivers downloaded, supposedly ready for GPU passthrough and later "switching to primary" as described in too many guides. I took a snapshot of this VM so that I could try all possible combinations, adding the GPU and restarting the process each time, but the best I can get is passing the GPU through as secondary and installing the drivers. After that, reboot and Code 43 forever.
What I have already tried changing, following the guides, are all the parameters concerning:
cpu (host, hidden, flags=pcid....., with or w/o)
pci rom (rom-bar, no rom-bar, downloaded rom, downloaded patched rom, dumped rom, dumped patched rom)
machine type (any from 6.0 to 7.2, snapshot is 6.2)
args (with or without supposedly old kvm related parameters...sorry I can't find them again now)

Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 10
cpu: host
efidisk0: hwR10_SSD_400GB:vm-150-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-q35-6.2
memory: 65536
meta: creation-qemu=7.2.0,ctime=1691933057
name: njafs
net0: e1000=6A:23:95:48:AD:96,bridge=vmbr0
numa: 0
ostype: win10
scsi0: R1_NVME_2TB:vm-150-disk-0,discard=on,iothread=1,size=250G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=83c27078-aae3-4291-8dc4-5a18fcb158d2
snaptime: 1692121223
sockets: 1
vmgenid: 1353b6b0-864a-40ea-a9fe-4e048ba8152b


So, basically, I'm asking for hints, help, or suggestions on how and where to start fresh again... anything that could bring me closer to a stable, working environment. Let me know if you need any other data; I can't wait to cooperate and to help others with the same hardware and issues.
Again, I'm sorry for creating another thread on this well-known, often-solved issue, but it is really starting to drive me mad.
My last resort would be the vGPU approach, but I'd like to stick to "simple" PCI passthrough, so anything would be really appreciated!

And my biggest question of all: hasn't passthrough for consumer GPUs been officially supported since 2021? Why this, then?
 
It has been supported since 2021 and many people got a 1080 Ti working. I made some other changes to clean up the cmdline and, using a patched downloaded ROM, I now get Code 35 instead of 43 in the Win10 VM.
 
Unfortunately, I was unable to get past error 43 or 35. After even more study of vfio, kernel parameters, and isolation, I believe the problem is in the server BIOS.
The BIOS video output settings offer only 2 options: "Output to Integrated GPU AND Add-in GPU" and "Output to Add-in GPU only". I believe this makes it physically impossible to isolate the 1080 Ti, because the server's video output will always claim any add-in GPU first, for Proxmox or any other OS.

Evidence for this, plus some other things I found during endless testing (including vGPU attempts):
- If I choose Add-in GPU only, the server shell freezes after POST and an HP pop-up message appears saying video output is redirected to the add-in GPU.
- I put the K620 back in PCI slot 2, keeping the 1080 Ti in slot 5 (card size/PCIe x16), hoping the first GPU would be used and the second one isolated... nope.
- Everything worked fine during vGPU installation and driver patching until I started the VM and got a QEMU error saying it was unable to create IOMMU group 128 because the device might already be in use. I tried modifying custom profiles to use less VRAM and possibly avoid an allocation error... nope.
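
One way to gather evidence on the "firmware always claims the add-in GPU" hypothesis is the `boot_vga` attribute that the kernel exposes in sysfs for VGA-class devices; a value of 1 marks the adapter that was set up as the boot display. The post does not show what this reports on the DL380, so the following is only a sketch for collecting that data. The function name and its optional root argument are illustrative.

```shell
# Print "ADDRESS boot_vga=VALUE" for each PCI device exposing boot_vga.
# $1 (optional) overrides the sysfs location; leave it unset on a real host.
list_boot_vga() {
    root="${1:-/sys/bus/pci/devices}"
    for f in "$root"/*/boot_vga; do
        [ -r "$f" ] || continue            # no VGA-class devices or path missing
        dev="${f%/boot_vga}"
        printf '%s boot_vga=%s\n' "${dev##*/}" "$(cat "$f")"
    done
}

list_boot_vga
```

If `0000:84:00.0` shows `boot_vga=1` with the BIOS set to "Integrated GPU AND Add-in GPU", that would support the idea that the firmware initializes the 1080 Ti before the host OS starts.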

I believe the closest I got to a solution was some time ago, when I got error 35 instead of 43 with PCI passthrough, together with the frozen "loading ramdisk" screen, but there isn't much information around on how to fix this. I tried all possible combinations of IOMMU groups (single and split), romfiles, and every parameter I found, one by one on an additive basis, with reboots and even clean VM creations every time. No way.

So, sadly, almost 1 month after my first attempt and having received no further help here, I'm afraid I'm forced to give the 1080 Ti back to my friend and look for a better second-hand Quadro.
I hope all this will help someone in the future to finally get through it and/or correct the mistakes I was surely making. I bet it's either an impossible mix and match or a step-1 mistake on my part... so thanks anyway and good luck!!
 
