I have a Supermicro X11SPA-TF motherboard with an Intel Xeon 6226R running Proxmox 8.1.1, with a Sapphire 6600XT Nitro+ 8GB in CPU Slot 3 (x16, PCIe 3.0) and an NVIDIA 1660 Ti in CPU Slot 7 (x16, PCIe 3.0).
When I assign the NVIDIA card to a VM, it boots and works fine (tested with Windows 11 so far). If I assign the 6600XT to a VM, the VM never boots; it just sits with a single core at 100% and does nothing (no errors, no problems with the host, and I'm free to start/stop the VM as many times as I want). I tested this with guests running Windows 11 and macOS 14; both VMs boot and work fine with a virtual VGA card, but do nothing when the 6600XT is assigned to them. The Windows 11 VM boots fine with the NVIDIA card, and I was able to install drivers and Parsec and run FurMark, so that appears to be fine.
I've rebooted the node countless times and tried a lot of permutations of settings, all with the same result. One thing I haven't tried yet is removing the NVIDIA GPU from the system. Should I?
I did try dumping the 6600XT's BIOS, but I was only able to get ~119KB. However, I used identifiers in that ROM to find the full ROM on TechPowerUp (it matched what I was able to dump byte-for-byte, but went on for a full 1MB). I used that full 1MB ROM with the VM, which resulted in no change in the observed behavior.
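For anyone wanting to repeat the byte-for-byte check, it amounts to confirming that the partial dump is a prefix of the full image. A rough sketch, assuming `head -c` and `cmp` are available; the function name `rom_is_prefix` and the file names in the usage note are placeholders, not the actual names used:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if file $1 is a byte-for-byte prefix of file $2.
rom_is_prefix() {
    partial=$1
    full=$2
    # Size of the partial dump in bytes; the arithmetic expansion strips
    # any padding some wc implementations print.
    size=$(( $(wc -c < "$partial") ))
    # An empty dump would trivially match, so treat it as a failure.
    [ "$size" -gt 0 ] || return 1
    # Compare only the first $size bytes of the full image with the dump.
    head -c "$size" "$full" | cmp -s - "$partial"
}
```

Usage would look something like `rom_is_prefix partial_dump.rom full_image.rom && echo "dump is a prefix of the full ROM"`.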
Here's a dump of information, and I'm happy to provide more:
Bash:
$ cat /proc/cmdline
initrd=\EFI\proxmox\6.5.11-7-pve\initrd.img-6.5.11-7-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt
$ cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
$ cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2182,10de:1aeb,10de:1aec,10de:1aed,1002:73ff,1002:ab28,1002:1478,1002:1479
$ cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist amdgpu
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
$ dmesg | grep "IOMMU enabled"
[ 0.270400] DMAR: IOMMU enabled
$ dmesg | grep remapping
[ 0.751347] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.752247] DMAR-IR: Enabled IRQ remapping in x2apic mode
$ dmesg | grep -i vfio
[ 6.463302] VFIO - User Level meta-driver version: 0.3
[ 6.468926] vfio-pci 0000:65:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 6.469051] vfio_pci: add [10de:2182[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.516107] vfio_pci: add [10de:1aeb[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.516145] vfio_pci: add [10de:1aec[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.516162] vfio_pci: add [10de:1aed[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.516183] vfio-pci 0000:1b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 6.516330] vfio_pci: add [1002:73ff[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.616182] vfio_pci: add [1002:ab28[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.616243] vfio_pci: add [1002:1478[ffffffff:ffffffff]] class 0x000000/00000000
[ 6.616285] vfio_pci: add [1002:1479[ffffffff:ffffffff]] class 0x000000/00000000
$ pvesh get /nodes/nodename/hardware/pci --pci-class-blacklist ""
# output trimmed for relevance, happy to provide full output if that is helpful
┌──────────┬────────┬──────────────┬────────────┬────────┬──────────────────────────────────────────────────────────────┬──────┬──────────────────┬───────────────────────┬──────────────────┬──────────
│ class │ device │ id │ iommugroup │ vendor │ device_name │ mdev │ subsystem_device │ subsystem_device_name │ subsystem_vendor │ subsystem
╞══════════╪════════╪══════════════╪════════════╪════════╪══════════════════════════════════════════════════════════════╪══════╪══════════════════╪═══════════════════════╪══════════════════╪══════════
│ 0x030000 │ 0x2000 │ 0000:04:00.0 │ 44 │ 0x1a03 │ ASPEED Graphics Family │ │ 0x1b28 │ │ 0x15d9 │ Super Mic
│ 0x030000 │ 0x73ff │ 0000:1b:00.0 │ 12 │ 0x1002 │ Navi 23 [Radeon RX 6600/6600 XT/6600M] │ │ 0xe448 │ │ 0x1da2 │ Sapphire
│ 0x030000 │ 0x2182 │ 0000:65:00.0 │ 5 │ 0x10de │ TU116 [GeForce GTX 1660 Ti] │ │ 0x1333 │ │ 0x196e │ PNY
│ 0x040300 │ 0xab28 │ 0000:1b:00.1 │ 13 │ 0x1002 │ Navi 21/23 HDMI/DP Audio Controller │ │ 0xab28 │ │ 0x1002 │ Advanced
│ 0x040300 │ 0x1aeb │ 0000:65:00.1 │ 5 │ 0x10de │ TU116 High Definition Audio Controller │ │ 0x1333 │ │ 0x196e │ PNY
│ 0x060400 │ 0x1478 │ 0000:19:00.0 │ 10 │ 0x1002 │ Navi 10 XL Upstream Port of PCI Express Switch │ │ 0x0000 │ │ 0x0000 │
│ 0x060400 │ 0x1479 │ 0000:1a:00.0 │ 11 │ 0x1002 │ Navi 10 XL Downstream Port of PCI Express Switch │ │ 0x1479 │ │ 0x1002 │ Advanced
│ 0x0c0330 │ 0x1aec │ 0000:65:00.2 │ 5 │ 0x10de │ TU116 USB 3.1 Host Controller │ │ 0x1333 │ │ 0x196e │ PNY
│ 0x0c8000 │ 0x1aed │ 0000:65:00.3 │ 5 │ 0x10de │ TU116 USB Type-C UCSI Controller │ │ 0x1333 │ │ 0x196e │ PNY
└──────────┴────────┴──────────────┴────────────┴────────┴──────────────────────────────────────────────────────────────┴──────┴──────────────────┴───────────────────────┴──────────────────┴──────────
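The IOMMU group column can also be cross-checked straight from sysfs, independent of `pvesh`. A minimal sketch, assuming the standard `/sys/kernel/iommu_groups` layout; the function name `list_iommu_groups` and its optional root-directory argument are made up for illustration:

```shell
#!/bin/sh
# Print one "group N: PCI-address" line per device in each IOMMU group.
# The optional first argument (an illustrative addition) overrides the
# sysfs root so the function can be pointed at another directory tree.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups are present.
        [ -e "$dev" ] || continue
        group=${dev#"$root"/}   # e.g. "12/devices/0000:1b:00.0"
        group=${group%%/*}      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

list_iommu_groups
```

On this box that should show 0000:1b:00.0 and 0000:1b:00.1 in groups 12 and 13 respectively, matching the table.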
Any suggestions on what to try next would be greatly appreciated! Thanks in advance!