Proxmox: Pass 2 identical GPUs to same VM (not working)

VideoDad

New Member
Feb 22, 2024
Hello -

I am trying to create a video editing VM with two identical GPUs passed through to it. I'm not trying to split a GPU; I just want to configure a single VM and pass both GPUs through to it for video editing. After trying several different combinations and variations, and working through many threads and YouTube videos, I'm clearly missing something, so I'm posting in the hope that an expert knows how to make this work.

I have a Dell R720 with 192 GB RAM, 2x 2.6 GHz E5-2670 CPUs, and all firmware up to date. The onboard embedded video is a Matrox (used for the Proxmox console). I installed two Nvidia Quadro K4000 cards, one in slot 4 and one in slot 6. Per Dell, the server can take up to four cards, and they all need to be the same. I have the correct risers, power feeds, and power supplies. I installed Proxmox 8.1.0.

(Note: I have read threads where it's frowned upon to use identical GPUs for passthrough, and others that say it works fine.)

As it is now, within a Windows 10 guest, both cards are recognized and both output video simultaneously, but one card shows error 43 and is "disabled". I have my desktop extended across two monitors, each plugged into a separate card. So it *appears* that passthrough "sort of" works, but I can't clear the Code 43. And since the Code 43 is there, as far as any programs are concerned, there is only one K4000 operating.

I have:
  • verified IOMMU is enabled
  • verified IOMMU remapping is enabled
  • blacklisted the Nvidia drivers on the Proxmox host (nvidiafb and nouveau) in /etc/modprobe.d/blacklist.conf
  • verified (lspci -nnk) that after boot, no "Kernel driver in use" is listed for either GPU's VGA function, so both cards appear clear for passthrough (the HDMI audio functions still show snd_hda_intel):
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)
Subsystem: NVIDIA Corporation GK106GL [Quadro K4000] [10de:097c]
Kernel modules: nvidiafb, nouveau
04:00.1 Audio device [0403]: NVIDIA Corporation GK106 HDMI Audio Controller [10de:0e0b] (rev a1)
Subsystem: NVIDIA Corporation GK106 HDMI Audio Controller [10de:097c]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
and
42:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)
Subsystem: NVIDIA Corporation GK106GL [Quadro K4000] [10de:097c]
Kernel modules: nvidiafb, nouveau
42:00.1 Audio device [0403]: NVIDIA Corporation GK106 HDMI Audio Controller [10de:0e0b] (rev a1)
Subsystem: NVIDIA Corporation GK106 HDMI Audio Controller [10de:097c]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
  • After trying several different combinations and suggestions regarding GRUB (from posts about video issues), I modified my GRUB configuration to include the following (this is the current iteration; I initially started with just quiet and intel_iommu=on), then updated GRUB:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off video=vesa:off vfio-pci.ids=10de:11fa,10de:0e0b,10de:11fa,10de:0e0b vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 modprobe.blacklist=radeon,nouveau,nvidia,nvidiafb,nvidia-gpu"

  • I made sure my /etc/modules file had the correct modules listed:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

  • I made sure my modprobe file (/etc/modprobe.d/vfio.conf) had the correct vendor:device ID codes for the cards. Initially, I only inserted one set (graphics + sound), but in an attempt to resolve this, I added two sets, thinking the parser might not know to initialize/configure two cards if only one set was listed. Having two sets in the file doesn't seem to have hurt or helped.
lspci -nn | grep -i nvidia

04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)
04:00.1 Audio device [0403]: NVIDIA Corporation GK106 HDMI Audio Controller [10de:0e0b] (rev a1)
42:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)
42:00.1 Audio device [0403]: NVIDIA Corporation GK106 HDMI Audio Controller [10de:0e0b] (rev a1)
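
The `lspci -nnk` check from earlier in this list can be scripted so it is easy to repeat after each config change. The following is a minimal sketch, not a Proxmox tool: the function names `bound_drivers` and `not_on_vfio` are hypothetical, and the parser assumes addresses without a PCI domain prefix, as in the output pasted above. When vfio-pci has claimed a device, `lspci -nnk` reports `Kernel driver in use: vfio-pci` for it.

```python
import re

# First line of an lspci entry, e.g. "04:00.0 VGA compatible controller ..."
# Assumption: no PCI domain prefix (0000:) in the output.
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+", re.IGNORECASE)


def bound_drivers(lspci_nnk_text):
    """Map each PCI address to the driver bound to it (None if no driver)."""
    drivers = {}
    current = None
    for raw in lspci_nnk_text.splitlines():
        line = raw.strip()
        match = DEVICE_RE.match(line)
        if match:
            current = match.group(1)
            drivers[current] = None
        elif current and line.startswith("Kernel driver in use:"):
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers


def not_on_vfio(drivers):
    """Return the addresses that are not currently bound to vfio-pci."""
    return sorted(addr for addr, drv in drivers.items() if drv != "vfio-pci")


if __name__ == "__main__":
    sample = (
        "04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)\n"
        "Kernel modules: nvidiafb, nouveau\n"
        "04:00.1 Audio device [0403]: NVIDIA Corporation GK106 HDMI Audio Controller [10de:0e0b] (rev a1)\n"
        "Kernel driver in use: snd_hda_intel\n"
    )
    for addr, drv in bound_drivers(sample).items():
        print(addr, drv or "(none)")
```

Feeding it the real output (for example via `subprocess.run(["lspci", "-nnk"], capture_output=True, text=True).stdout`) shows at a glance which functions are held by a host driver, which have no driver, and which are on vfio-pci.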

Each card appears to be in its own IOMMU group, so I don't think they are stepping on each other.
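
For anyone wanting to double-check the grouping, here is a small sketch that reads the IOMMU groups straight from sysfs. The path `/sys/kernel/iommu_groups` is the standard Linux location; the helper names `iommu_groups` and `groups_for` are made up for illustration, and the bus prefixes in the usage lines are taken from the lspci output above.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_addresses]} as exposed by sysfs."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU not enabled, or sysfs not available here
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def groups_for(groups, bus_prefixes):
    """Keep only the groups that contain a device on one of the given buses."""
    return {
        num: devs
        for num, devs in groups.items()
        if any(dev.startswith(prefix) for dev in devs for prefix in bus_prefixes)
    }


if __name__ == "__main__":
    # Both K4000 cards: bus 04 and bus 42 (sysfs names include the 0000: domain).
    for num, devs in sorted(groups_for(iommu_groups(), ["0000:04:00", "0000:42:00"]).items()):
        print(f"group {num}: {', '.join(devs)}")
```

One caveat worth keeping in mind: with `pcie_acs_override` on the kernel command line, the group boundaries reflect that override, so separate groups do not by themselves prove that the hardware isolates the two cards.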

1st attempt: --> options vfio-pci ids=10de:11fa,10de:0e0b, disable_vga=1
Nth attempt: --> options vfio-pci ids=10de:11fa,10de:0e0b,10de:11fa,10de:0e0b disable_vga=1

I updated initramfs, and rebooted.
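
On the question of listing the IDs once or twice: the `ids=` option of vfio-pci matches on vendor:device pairs, so a single entry covers every installed card with that pair, which is consistent with the second set making no difference. As a rough sketch (the function names `unique_device_ids` and `vfio_options_line` are hypothetical), the options line can be built from the `lspci -nn` output with duplicates removed:

```python
import re

# The vendor:device pair is the last bracketed field, optionally followed
# by "(rev xx)". Class codes like [0300] contain no colon and are skipped.
ID_RE = re.compile(
    r"\[([0-9a-f]{4}:[0-9a-f]{4})\]\s*(?:\(rev [0-9a-f]+\))?\s*$",
    re.IGNORECASE,
)


def unique_device_ids(lspci_nn_text):
    """Return vendor:device IDs in first-seen order, without duplicates."""
    ids = []
    for line in lspci_nn_text.splitlines():
        match = ID_RE.search(line.strip())
        if match and match.group(1).lower() not in ids:
            ids.append(match.group(1).lower())
    return ids


def vfio_options_line(ids, disable_vga=True):
    """Build an 'options vfio-pci ...' line for a modprobe.d file."""
    line = "options vfio-pci ids=" + ",".join(ids)
    if disable_vga:
        line += " disable_vga=1"
    return line


if __name__ == "__main__":
    sample = (
        "04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)\n"
        "04:00.1 Audio device [0403]: NVIDIA Corporation GK106 HDMI Audio Controller [10de:0e0b] (rev a1)\n"
        "42:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)\n"
        "42:00.1 Audio device [0403]: NVIDIA Corporation GK106 HDMI Audio Controller [10de:0e0b] (rev a1)\n"
    )
    print(vfio_options_line(unique_device_ids(sample)))
```

For the four functions listed above this yields a line with just the two distinct pairs, 10de:11fa and 10de:0e0b.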

For my VM, I initially started with "host" as the CPU type, but I read threads saying this needed to be set to a specific model, with the hypervisor hidden, to trick the VM into not knowing it was a VM, because Nvidia drivers don't play nice with VMs. So I changed that. (It didn't matter; the VM still knows it's a VM.)

Here's my current VM config iteration. After reading that balloon needed to be turned off, I disabled it (didn't help); I also tried setting x-vga=1 on both cards (didn't help).
hostpci2 and hostpci3 are a Creative sound card and a FireWire card (video camera input). I also pass through a USB mouse and keyboard so I can use the VM while sitting at the Dell.

balloon: 0
bios: ovmf
boot: order=ide0;ide2;net0
cores: 8
cpu: SandyBridge,hidden=1
efidisk0: local-lvm:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:04:00,pcie=1,x-vga=1
hostpci1: 0000:42:00,pcie=1
hostpci2: 0000:41:00,pcie=1
hostpci3: 0000:43:00,pcie=1
ide0: local-lvm:vm-102-disk-1,size=150G
ide2: local:iso/Windows10.iso,media=cdrom,size=4697792K
machine: pc-q35-8.1
memory: 128032
meta: creation-qemu=8.1.2,ctime=1706840249
name: Windows20Test
net0: e1000=BC:24:11:C4:4F:67,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=033e4ca2-63ab-4668-b21a-3034307af478
sockets: 4
tpmstate0: local-lvm:vm-102-disk-2,size=4M,version=v2.0
usb0: host=1-1.2
usb1: host=2-1.2
vmgenid: M A S K E D
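
To compare config iterations without eyeballing them, a small parser can pull out the passthrough entries and the vCPU total. This is a simplified sketch rather than the Proxmox parser: it assumes plain `key: value` lines like those above and does not handle snapshot sections, and the names `parse_vm_config`, `hostpci_entries`, and `vcpu_count` are invented for this example.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines into a dict (simplified; no snapshot sections)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        config[key] = value
    return config


def hostpci_entries(config):
    """Return {hostpciN: (address, {option: value})} for each passthrough line."""
    entries = {}
    for key, value in config.items():
        if key.startswith("hostpci"):
            address, *opts = value.split(",")
            options = dict(opt.split("=", 1) for opt in opts if "=" in opt)
            entries[key] = (address, options)
    return entries


def vcpu_count(config):
    """Total vCPUs presented to the guest: sockets multiplied by cores."""
    return int(config.get("sockets", 1)) * int(config.get("cores", 1))


if __name__ == "__main__":
    sample = (
        "cores: 8\n"
        "hostpci0: 0000:04:00,pcie=1,x-vga=1\n"
        "hostpci1: 0000:42:00,pcie=1\n"
        "sockets: 4\n"
    )
    cfg = parse_vm_config(sample)
    print("vCPUs:", vcpu_count(cfg))
    for slot, (address, options) in hostpci_entries(cfg).items():
        print(slot, address, options)
```

Run against the config above, this shows 4 sockets x 8 cores = 32 vCPUs and that only hostpci0 carries x-vga=1 in this iteration.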

I guess I can understand why I saw posts saying that using two of the same card was a bad idea, as nowhere in the config files could I find a setting that tells Proxmox there are two identical cards and how to deal with non-unique vendor:device IDs. At the same time, that didn't make functional sense to me; it should know how to deal with identical cards.

After building the VM, I installed the Nvidia drivers, and for a brief instant the Code 43 went away... but then it came back.
The first Nvidia driver I tried was 474.44.
Today, to see if it would help, I clean installed 474.82. No joy. The "error 43" persists, even though both cards are outputting video.
Each time I made a change to a config, I clean reinstalled the drivers... no joy.

Any help would be greatly appreciated. I would like to be able to get this to work, but perhaps I truly cannot have two K4000 cards, and will need to swap one of them out with a different vendor.
 