Yet another GPU passthrough help request

iluap

May 19, 2024
Hello,
I'm new to Proxmox. I recently bought a gaming PC with the intention of running some VMs on it. The specs are as follows:

  • CPU Ryzen 5900X
  • Motherboard MSI X570-A Pro
  • GPU MSI RTX 3070 Ti SUPRIM X 8G
  • GPU Radeon 5450 (ancient, I know)
I installed PVE 9.0.11 and created a Debian 13 VM, to which I'd like to pass the main GPU (the old secondary GPU is there in case I need to connect a monitor to PVE itself). I followed the official Proxmox documentation for PCI passthrough but, every time I try to start the VM with the GPU attached, I get the error "start failed: QEMU exited with code 1". I believe this is because I am using CSM in the BIOS to support my old card. If I don't attach the GPU, the VM starts and I can manage it through the console.
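
Since the suspicion is about CSM, it may help to confirm how the host itself was actually booted. This is a minimal sketch, assuming a standard Linux sysfs layout: the kernel exposes /sys/firmware/efi only when it was started through UEFI. The function name boot_mode and its optional path argument are illustrative and not part of any Proxmox tooling.

```shell
#!/bin/sh
# Report how the running host was booted.
# /sys/firmware/efi exists only when the kernel was started via UEFI.
# The optional argument (an alternative path) is a hypothetical hook for testing.
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "legacy BIOS/CSM"
    fi
}

boot_mode
```

As far as I know, having CSM enabled in the firmware does not by itself force a legacy boot, so this reports the path the host really took rather than the BIOS setting.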

This is the conf file for the VM:

args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
cpu: host,hidden=1
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:2d:00,rombar=0
ide2: none,media=cdrom
machine: q35
memory: 16384
meta: creation-qemu=10.1.2,ctime=1763163244
name: test
net0: virtio=BC:24:11:C7:1B:6B,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-1,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=cb571e45-9536-409e-9e72-e9c617089825
sockets: 1
vmgenid: 802fb5f4-810f-4059-8d01-f6c22c6dae8c

I have enabled SVM (confirmed by the flag in lscpu) as well as IOMMU in the BIOS:

dmesg | grep -e IOMMU
[ 0.686497] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.690502] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

Interrupt remapping is enabled (I believe):

dmesg | grep 'remapping'
[ 0.415551] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.688152] AMD-Vi: Interrupt remapping enabled

I can see the GPU in IOMMU group 24 (and the other GPU in group 21) when I run pvesh get /nodes/test/hardware/pci --pci-class-blacklist "":

class     device  id            iommugroup  vendor  device_name                                         subsystem_device
0x030000  0x2482  0000:2d:00.0  24          0x10de  GA104 [GeForce RTX 3070 Ti]                         0x5051
0x040300  0x228b  0000:2d:00.1  24          0x10de  GA104 High Definition Audio Controller              0x5051
0x030000  0x68f9  0000:23:00.0  21          0x1002  Cedar [Radeon HD 5000/6000/7350/8350 Series]        0x2009
0x040300  0xaa68  0000:23:00.1  21          0x1002  Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series]  0xaa68
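
The grouping can also be cross-checked directly from sysfs, which shows every device sharing a group with the GPU (ideally only its own VGA and audio functions). This is a minimal sketch, assuming the usual /sys/kernel/iommu_groups layout; the function name list_iommu_groups and its optional root argument are hypothetical and exist only so it can be pointed at a test directory.

```shell
#!/bin/sh
# Print "group <n> <pci-address>" for every device under an IOMMU groups tree.
# The root defaults to the standard sysfs location; the optional argument is
# only a hypothetical hook so the function can be pointed at a test directory.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # nothing matched: no IOMMU groups present
        group="${dev%/devices/*}"          # strip "/devices/<address>"
        group="${group##*/}"               # keep only the group number
        echo "group $group ${dev##*/}"
    done
}

list_iommu_groups | sort -k2,2n
```

If nothing is printed, no groups were found under that path, which would point at the IOMMU not being active rather than at the VM configuration.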

I have blacklisted the GPU drivers in /etc/modprobe.d/blacklist.conf:

blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm

In /etc/modules I have the following, and I ran update-initramfs -u after modifying it:

vfio
vfio_iommu_type1
vfio_pci
#vfio_virqfd (read somewhere this is not required anymore)

I used the ID of my GPU from the table above to get the required IDs:

lspci -n -s 2d:00
2d:00.0 0300: 10de:2482 (rev a1)
2d:00.1 0403: 10de:228b (rev a1)
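
To avoid copying the vendor:device pairs by hand, they can be extracted from the same lspci -n output. This is a minimal sketch, assuming the output format shown above, where the third whitespace-separated field is the vendor:device pair; the helper name vfio_ids is illustrative.

```shell
#!/bin/sh
# Turn `lspci -n` lines (read from stdin) into a comma-separated vendor:device list.
# Assumes the third whitespace-separated field is the vendor:device pair.
# Hypothetical usage on the host: lspci -n -s 2d:00 | vfio_ids
vfio_ids() {
    awk '{ ids = ids (NR > 1 ? "," : "") $3 } END { print ids }'
}
```

Fed the two lines above, it should yield the string that goes after ids= in the vfio-pci options.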

I then edited /etc/modprobe.d/vfio.conf to add them:

options vfio-pci ids=10de:2482,10de:228b disable_vga=1
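
After a reboot, it is worth confirming that both functions of the card are actually claimed by vfio-pci. This is a minimal sketch that summarizes lspci -k style output, assuming the usual layout of one unindented line per device followed by indented "Kernel driver in use:" lines; the helper name pci_drivers is hypothetical.

```shell
#!/bin/sh
# Summarize `lspci -k` style output (read from stdin) as "<address> <driver>".
# Devices without a "Kernel driver in use:" line are reported as "none".
# Hypothetical usage on the host: lspci -nnk -s 2d:00 | pci_drivers
pci_drivers() {
    awk '
        /^[0-9a-f]/ {                       # unindented line: a new device starts
            if (addr != "") print addr, drv
            addr = $1
            drv = "none"
        }
        /Kernel driver in use:/ { drv = $NF }
        END { if (addr != "") print addr, drv }
    '
}
```

Both 2d:00.0 and 2d:00.1 would be expected to show vfio-pci here if the options line took effect.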

Now, the issue: when I try to start the VM I get the error quoted above. I believe the problem is my secondary GPU, because the PC has to boot with CSM instead of UEFI. I cannot see any obvious reason but, after commenting out the blacklist file, rebooting, entering the BIOS and changing the mode to UEFI, I can start the VM. However, even if I could live with just one GPU, once the system was back up and I had uncommented the blacklist file and rebooted, I still could not see any video output from the VM through the GPU.
Any ideas on:

1. How to use both GPUs (the old one seems to require CSM)?
2. If that is not possible, why I can't see the VM display output through the 3070 Ti?

Apologies if the explanation is confusing: I have been trying to sort this out for the past couple of days and the logic is all getting blurry at this point.

Thanks in advance.
 
I forgot to add the logs from starting the VM with qm start 100:

Nov 15 22:45:42 cerro qm[2075]: start VM 100: UPID:cerro:0000081B:00003E34:69190296:qmstart:100:root@pam:
Nov 15 22:45:42 cerro qm[2074]: <root@pam> starting task UPID:cerro:0000081B:00003E34:69190296:qmstart:100:root@pam:
Nov 15 22:45:42 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:42 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:42 cerro systemd[1]: Created slice qemu.slice - Slice /qemu.
Nov 15 22:45:42 cerro systemd[1]: Started 100.scope.
Nov 15 22:45:43 cerro kernel: tap100i0: entered promiscuous mode
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Nov 15 22:45:43 cerro kernel: fwpr100p0: entered allmulticast mode
Nov 15 22:45:43 cerro kernel: fwpr100p0: entered promiscuous mode
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered forwarding state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Nov 15 22:45:43 cerro kernel: fwln100i0: entered allmulticast mode
Nov 15 22:45:43 cerro kernel: fwln100i0: entered promiscuous mode
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered forwarding state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Nov 15 22:45:43 cerro kernel: tap100i0: entered allmulticast mode
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered forwarding state
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: enabling device (0002 -> 0003)
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.1: enabling device (0000 -> 0002)
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.1: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.1: reset done
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:44 cerro qm[2075]: VM 100 started with PID 2110.
Nov 15 22:45:44 cerro qm[2074]: <root@pam> end task UPID:cerro:0000081B:00003E34:69190296:qmstart:100:root@pam: OK