[SOLVED] Yet another GPU passthrough help request

iluap

May 19, 2024
Hello,
New to Proxmox. I recently bought a gaming PC with the intention of running some VMs on it. The specs are as follows:

  • CPU Ryzen 5900X
  • Motherboard MSI X570-A Pro
  • GPU MSI RTX 3070 Ti SUPRIM X 8G
  • GPU Radeon 5450 (ancient, I know)
I installed PVE 9.0.11 and created a Debian 13 VM to which I'd like to pass through the main GPU (the old secondary GPU is there in case I need to connect a monitor to PVE itself). I have followed the official Proxmox documentation for PCI passthrough, but every time I try to start the VM with the GPU attached, I get the error "start failed: QEMU exited with code 1". I believe this is because I am using CSM in the BIOS to support my old card. If I don't attach the GPU, the VM starts and I can manage it through its console.

This is the conf file for the VM:

args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
cpu: host,hidden=1
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:2d:00,rombar=0
ide2: none,media=cdrom
machine: q35
memory: 16384
meta: creation-qemu=10.1.2,ctime=1763163244
name: test
net0: virtio=BC:24:11:C7:1B:6B,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-1,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=cb571e45-9536-409e-9e72-e9c617089825
sockets: 1
vmgenid: 802fb5f4-810f-4059-8d01-f6c22c6dae8c
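When comparing a config like the one above against the documentation, it can help to pull out the passthrough entries programmatically. A minimal Python sketch (a hypothetical helper, not a Proxmox tool) that reads the main section of a VM config as "key: value" lines and splits each hostpciN value into its host PCI address and options; it assumes the simple layout shown above and stops at the first bracketed snapshot section:

```python
from typing import Dict


def parse_vm_conf(text: str) -> Dict[str, str]:
    """Parse the main section of a Proxmox VM config into {key: value}."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # snapshot/pending sections follow; only the main section is read
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/description lines
        key, sep, value = line.partition(":")  # split on the FIRST colon only
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def hostpci_entries(conf: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Split each hostpciN value into its host address and comma-separated options."""
    entries: Dict[str, Dict[str, str]] = {}
    for key, value in conf.items():
        if not key.startswith("hostpci"):
            continue
        entry: Dict[str, str] = {}
        for part in value.split(","):
            name, sep, val = part.partition("=")
            if sep:
                entry[name] = val  # e.g. rombar=0, pcie=1, host=...
            else:
                entry["host"] = name  # a bare first item is the host PCI address
        entries[key] = entry
    return entries
```

Fed the config above, `hostpci_entries(parse_vm_conf(text))` should give `{"hostpci0": {"host": "0000:2d:00", "rombar": "0"}}`.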

I have enabled SVM (the svm flag shows up in lscpu) as well as IOMMU in the BIOS:

dmesg | grep -e IOMMU
[ 0.686497] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.690502] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

Interrupt remapping is enabled (I believe):

dmesg | grep 'remapping'
[ 0.415551] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.688152] AMD-Vi: Interrupt remapping enabled
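Both dmesg checks can be scripted. A minimal Python sketch, assuming the dmesg text is captured separately and that the messages are worded as in the lines above (other kernel versions may phrase them differently):

```python
import re
from typing import Dict

# Patterns based on the dmesg lines quoted above; wording may vary by kernel (assumption).
IOMMU_CHECKS = {
    "amd_iommu": re.compile(r"AMD-Vi|Detected AMD IOMMU"),
    "interrupt_remapping": re.compile(r"Interrupt remapping enabled"),
}


def check_iommu_messages(dmesg_text: str) -> Dict[str, bool]:
    """Return, for each check, whether a matching line appears in the dmesg output."""
    return {name: bool(pattern.search(dmesg_text)) for name, pattern in IOMMU_CHECKS.items()}
```

On the host it could be called as `check_iommu_messages(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)` (run as root). Note that the x2apic line above does not count as remapping being enabled.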

When I run pvesh get /nodes/test/hardware/pci --pci-class-blacklist "", I can see the GPU in IOMMU group 24 (and the other GPU in group 21):

class     device  id            iommugroup  vendor  device_name                                         subsystem_device
0x030000  0x2482  0000:2d:00.0  24          0x10de  GA104 [GeForce RTX 3070 Ti]                         0x5051
0x040300  0x228b  0000:2d:00.1  24          0x10de  GA104 High Definition Audio Controller              0x5051
0x030000  0x68f9  0000:23:00.0  21          0x1002  Cedar [Radeon HD 5000/6000/7350/8350 Series]        0x2009
0x040300  0xaa68  0000:23:00.1  21          0x1002  Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series]  0xaa68
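Devices that share an IOMMU group generally have to be passed through together, so it is useful to see what else sits in the GPU's group. A minimal Python sketch that reads the standard sysfs layout under /sys/kernel/iommu_groups (present when the IOMMU is active); the root path is a parameter only so it can be pointed elsewhere for testing:

```python
from pathlib import Path
from typing import Dict, List


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> Dict[int, List[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Reads <root>/<group>/devices/<address>; returns an empty dict if the
    directory does not exist (e.g. IOMMU not enabled).
    """
    groups: Dict[int, List[str]] = {}
    base = Path(root)
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    return groups


def group_members(groups: Dict[int, List[str]], address: str) -> List[str]:
    """All addresses sharing an IOMMU group with `address` (empty list if not found)."""
    for members in groups.values():
        if address in members:
            return members
    return []
```

If the grouping matches the table above, `group_members(iommu_groups(), "0000:2d:00.0")` should list only the GPU and its audio function.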

I have blacklisted the GPU drivers in /etc/modprobe.d/blacklist.conf:

blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm

In /etc/modules I have the following, and I ran update-initramfs -u after modifying it:

vfio
vfio_iommu_type1
vfio_pci
#vfio_virqfd (read somewhere this is not required anymore)
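After a reboot, one way to confirm that the vfio modules loaded and the blacklisted ones did not is to look at /proc/modules. A minimal Python sketch; note that modules built into the kernel never appear in /proc/modules, so a "missing" entry is a hint rather than proof:

```python
from typing import Dict, Iterable, List, Set


def _norm(name: str) -> str:
    # The kernel treats "-" and "_" in module names as equivalent.
    return name.replace("-", "_")


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Module names from /proc/modules text (the first column of each line)."""
    return {_norm(line.split()[0]) for line in proc_modules_text.splitlines() if line.strip()}


def module_report(loaded: Set[str], wanted: Iterable[str], blacklisted: Iterable[str]) -> Dict[str, List[str]]:
    """List wanted modules that are absent and blacklisted modules that are present."""
    return {
        "missing": sorted(m for m in map(_norm, wanted) if m not in loaded),
        "blacklisted_but_loaded": sorted(m for m in map(_norm, blacklisted) if m in loaded),
    }
```

Typical use: `with open("/proc/modules") as f: loaded = loaded_modules(f.read())`, then `module_report(loaded, ["vfio", "vfio_iommu_type1", "vfio_pci"], ["nouveau", "nvidia", "nvidiafb", "nvidia_drm"])`.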

I used the GPU's PCI address from the table above to get the required vendor:device IDs:

lspci -n -s 2d:00
2d:00.0 0300: 10de:2482 (rev a1)
2d:00.1 0403: 10de:228b (rev a1)

I then edited /etc/modprobe.d/vfio.conf to add them:

options vfio-pci ids=10de:2482,10de:228b disable_vga=1
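The step from `lspci -n` output to the vfio.conf line is mechanical, so it can be sketched in a few lines of Python. This assumes the `lspci -n -s <slot>` output format shown above (address, class followed by a colon, then vendor:device):

```python
import re
from typing import List

# Matches lines such as "2d:00.0 0300: 10de:2482 (rev a1)".
LSPCI_N = re.compile(r"^\S+\s+[0-9a-fA-F]{4}:\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{4})")


def vendor_device_ids(lspci_output: str) -> List[str]:
    """Collect unique vendor:device IDs in the order they appear."""
    ids: List[str] = []
    for line in lspci_output.splitlines():
        match = LSPCI_N.match(line.strip())
        if match and match.group(1).lower() not in ids:
            ids.append(match.group(1).lower())
    return ids


def vfio_options_line(ids: List[str], disable_vga: bool = True) -> str:
    """Build the "options vfio-pci ..." line for /etc/modprobe.d/vfio.conf."""
    line = "options vfio-pci ids=" + ",".join(ids)
    if disable_vga:
        line += " disable_vga=1"
    return line
```

With the two lspci lines above as input, `vfio_options_line(vendor_device_ids(text))` reproduces exactly the line shown: `options vfio-pci ids=10de:2482,10de:228b disable_vga=1`.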

Now, the issue: when I try to start the VM, I get the "start failed: QEMU exited with code 1" error. I believe the problem is my secondary GPU, because it forces the PC to boot with CSM instead of UEFI. I can't see any obvious reason, but after commenting out the blacklist file, rebooting, entering the BIOS and switching the mode to UEFI, I can start the VM. However, even if I could live with just one GPU: after getting the system back, uncommenting the blacklist file and rebooting, I still can't see the VM's video output on the GPU.
Any ideas to:

1. Use both GPUs (the old one seems to require CSM)?
2. If that's not possible, why can't I see the VM's display output through the 3070 Ti?

Apologies if the explanation is confusing: I have been trying to sort this out for the past couple of days and the logic is getting blurry at this point.

Thanks in advance.
 
I forgot to add the logs from starting the VM with qm start 100:

Nov 15 22:45:42 cerro qm[2075]: start VM 100: UPID:cerro:0000081B:00003E34:69190296:qmstart:100:root@pam:
Nov 15 22:45:42 cerro qm[2074]: <root@pam> starting task UPID:cerro:0000081B:00003E34:69190296:qmstart:100:root@pam:
Nov 15 22:45:42 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:42 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:42 cerro systemd[1]: Created slice qemu.slice - Slice /qemu.
Nov 15 22:45:42 cerro systemd[1]: Started 100.scope.
Nov 15 22:45:43 cerro kernel: tap100i0: entered promiscuous mode
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Nov 15 22:45:43 cerro kernel: fwpr100p0: entered allmulticast mode
Nov 15 22:45:43 cerro kernel: fwpr100p0: entered promiscuous mode
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 15 22:45:43 cerro kernel: vmbr0: port 2(fwpr100p0) entered forwarding state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Nov 15 22:45:43 cerro kernel: fwln100i0: entered allmulticast mode
Nov 15 22:45:43 cerro kernel: fwln100i0: entered promiscuous mode
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 1(fwln100i0) entered forwarding state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Nov 15 22:45:43 cerro kernel: tap100i0: entered allmulticast mode
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 15 22:45:43 cerro kernel: fwbr100i0: port 2(tap100i0) entered forwarding state
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: enabling device (0002 -> 0003)
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.1: enabling device (0000 -> 0002)
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.1: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.1: reset done
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: resetting
Nov 15 22:45:44 cerro kernel: vfio-pci 0000:2d:00.0: reset done
Nov 15 22:45:44 cerro qm[2075]: VM 100 started with PID 2110.
Nov 15 22:45:44 cerro qm[2074]: <root@pam> end task UPID:cerro:0000081B:00003E34:69190296:qmstart:100:root@pam: OK
 
I've made some progress: I have flashed the video BIOS on the old AMD card so it now works in a UEFI environment, the motherboard now boots in UEFI mode, and I can start the VM with no errors.
However, when I started the VM I could briefly see the Proxmox "Start boot option" splash screen, and then the screen went black and switched off. To fix that, I had to add the USB mappings for the mouse and keyboard. Even so, I still have to open the VM's "Console" and log in (I can do that "blindly" with the keyboard) before the Debian VM shows up on the external monitor through the GPU. Movement on screen is VERY slow, but I am using a KVM switch, so that could be a factor.
So, all in all, it works now, and it is just a matter of fixing the small issues mentioned above.
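For reference, USB devices can also be attached from the CLI with `qm set <vmid> -usbN host=<vendor>:<product>`. A small Python sketch that turns selected `lsusb` lines into such commands; the device IDs in the test lines are hypothetical placeholders, and `first_index` exists only to avoid overwriting usbN slots that are already in use:

```python
import re
from typing import List

# Matches the "ID vvvv:pppp" field of an lsusb line,
# e.g. "Bus 001 Device 003: ID 046d:c52b Logitech, Inc. Unifying Receiver".
LSUSB_ID = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def usb_passthrough_commands(vmid: int, lsusb_lines: List[str], first_index: int = 0) -> List[str]:
    """Build one `qm set` command per matching lsusb line, numbering usbN from first_index."""
    commands: List[str] = []
    for line in lsusb_lines:
        match = LSUSB_ID.search(line)
        if match is None:
            continue  # not an lsusb device line
        vendor, product = match.group(1).lower(), match.group(2).lower()
        commands.append(f"qm set {vmid} -usb{first_index + len(commands)} host={vendor}:{product}")
    return commands
```

Only the lines for the keyboard and mouse should be passed in; root hubs and other devices that also show up in `lsusb` are best left out.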
Thanks for reading!