VM refuses to boot when GPU is passed through

armandino

Dec 28, 2021
I feel like I've tried most PCI passthrough guides now, with dwindling luck. My setup is this:

CPU: AMD Ryzen 7 7700X
Motherboard: ASUS PRIME B650M-A WIFI
GPU: ASUS Dual Radeon RX 7600 V2 OC

After giving up on a Windows VM for remote gaming and other GPU-heavy work, I figured I'd try my luck with Linux instead, but the problem is exactly the same: setup and booting work great without the GPU passed through, but as soon as the VM boots with PCI passthrough I get this error:
Code:
TASK ERROR: start failed: QEMU exited with code 1

Some potentially relevant outputs, if anyone can give any pointers:

/etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
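Listing the modules in /etc/modules only asks for them to be loaded at boot. A quick way to confirm they actually are is to compare against lsmod output. The helper below is a minimal sketch; the function name missing_vfio_modules is made up for illustration. It prints any of the three modules above that lsmod does not list. Modules compiled into the kernel instead of built as loadable modules would not show up in lsmod either, so a non-empty result is a hint to investigate, not proof of a problem.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and print every
# required vfio module that is not listed (one per line).
# Note: modules built into the kernel never appear in lsmod.
missing_vfio_modules() {
  # Skip the lsmod header line and keep only the module-name column.
  loaded=$(awk 'NR > 1 { print $1 }')
  for m in vfio vfio_iommu_type1 vfio_pci; do
    # -x: the whole line must equal the module name.
    printf '%s\n' "$loaded" | grep -qx "$m" || echo "$m"
  done
}
```

Run it as lsmod | missing_vfio_modules; no output means all three modules are loaded.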

/proc/cmdline:
Code:
initrd=\EFI\proxmox\6.5.13-3-pve\initrd.img-6.5.13-3-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs

/etc/kernel/cmdline:
Code:
root=ZFS=rpool/ROOT/pve-1 boot=zfs
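Neither command line contains IOMMU parameters. As far as I know that is normally fine on AMD, where the kernel enables the IOMMU by default as long as it is enabled in the UEFI setup, so a more direct check is whether the kernel created any IOMMU groups at all. A minimal sketch, with made-up function names, assuming the standard sysfs layout under /sys/kernel/iommu_groups:

```shell
#!/bin/sh
# Hypothetical helper: count the IOMMU groups the kernel has created.
# 0 means the IOMMU is not active, so passthrough cannot work.
count_iommu_groups() {
  dir=${1:-/sys/kernel/iommu_groups}
  if [ ! -d "$dir" ]; then
    echo 0
    return
  fi
  # Each group is a numbered subdirectory.
  find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Hypothetical helper: print every IOMMU group with its devices (uses lspci).
list_iommu_groups() {
  for d in /sys/kernel/iommu_groups/*/devices/*; do
    [ -e "$d" ] || continue          # glob did not match anything
    group=${d#/sys/kernel/iommu_groups/}
    group=${group%%/*}               # keep only the group number
    printf 'IOMMU group %s: ' "$group"
    lspci -nns "${d##*/}"            # last path component is the PCI address
  done
}
```

count_iommu_groups should print a number greater than 0. list_iommu_groups shows whether 03:00.0 and 03:00.1 share a group with any other device, which matters because a group is generally assigned to a VM as a whole.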

lspci -nnk (relevant output only):
Code:
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7700S/7600S] [1002:7480] (rev cf)
        Subsystem: ASUSTeK Computer Inc. Navi 33 [Radeon RX 7700S/7600/7600S/7600M XT/PRO W7600] [1043:05fb]
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
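The card shows up as two functions, 03:00.0 (GPU) and 03:00.1 (HDMI/DP audio), and both are already bound to vfio-pci. Their vendor:device IDs (1002:7480 and 1002:ab30) are what a vfio-pci ids= module option takes. A minimal sketch to extract them from lspci -nn output for one slot; the function name vfio_ids_for_slot is hypothetical, and it assumes lspci prints addresses without the PCI domain (its default).

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nn` (or `lspci -nnk`) output on stdin
# and print the vendor:device IDs of all functions in one slot,
# comma-separated. $1: slot without function, e.g. 03:00
vfio_ids_for_slot() {
  # Device lines start with "BB:DD.F "; indented detail lines are dropped.
  grep -E "^$1\.[0-7] " |
    # Keep only bracketed IDs of the form [xxxx:xxxx].
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' |
    tr -d '[]' |
    paste -sd, -
}
```

Going by the output above, lspci -nn | vfio_ids_for_slot 03:00 should print 1002:7480,1002:ab30 on this host.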

qm config 119:
Code:
boot: order=scsi0;ide2;net0
cores: 7
cpu: x86-64-v2-AES
hostpci0: 0000:03:00.0,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 16000
meta: creation-qemu=9.0.0,ctime=1719906357
name: gaming
net0: virtio=BC:24:11:52:32:F3,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: div:119/vm-119-disk-0.qcow2,iothread=1,size=350G
scsihw: virtio-scsi-single
smbios1: uuid=85afa764-7024-467f-8231-d39ab31248b1
sockets: 2
vmgenid: feaac97c-fab0-4f14-bb12-d1e6dadc5ee7
 
Check the system log (or journalctl) for error messages from around the time you start the VM. If there are none, the problem is probably not enough free memory. Try starting the VM with 4096 MB. My guess is that your system has 32 GB of memory, ZFS uses up to 16 GB of it, and you are trying to give the VM almost 16 GB as well. With PCI(e) passthrough, all VM memory has to be pinned in actual host RAM because the passed-through device can access it via DMA; VMs without passthrough do not have this requirement.
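The arithmetic behind that guess, as a tiny sketch. The 32 GB and 16 GB figures are the guess from the reply, not measured values. The helper names are made up; /proc/meminfo and the zfs_arc_max module parameter are where the real numbers can be read (a value of 0 there means no explicit limit is set and the OpenZFS default applies).

```shell
#!/bin/sh
# Hypothetical helpers for a rough memory check before starting a
# passthrough VM. All values are MiB.

# Host RAM, from /proc/meminfo (MemTotal is reported in kB).
host_ram_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Explicit ZFS ARC limit; 0 means no limit was set and the default applies.
arc_max_mib() {
  v=$(cat /sys/module/zfs/parameters/zfs_arc_max 2>/dev/null || echo 0)
  echo $(( v / 1024 / 1024 ))
}

# RAM left after the ARC maximum and the (fully pinned) VM memory.
# Args: host MiB, ARC max MiB, VM MiB. A negative or very small result
# suggests the VM may fail to start once the host's own needs are added.
memory_headroom_mib() {
  echo $(( $1 - $2 - $3 ))
}
```

With the guessed numbers, memory_headroom_mib 32768 16384 16000 prints 384, leaving almost nothing for the host itself; after qm set 119 --memory 4096 the same calculation gives 12288.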
 
Thanks! That seems to have solved the booting issue, and I can now see the GPU in Debian using lspci.

There's no HDMI output from the GPU, however, and no combination of "Primary GPU", "All Functions" and "PCI-Express" in the PCI Device GUI, followed by a VM reboot, seems to fix that. I also tried setting "Display" to "none", with no discernible effect.
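For reference, those GUI checkboxes map onto the hostpci line: "All Functions" passes the slot without the .0 suffix (so the HDMI audio function 03:00.1 goes along), "PCI-Express" is pcie=1 and "Primary GPU" is x-vga=1; Display "none" is vga: none. The config posted earlier (hostpci0: 0000:03:00.0,pcie=1) passes only function 0. A sketch that prints, rather than runs, the corresponding qm commands so they can be reviewed first; the function name passthrough_cmds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the qm commands matching
# "All Functions" + "PCI-Express" + "Primary GPU" and Display "none".
# $1: VM ID, $2: PCI address of the GPU, e.g. 0000:03:00.0
passthrough_cmds() {
  vmid=$1
  slot=${2%.*}   # drop the ".N" function suffix so all functions are passed
  echo "qm set $vmid --hostpci0 $slot,pcie=1,x-vga=1"
  echo "qm set $vmid --vga none"
}
```

passthrough_cmds 119 0000:03:00.0 prints the two qm set lines for VM 119; pcie=1 relies on the q35 machine type, which this VM already uses.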

Also, running watch rocm-smi prints this warning:
Code:
WARNING: No AMD GPUs specified.
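As far as I know, rocm-smi only reports GPUs that the amdgpu driver has claimed, so this warning may simply mean amdgpu is not bound to the card inside the guest. A sketch to check which driver the guest kernel attached to the AMD VGA device; the function name amd_vga_driver is hypothetical, and matching on vendor ID 1002 skips the emulated QEMU display if one is still present.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nnk` output on stdin and print the
# "Kernel driver in use" of the first AMD (vendor 1002) VGA controller,
# or "none" if no driver is bound to it.
amd_vga_driver() {
  awk '
    # Start of the AMD VGA device block.
    /VGA compatible controller.*\[1002:/ { invga = 1; next }
    # Any other unindented line starts a new device.
    /^[^ \t]/ { if (invga) exit; next }
    invga && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use:[ \t]*/, "")
      print
      found = 1
      exit
    }
    END { if (!found) print "none" }
  '
}
```

Inside the guest, lspci -nnk | amd_vga_driver should print amdgpu if the guest driver has picked up the card.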
 
That might be it. What would be a reasonably powerful GPU with a good track record for virtualization I could swap this one for?
 
I just realized that when Proxmox reboots, the GPU shows the host's boot messages on the display, the last lines being:
Code:
[ OK ] Reached target graphical.target - Graphical Interface.
       Starting systemd-update-utmp-runlevel.service - Record runlevel Change in UTMP...
[ OK ] Finished systemd-update-utmp-runlevel.service - Record runlevel Change in UTMP.

Would this imply that the host driver blacklisting isn't working correctly?

/etc/modprobe.d/blacklist.conf looks like this:
Code:
blacklist amdgpu
blacklist radeon
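Going by the lspci output in the first post, both functions already show "Kernel driver in use: vfio-pci", so on the host the blacklist does appear to take effect; console text during boot likely comes from the generic firmware framebuffer rather than amdgpu. A common alternative to blacklisting is to let vfio-pci claim the card by ID and make the host drivers load after it. The sketch below only prints such a snippet; the function name is hypothetical, the file name /etc/modprobe.d/vfio.conf is just a convention, and changes under /etc/modprobe.d generally need update-initramfs -u -k all plus a reboot to take effect.

```shell
#!/bin/sh
# Hypothetical helper: print a modprobe.d snippet that lets vfio-pci claim
# the given devices and makes the host drivers wait for vfio-pci.
# $1: comma-separated vendor:device IDs, e.g. 1002:7480,1002:ab30
vfio_modprobe_conf() {
  printf 'options vfio-pci ids=%s\n' "$1"
  # softdep: load vfio-pci before these drivers so it binds first.
  printf 'softdep amdgpu pre: vfio-pci\n'
  printf 'softdep snd_hda_intel pre: vfio-pci\n'
}
```

For this card that would be vfio_modprobe_conf 1002:7480,1002:ab30 > /etc/modprobe.d/vfio.conf, followed by update-initramfs -u -k all.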
 
I have similar issues with the following combination:

CPU: AMD Ryzen 9 7950X
Motherboard: ASUS TUF Gaming X670E-PLUS
GPU: Gigabyte Radeon RX 7700 XT Gaming OC

I've tried a lot of things.

As I understand it, the following shows that the graphics card is used during boot:

Code:
root@protobranch:~# journalctl -r | grep 'setting as boot VGA device'
Aug 04 15:28:01 protobranch kernel: pci 0000:03:00.0: vgaarb: setting as boot VGA device
Aug 04 12:47:25 protobranch kernel: pci 0000:03:00.0: vgaarb: setting as boot VGA device
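As far as I understand it, that line means the kernel's VGA arbiter picked 0000:03:00.0 as the boot (primary) display device, i.e. the one the firmware used, which fits the boot messages showing up on it. A sketch to check this for a given address; the function name is hypothetical, and journalctl -k -b limits the log to kernel messages of the current boot.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed (exit 0)
# if the given PCI address was chosen as the boot VGA device.
# $1: full PCI address, e.g. 0000:03:00.0
is_boot_vga() {
  # -F: match the address literally (the dots are not regex wildcards).
  grep -qF "pci $1: vgaarb: setting as boot VGA device"
}
```

Usage: journalctl -k -b | is_boot_vga 0000:03:00.0 && echo "boot VGA device"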

Sometimes the attached display shows the following lines:

Code:
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA [...]
EFI stub: Measured initrd data into PCR9

I also tried several different motherboard settings.

With my old graphics card (NVIDIA GTX 770) it works without any problems.
 
