[SOLVED] GPU passthrough - Guest won't boot and/or crashes host

lord_emperor

Dec 4, 2019
System specs, everybody loves specs:
Gigabyte B450M-DS3H
Ryzen 2400G
Team 2x8GB DDR4 3000
Syba 2-port PCIE SATA card (nothing connected)
HP 4-port Intel E1000 SATA NIC
500GB SATA SSD (Proxmox / boot)
4TB SATA HDD (directly attached to Ubuntu guest)
2TB SATA HDD (directly attached to Ubuntu guest)
500GB USB3 external HDD (setup as vzdump)
Yeston RX550 4GB low profile / single slot

Software:
Proxmox VE 6.1
VM100: Ubuntu server guest (working great!)
VM101: Windows server (not in use / stopped)
VM102: Windows 10 guest with passthrough (the problem)
VM103: Windows 10 guest with no passthrough (working great!)

Problem description:

Initially I was running Proxmox 6.0-4. When my GPU arrived I installed it and simply tried attaching it to an existing guest through the GUI. The guest could no longer boot, and starting it even caused the host to crash/freeze and require a reset.

I then created a new VM and carefully worked through all the steps here, here and here. Passthrough actually seems to be working, because I can complete a Windows installation and even boot into the OS using HDMI out from the GPU. However, as soon as I enable network connectivity and Windows Update installs the AMD GPU driver, the system locks up. Starting the guest after the GPU drivers are installed may freeze the host and require a reset, or the guest can't boot and stops at this screen (it always shows the spinning "stars" but only sometimes shows the EFI boot messages).

[Image: yOFnMSN.jpg, the screen where the guest boot stalls]


Since new versions are great, and in another post the OP resolved a similar issue by upgrading to 6.0-7, I went ahead and updated to Proxmox 6.1 but this didn't change anything.

I'd be happy to post logs; I just don't know which ones are relevant to the problem.

Code/config dump:

# cat /etc/modprobe.d/blacklist.conf
blacklist radeon

# lspci -v | grep 01:00
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X]

# lspci -n -s 01:00
01:00.0 0300: 1002:67ff (rev ff)
01:00.1 0403: 1002:aae0

# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:67ff,1002:aae0

# cat /etc/pve/nodes/server00/qemu-server/102.conf
args: -machine type=q35,kernel_irqchip=on
bios: ovmf
boot: c
bootdisk: sata0
cores: 4
efidisk0: local-lvm:vm-102-disk-1,size=128K
hostpci0: 01:00,x-vga=1,romfile=yeston-rx550-4G.bin
machine: q35
memory: 4096
name: vm02
net0: virtio=ma:ca:dd:re:ss:es,bridge=vmbr2,firewall=1
numa: 0
ostype: l26
sata0: local-lvm:vm-102-disk-0,size=64G
sata1: local:iso/Windows10_1903.iso,media=cdrom,size=4000576K
sata2: local:iso/virtio-win-0.1.171.iso,media=cdrom,size=363020K
scsihw: virtio-scsi-pci
smbios1: uuid=should-this-be-private?
sockets: 1
usb0: host=046d:c52b
vga: none
vmgenid: should-this-be-private?

# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

# ls /sys/firmware/efi/
config_table efivars esrt fw_platform_size fw_vendor runtime runtime-map systab vars

$ ./rom-parser yeston-rx550-4G.bin
Valid ROM signature found @0h, PCIR offset 248h
PCIR: type 0 (x86 PC-AT), vendor: 1002, device: 67ff, class: 030000
PCIR: revision 0, vendor revision: f32
Valid ROM signature found @e800h, PCIR offset 1ch
PCIR: type 3 (EFI), vendor: 1002, device: 67ff, class: 030000
PCIR: revision 0, vendor revision: 0
EFI: Signature Valid, Subsystem: Boot, Machine: X64
Last image
 
Re "blacklist radeon":

I guess you also have to blacklist 'amdgpu'. Check whether that is the driver in use with 'lspci -k'.
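A minimal sketch of that check, assuming the GPU sits at 01:00 as in the lspci output from the first post. The helper name driver_in_use is made up for illustration; it only pulls the "Kernel driver in use" lines out of lspci -k output. For passthrough you would want to see vfio-pci there rather than amdgpu or radeon.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel driver(s) bound to the devices
# listed in `lspci -k` output read from stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# On the host (address 01:00 taken from the first post):
#   lspci -k -s 01:00 | driver_in_use
# If that prints amdgpu, blacklist it next to radeon and rebuild the initramfs:
#   echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf
#   update-initramfs -u
```

If nothing is printed at all, no driver is bound to the device yet, which is also fine before the VM starts.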

Re "args: -machine type=q35,kernel_irqchip=on":

On a current PVE this should not be necessary, as QEMU >= 4.0.1 has the correct defaults (PVE 6.1 ships 4.1.1).

Also check your IOMMU groups to see if that device is the only one in its group.
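One way to list the groups, as a rough sketch: walk /sys/kernel/iommu_groups and print each device with its group number. The function name list_iommu_groups and the optional directory argument are only there for illustration (the argument makes it easy to try against a test directory).

```shell
#!/bin/sh
# List every PCI device per IOMMU group. The base directory is a parameter
# (defaulting to the real sysfs path) only so the function is easy to try out.
list_iommu_groups() {
    base="${1:-/sys/kernel/iommu_groups}"
    for dev in "$base"/*/devices/*; do
        [ -e "$dev" ] || continue          # nothing matched: IOMMU probably not enabled
        group="${dev%/devices/*}"          # strip "/devices/<address>"
        printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
    done
}

# On the host:
#   list_iommu_groups | sort -V
```

Ideally 01:00.0 and 01:00.1 show up in a group with no other devices in it.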

Otherwise, check dmesg and/or the syslog of the host and the guest.
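For the host side, a small filter like the following may help narrow the kernel log down to passthrough-related lines. The function name and the choice of search terms (iommu, vfio, AMD-Vi) are my assumptions about what is usually relevant on an AMD host, not an exhaustive list.

```shell
#!/bin/sh
# Hypothetical filter: keep only kernel log lines that mention the IOMMU or vfio.
passthrough_messages() {
    grep -i -e 'iommu' -e 'vfio' -e 'AMD-Vi'
}

# On the host:
#   dmesg | passthrough_messages
# After a host freeze and reset, the previous boot's kernel log is only
# available if the journal is persistent:
#   journalctl -k -b -1 | passthrough_messages
```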
 
Solved!

As I was picking through my configs again to add information to my post, I noticed that the GUI doesn't add pcie=1 to the config and the GPU passthrough sections of the Wiki don't suggest adding it, but the general PCIE passthrough section does. I thought I'd try it and it worked.

hostpci0: 01:00,pcie=1,x-vga=1,romfile=yeston-rx550-4G.bin
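For anyone wanting to check their own config for the same omission, here is a rough sketch. The helper name hostpci_without_pcie is made up; the path and VM ID are the ones from the first post. Note that pcie=1 only applies to q35 machine types, which this VM already uses.

```shell
#!/bin/sh
# Hypothetical check: print every hostpciN line in a VM config file
# that does not carry the pcie=1 option.
hostpci_without_pcie() {
    grep -E '^hostpci[0-9]+:' "$1" | grep -v -E '[ ,]pcie=1(,|$)'
}

# On the host:
#   hostpci_without_pcie /etc/pve/nodes/server00/qemu-server/102.conf
# To set the option from the CLI instead of editing the file by hand:
#   qm set 102 --hostpci0 01:00,pcie=1,x-vga=1,romfile=yeston-rx550-4G.bin
```

No output means every passed-through device already has pcie=1 set.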

[Image: Capture.PNG]
 
Great, I did not see the missing pcie=1 ...
 
