Yet another iGPU passthrough failure story (R5 5650GE)

koruyucu

Jan 24, 2023
Good day,

I am a happy owner of a Ryzen 5 Pro 5650GE. However, I would be even happier if I could manage to pass the built-in iGPU through to an Ubuntu guest.

Here is my setup:
* cpu: Ryzen 5 Pro 5650GE
* motherboard: Asrock B550M Phantom Gaming 4
* host OS: Proxmox 7.3 (kernel 5.15.83-1-pve)
* guest OS: Xubuntu 22.10 (kernel 5.19.0-29-generic)

I think I have all the prerequisites for a successful iGPU passthrough:
* enabled IOMMU and SR-IOV, disabled CSM
* installed the guest OS over noVNC with no GPU attached
* enabled IOMMU and disabled the framebuffer (among other things) in the GRUB configuration
Code:
# cat /etc/default/grub | grep _DEFAULT
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt amd_iommu=on initcall_blacklist=sysfb_init video=simblefb:off video=vesafb:off video=efifb:off nofb nomodeset disable_vga=1 textonly pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 modprobe.blacklist=amdgpu,snd_hda_intel"
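
A minimal sketch for confirming that the parameters above actually reached the running kernel, since an edit to /etc/default/grub only takes effect after `update-grub` and a reboot. The helper names `has_param` and `report_params` are made up for this example; the only real interface assumed is /proc/cmdline.

```shell
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# report_params CMDLINE PARAM...: print one "ok"/"missing" line per parameter.
report_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        if has_param "$cmdline" "$p"; then
            echo "ok      $p"
        else
            echo "missing $p"
        fi
    done
}

# On the host, after update-grub and a reboot:
#   report_params "$(cat /proc/cmdline)" iommu=pt amd_iommu=on initcall_blacklist=sysfb_init
```

Anything reported as missing means the edited line is not the one the kernel booted with.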

* verified that iommu is enabled
Code:
# dmesg | grep -e DMAR -e IOMMU
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.311711] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.312299] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.389444] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

* and blacklisted drivers again, just in case:
Code:
# cat /etc/modprobe.d/blacklist.conf
blacklist nvidiafb
blacklist nouveau
blacklist nvidia
blacklist radeon
blacklist snd_hda_intel

* and configured vfio driver for required devices
Code:
# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:1637,1002:1638,1022:15e3 disable_vga=1
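
The `ids=` list can be derived instead of typed by hand. The sketch below assumes plain `lspci -nn` output (one line per device, ending in a `[vendor:device]` pair) and collects those pairs into a vfio-pci options line. `vfio_ids` is a hypothetical helper name; its output would still have to be written to /etc/modprobe.d/vfio.conf by hand, followed by `update-initramfs -u`.

```shell
# vfio_ids: read `lspci -nn` lines on stdin and print a modprobe options
# line for vfio-pci built from the [vvvv:dddd] vendor:device pairs.
vfio_ids() {
    ids=$(sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
        | sort -u | paste -sd, -)
    if [ -n "$ids" ]; then
        echo "options vfio-pci ids=$ids"
    fi
}

# On the host, for the three functions used in this post:
#   { lspci -nn -s 05:00.0; lspci -nn -s 05:00.1; lspci -nn -s 05:00.6; } | vfio_ids
```

Do not feed it `lspci -nnv` output: the verbose form also prints Subsystem lines, whose IDs would end up in the list.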

* and verified iommu groups and that devices are picked by vfio:
Code:
# lspci -nnv
...
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev dc) (prog-if 00 [VGA controller])
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1636]
        Flags: fast devsel, IRQ 29, IOMMU group 14
        ...
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu
...  
05:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
        Flags: fast devsel, IRQ 255, IOMMU group 15
        ...
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
...
  05:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
        Subsystem: ASRock Incorporation Family 17h (Models 10h-1fh) HD Audio Controller [1849:d887]
        Flags: fast devsel, IRQ 255, IOMMU group 19
        ...
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
...
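
The excerpt above shows only the group number of each passed-through function. To see everything that shares a group, the standard sysfs tree under /sys/kernel/iommu_groups can be walked. This is a rough sketch; `list_iommu_groups` is a name invented here, and the optional argument exists only so the function can be pointed at a different directory.

```shell
# list_iommu_groups [ROOT]: print "group N: <pci address>" for every device,
# sorted by group number. ROOT defaults to the kernel's sysfs location.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob matched nothing
        group=${dev#"$root"/}          # strip the root prefix
        group=${group%%/*}             # keep only the group number
        echo "group $group: ${dev##*/}"
    done | sort -t' ' -k2,2n
}

# On the host:
#   list_iommu_groups | grep -e 'group 14:' -e 'group 15:' -e 'group 19:'
```

Keep in mind that with `pcie_acs_override` active, as the warning in the dmesg output above says, separate groups do not prove real isolation between the devices.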

* and verified that no sneaky framebuffer is locking any memory:
Code:
# cat /proc/iomem | grep BOOTFB | wc -l
0

* and have snatched vbios:
Code:
# sudo echo 1 > /sys/bus/pci/devices/0000\:05\:00.0/rom
...
# file /usr/share/kvm/5650ge.rom
/usr/share/kvm/5650ge.rom: BIOS (ia32) ROM Ext. IBM comp. Video (108*512)
...
# ./rom-parser /usr/share/kvm/5650ge.rom
Valid ROM signature found @0h, PCIR offset 1b0h
      PCIR: type 0 (x86 PC-AT), vendor: 1002, device: 1638, class: 030000
      PCIR: revision 0, vendor revision: 110a
      Last image
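
Where rom-parser is not at hand, a cheaper sanity check is to look at the first two bytes of the dump: a PCI option ROM image starts with the signature 0x55 0xAA, which is what rom-parser reports as "Valid ROM signature found @0h". A small sketch, with `rom_has_signature` as a made-up helper name; it says nothing about whether the image is the right one for the iGPU.

```shell
# rom_has_signature FILE: succeed if FILE starts with the 0x55 0xAA
# option-ROM signature, fail otherwise.
rom_has_signature() {
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# On the host:
#   rom_has_signature /usr/share/kvm/5650ge.rom && echo "signature ok"
```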

* and configured VM to take control over the iGpu:
Code:
# cat /etc/pve/qemu-server/101.conf
balloon: 0
bios: ovmf
# also tried host cpu
cpu: kvm64,flags=+pdpe1gb;+aes
hostpci0: 0000:05:00.0,x-vga=on,romfile=5650ge.rom
machine: q35
...

* all of this was interspersed with lots of `update-grub`, `update-initramfs` and `reboot now`

After all this hassle, all I get when booting the VM is a black screen of misery, a working noVNC console, and a host log with the following entries:
Code:
...
[84799.160173] kvm [195091]: ignored rdmsr: 0xc001100d data 0x0
[84799.160182] kvm [195091]: ignored wrmsr: 0xc001100d data 0x0
[84799.304296] kvm [195091]: ignored rdmsr: 0xc001100d data 0x0
[84799.304301] kvm [195091]: ignored wrmsr: 0xc001100d data 0x0
[84799.383530] kvm [195091]: ignored rdmsr: 0xc001100d data 0x0
[84799.383533] kvm [195091]: ignored wrmsr: 0xc001100d data 0x0
[84799.463542] kvm [195091]: ignored rdmsr: 0xc001100d data 0x0
[84799.463546] kvm [195091]: ignored wrmsr: 0xc001100d data 0x0
[84800.417381] kvm [195091]: ignored rdmsr: 0x122 data 0x0
[84800.417389] kvm [195091]: ignored rdmsr: 0x10f data 0x0
...
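
These lines are the visible side of `kvm.ignore_msrs=1`: the guest touches MSRs that KVM does not emulate, and each access is logged. To see which registers are involved without scrolling through repeats, the log can be condensed. A sketch, assuming the message format shown above; `summarize_ignored_msrs` is a hypothetical name.

```shell
# summarize_ignored_msrs: read kernel log lines on stdin and print
# "<count> <rdmsr:|wrmsr:> <msr address>" for each distinct ignored access.
summarize_ignored_msrs() {
    awk '/ignored (rdmsr|wrmsr):/ {
        for (i = 1; i <= NF; i++)
            if ($i == "ignored")
                count[$(i + 1) " " $(i + 2)]++
    }
    END { for (k in count) print count[k], k }' | LC_ALL=C sort -k2,3
}

# On the host:
#   dmesg | summarize_ignored_msrs
```

If the noise itself is the problem, the kvm module also has a `report_ignored_msrs` parameter that can be set to 0; that only hides the messages and does not change guest behaviour.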

However, before attaching the iGPU I enabled SSH on the guest, and here is what I see:
* guest does see the iGpu:
Code:
 $ lspci -nnk
 06:10.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev dc)
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1636]
        Kernel modules: amdgpu

* guest has some issues loading the amdgpu driver:
Code:
  [0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.19.0-29-generic ... ro quiet splash vt.handoff=7
  ...
  [0.060917] Booting paravirtualized kernel on bare hardware
  ...
  [0.317901] smpboot: CPU0: AMD Common KVM processor (family: 0xf, model: 0x6, stepping: 0x1)
  ...
  [0.769586] pci 0000:06:10.0: [1002:1638] type 00 class 0x030000
  [0.775478] pci 0000:06:10.0: reg 0x10: [mem 0x800000000-0x80fffffff 64bit pref]
  [0.783477] pci 0000:06:10.0: reg 0x18: [mem 0x810000000-0x8101fffff 64bit pref]
  [0.791477] pci 0000:06:10.0: reg 0x20: [io  0x9000-0x90ff]
  [0.795476] pci 0000:06:10.0: reg 0x24: [mem 0xc0600000-0xc067ffff]
  [0.803477] pci 0000:06:10.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]
  ...
  [0.872318] iommu: Default domain type: Translated
  ...
  [0.931572] pci 0000:00:1a.0: can't claim BAR 4 [io  0xd300-0xd31f]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  [0.931579] pci 0000:00:1a.1: can't claim BAR 4 [io  0xd2e0-0xd2ff]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  [0.931584] pci 0000:00:1a.2: can't claim BAR 4 [io  0xd2c0-0xd2df]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  [0.931609] pci 0000:00:1d.0: can't claim BAR 4 [io  0xd2a0-0xd2bf]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  [0.931614] pci 0000:00:1d.1: can't claim BAR 4 [io  0xd280-0xd29f]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  [0.931618] pci 0000:00:1d.2: can't claim BAR 4 [io  0xd260-0xd27f]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  [0.931670] pci 0000:00:1f.2: can't claim BAR 4 [io  0xd240-0xd25f]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  [0.931675] pci 0000:00:1f.3: can't claim BAR 4 [io  0xd200-0xd23f]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
  ...
  [0.931786] pci 0000:06:10.0: vgaarb: setting as boot VGA device
  [0.931786] pci 0000:06:10.0: vgaarb: bridge control possible
  [0.931786] pci 0000:06:10.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
  [0.931786] vgaarb: loaded
  ...
  [0.946299] pci 0000:06:10.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window
  [0.946302] pci 0000:06:12.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
  ...
  [0.963872] pci 0000:06:12.0: BAR 6: assigned [mem 0xc06c0000-0xc06fffff pref]
  [0.963875] pci 0000:06:10.0: BAR 6: assigned [mem 0xc0690000-0xc069ffff pref]
  ...
  [1.294554] VFIO - User Level meta-driver version: 0.3
  [1.417942] Run /init as init process
  ...
  [    2.011999] systemd[1]: Detected virtualization qemu.
  ...
  [    2.517800] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
  ...
  [    2.622421] snd_hda_intel 0000:00:1b.0: no codecs found!
  ...
  [    3.697776] [drm] amdgpu kernel modesetting enabled.
  [    3.697875] amdgpu: CRAT table not found
  [    3.697878] amdgpu: Virtual CRAT table created for CPU
  [    3.697887] amdgpu: Topology: Add CPU node
  [    3.698546] amdgpu 0000:06:10.0: vgaarb: deactivate vga console
  [    3.698833] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1638 0x1002:0x1636 0xDC).
  [    3.698843] [drm] register mmio base: 0xC0600000
  [    3.698844] [drm] register mmio size: 524288
  [    3.699952] [drm] add ip block number 0 <soc15_common>
  [    3.699954] [drm] add ip block number 1 <gmc_v9_0>
  [    3.699955] [drm] add ip block number 2 <vega10_ih>
  [    3.699956] [drm] add ip block number 3 <psp>
  [    3.699956] [drm] add ip block number 4 <smu>
  [    3.699957] [drm] add ip block number 5 <dm>
  [    3.699958] [drm] add ip block number 6 <gfx_v9_0>
  [    3.699959] [drm] add ip block number 7 <sdma_v4_0>
  [    3.699960] [drm] add ip block number 8 <vcn_v2_0>
  [    3.699961] [drm] add ip block number 9 <jpeg_v2_0>
  [    3.705561] [drm] BIOS signature incorrect 0 0
  [    3.708349] amdgpu 0000:06:10.0: amdgpu: Fetched VBIOS from ROM BAR
  [    3.708357] amdgpu: ATOM BIOS: 13-CEZANNE-019
  [    3.710161] [drm] VCN decode is enabled in VM mode
  [    3.710163] [drm] VCN encode is enabled in VM mode
  [    3.710163] [drm] JPEG decode is enabled in VM mode
  [    3.710165] amdgpu 0000:06:10.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
  [    3.710166] amdgpu 0000:06:10.0: amdgpu: PCIE atomic ops is not supported
  [    3.710173] amdgpu 0000:06:10.0: amdgpu: MODE2 reset
  [    6.533158] amdgpu 0000:06:10.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000002
  [    6.533165] amdgpu 0000:06:10.0: amdgpu: Mode2 reset failed!
  [    6.533168] amdgpu 0000:06:10.0: amdgpu: asic reset on init failed
  [    6.533171] amdgpu 0000:06:10.0: amdgpu: Fatal error during GPU init
  [    6.533193] amdgpu 0000:06:10.0: amdgpu: amdgpu: finishing device.
  [    6.533864] amdgpu: probe of 0000:06:10.0 failed with error -62
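
Since different host settings produce different failures in the guest, it helps to reduce each boot log to just the amdgpu/drm failure lines with timestamps removed, so that two runs can be compared with `diff`. A rough sketch; `amdgpu_errors` is a name invented here, and the keyword list is a guess based on the messages in this log.

```shell
# amdgpu_errors: read a guest kernel log on stdin, keep amdgpu/drm lines that
# look like failures, and strip the leading "[  seconds]" timestamp.
amdgpu_errors() {
    grep -E 'amdgpu|\[drm' \
        | grep -Ei 'fail|error|not done' \
        | sed 's/^ *\[ *[0-9.]*\] *//'
}

# In the guest, once per attempt:
#   sudo dmesg | amdgpu_errors > attempt1.txt
# then compare attempts with: diff attempt1.txt attempt2.txt
```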

As I am a complete noob in this realm, I sincerely hope there is some trivial bit I've missed and that everything can be solved by changing some magic GRUB/kernel/UEFI setting. Deep in my heart, though, I already suspect that this is some scary driver/kernel/hardware bug that maybe only time can heal. Can someone help me sort this out?

Thank you
 
I have been there; I spent a week reading all of the internet to figure it out. Fortunately, I was able to.
https://forum.proxmox.com/threads/c...ough-to-ws2019-guest-on-hp-ml110-gen9.121141/
That was with Windows 10, though. Try going through my config and the latest post with the config, specifically the args line.
If you can, I would suggest trying the cheapest PCI GPU you can find, since you can download a ROM file for it from the link in my post that helped me.
The ROM I exported myself didn't work for me.
Btw, I see that the kernel driver in use is vfio-pci; nevertheless, have you tried blacklisting amdgpu? (I am 99% wrong on this, but just in case.)
https://askubuntu.com/questions/1080217/how-to-blacklist-amdgpu-driver
Maybe try some other distros?
 
Try turning off Resizable BAR in the BIOS.

Also, your GRUB_CMDLINE is a nightmare, lol

You have
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt amd_iommu=on initcall_blacklist=sysfb_init video=simblefb:off video=vesafb:off video=efifb:off nofb nomodeset disable_vga=1 textonly pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 modprobe.blacklist=amdgpu,snd_hda_intel"

Try simplifying, as such:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt amd_iommu=on nofb nomodeset initcall_blacklist=sysfb_init"
 
Thank you all for the fast replies.

I have checked, and "Re-Size BAR Support" is disabled in the BIOS.

As for the GRUB config: I promise to clean it up. It was a kind of "last resort", lumping together everything I found on the net in the hope that something would make it work. I am aware that blacklisting "sysfb_init" renders the "video" directives useless, and that the other options are set in configuration files anyway.

However, IIRC, adding "nomodeset" changed the error in the guest.

About buying a GPU: that is no fun at all =) Isn't it kind of a sport here to find a problem (or create one) and try to solve it?

About blacklisting amdgpu: the host is not going to use it anyway, but without the blacklist the driver could be loaded and touch the iGPU in some way. I have tried both with and without blacklisting, and neither worked.

About different distros: I am not going to use Windows, and for the *nixes it all comes down to firmware/kernel/driver, so I could actually try older versions or a 'same version as host' scenario, thanks. But this looks like a huge task to me and I'm not mentally ready.

I've also checked the "Can't make Nvidia GPU passthrough to WS2019 guest on HP ML110 Gen9" thread. Your scenario seems quite different:
* different Gpu vendor
* different Cpu vendor
* different "technological generation"
* "server grade" platform vs "cheap consumer platform" in my case
* regular Gpu vs iGpu in my case

However, I've managed to get different kernel errors, including losing the ability to start the VM from the host:
Code:
# this is error on the host after starting and shutting down the guest
kvm: ../hw/pci/pci.c:1562: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.

I am now experimenting with the cpu and machine settings.

Thank you
 
I've made some progress:

* disabled 'secure boot' in PVE
* grub: nofb nomodeset disable_vga=1 textonly kvm.ignore_msrs=1
* cpu: host,hidden=1
* hostpci0: 0000:05:00.0,pcie=1,romfile=5650ge.rom,x-vga=1
* machine: pc-q35-5.2 (originally), pc-q35-7.1
* vga: none

Now the VM does try to take control of the display: the GRUB messages disappear from the screen, but the screen stays black and the guest log contains the following entries:

Code:
amdgpu 0000:01:00.0: amdgpu: PSP runtime database doesn't exist
amdgpu 0000:01:00.0: amdgpu: PSP runtime database doesn't exist
amdgpu 0000:01:00.0: amdgpu: Will use PSP to load VCN firmware
amdgpu 0000:01:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:01:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:01:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:01:00.0: amdgpu: SMU is initialized successfully!
[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[drm:parse_hdmi_amd_vsdb [amdgpu]] *ERROR* EDID CEA parser failed
[drm] DSC precompute is not needed.
fbcon: amdgpudrmfb (fb0) is primary device
Console: switching to colour frame buffer device 240x67
amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
loop9: detected capacity change from 0 to 8
[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

1. The rig is connected to a smart TV; if the TV is off when the VM starts, I don't get the EDID-related error.
2. Currently I don't pass through the PSP (05:00.2), only the iGPU (05:00.0). Do I have to?

Using `drm.vblankoffdelay=0` hasn't solved the 'DMUB idle: status=3' error for me. Does anybody have any tricks to solve this error?
 
