XFX Radeon RX 590 GPU passthrough to Ubuntu VM.

bwilling

Mar 1, 2023
Hi All,

First time poster here looking for some assistance. Please be gentle :)

Okay, so before I go bald trying to solve this issue, I thought I would ask this helpful community for some advice.
For some time now, I have been trying to get GPU passthrough working for my Ubuntu 22.04 (kernel 6.2.0-32-generic) VM. The reason is that I have a Jellyfin server running, for which I would like to enable hardware transcoding.

Following some very helpful guides online, I had managed to get this working for a time. However, after updating the Ubuntu VM to the newest kernel, I was presented with the error message:

pci_hp_register failed with error - 16

Of course, as you often do, I immediately took to Google in search of an answer. I stumbled upon some threads on here which mentioned that disabling ACPI support for the VM would solve the issue. Having tried this, though, it seems that with ACPI support disabled the VM doesn't detect the GPU at all.

So of course you will be wondering what my setup looks like:

PVE version 8 (only recently updated as part of troubleshooting; it was on PVE 7.4 before)

Contents of /etc/default/grub:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=vesa:off vfio-pci.ids=1002:67df,1002:aaf0 kvm.ignore_msrs=1 initcall_blacklist=sysfb_init"
GRUB_CMDLINE_LINUX=""
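Changes to /etc/default/grub only take effect after the GRUB configuration is regenerated (update-grub) and the host is rebooted, so it is worth confirming that the running kernel actually received the parameters. A minimal sketch, assuming the host boots via GRUB (hosts booted through systemd-boot, e.g. ZFS on UEFI, keep the command line in /etc/kernel/cmdline and use proxmox-boot-tool refresh instead); the helpers cmdline_has and check_cmdline are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel command line in $1 contains the
# exact parameter $2 (e.g. "intel_iommu=on").
cmdline_has() {
    local cmdline=" $1 " param="$2"
    # Pad with spaces so "iommu=pt" does not match inside "intel_iommu=pt".
    [[ "$cmdline" == *" $param "* ]]
}

# Hypothetical helper: print every expected parameter (remaining arguments)
# that is missing from the command line in $1; fail if any is missing.
check_cmdline() {
    local cmdline="$1" missing=0 p
    shift
    for p in "$@"; do
        if ! cmdline_has "$cmdline" "$p"; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on the host, after editing /etc/default/grub:
#   update-grub && reboot
#   check_cmdline "$(cat /proc/cmdline)" intel_iommu=on iommu=pt
```

Checking against /proc/cmdline rather than the config file catches the common case where the file was edited but the bootloader was never refreshed.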

Output of dmesg | grep -E "DMAR|IOMMU":
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.008870] ACPI: DMAR 0x00000000BB8051A0 0000A8 (v01 A M I OEMDMAR 00000001 INTL 00000001)
[ 0.008893] ACPI: Reserving DMAR table memory at [mem 0xbb8051a0-0xbb805247]
[ 0.125064] DMAR: IOMMU enabled
[ 0.339221] DMAR: Host address width 46
[ 0.339223] DMAR: DRHD base: 0x000000ef844000 flags: 0x1
[ 0.339231] DMAR: dmar0: reg_base_addr ef844000 ver 1:0 cap d2078c106f0462 ecap f020ff
[ 0.339234] DMAR: RMRR base: 0x000000bb6c1000 end: 0x000000bb6ecfff
[ 0.339236] DMAR: ATSR flags: 0x0
[ 0.339239] DMAR-IR: IOAPIC id 0 under DRHD base 0xef844000 IOMMU 0
[ 0.339241] DMAR-IR: IOAPIC id 2 under DRHD base 0xef844000 IOMMU 0
[ 0.339243] DMAR-IR: HPET id 0 under DRHD base 0xef844000
[ 0.339244] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.339625] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 1.273386] DMAR: [Firmware Bug]: RMRR entry for device 0b:00.0 is broken - applying workaround
[ 1.273403] DMAR: No SATC found
[ 1.273405] DMAR: dmar0: Using Queued invalidation
[ 1.276184] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 68.980977] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 68.980984] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x2500000014, qw1 = 0x0
[ 68.981030] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005219c
[ 68.981048] DMAR: Invalidation Time-out Error (ITE) cleared
[ 68.981051] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 68.981052] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x2600000014, qw1 = 0x0
[ 68.981083] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000521a4
[ 68.981100] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.337534] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.337543] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x3500000014, qw1 = 0x0
[ 69.337592] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000522f4
[ 69.337610] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.337613] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.337614] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x3600000014, qw1 = 0x0
[ 69.337646] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000522fc
[ 69.337662] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.617155] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.617163] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x1a00000014, qw1 = 0x0
[ 69.617180] DMAR: DRHD: handling fault status reg 60
[ 69.617194] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005237c
[ 69.617244] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.617246] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.617247] DMAR: QI HEAD: Context-cache Invalidation qw0 = 0x40000010031, qw1 = 0x0
[ 69.617277] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052384
[ 69.617294] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.619105] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.619108] DMAR: QI HEAD: Context-cache Invalidation qw0 = 0x40000010031, qw1 = 0x0
[ 69.619145] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052394
[ 69.619162] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.619165] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.619166] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x100e2, qw1 = 0x0
[ 69.619195] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005239c
[ 69.619211] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.621004] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.621008] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0x0
[ 69.621046] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000523ac
[ 69.621064] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.621066] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.621067] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200f2, qw1 = 0x1000
[ 69.621096] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000523b4
[ 69.621112] DMAR: Invalidation Completion Error (ICE) cleared
[ 72.233929] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 72.233936] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200e2, qw1 = 0x0
[ 72.233944] DMAR: DRHD: handling fault status reg 60
[ 72.233968] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052044
[ 72.234014] DMAR: Invalidation Time-out Error (ITE) cleared
[ 72.234017] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 72.234018] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0x3ffff001
[ 72.234049] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005204c
[ 72.234065] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.381120] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.381122] DMAR: DRHD: handling fault status reg 60
[ 78.381126] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.381127] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x1400000014, qw1 = 0x0
[ 78.381202] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005209c
[ 78.381220] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.381223] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.381224] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x1400000014, qw1 = 0x0
[ 78.381255] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520a4
[ 78.381271] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.875845] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.875848] DMAR: DRHD: handling fault status reg 60
[ 78.875854] DMAR: DRHD: handling fault status reg 60
[ 78.875880] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0xffff001
[ 78.875925] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520e4
[ 78.875943] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.875946] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.875947] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200e2, qw1 = 0x0
[ 78.875975] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520ec
[ 78.875991] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.875999] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.876000] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200e2, qw1 = 0x0
[ 78.876000] DMAR: DRHD: handling fault status reg 40
[ 78.876005] DMAR: DRHD: handling fault status reg 60
[ 78.876014] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520ec
[ 78.876067] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.876069] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.876070] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0x3ffff001
[ 78.876100] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520f4
[ 78.876117] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.884622] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.884627] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0xef001
[ 78.884645] DMAR: DRHD: handling fault status reg 60
[ 78.884653] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052124
[ 78.884702] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.884705] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.884706] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200f2, qw1 = 0x100008
[ 78.884734] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005212c
[ 78.885714] DMAR: Invalidation Completion Error (ICE) cleared

root@pve:~# dmesg | grep -i vfio
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=vesa:off vfio-pci.ids=1002:67df,1002:aaf0 kvm.ignore_msrs=1 initcall_blacklist=sysfb_init
[ 0.124977] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=vesa:off vfio-pci.ids=1002:67df,1002:aaf0 kvm.ignore_msrs=1 initcall_blacklist=sysfb_init
[ 14.557923] VFIO - User Level meta-driver version: 0.3
[ 14.571592] vfio-pci 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 14.571842] vfio_pci: add [1002:67df[ffffffff:ffffffff]] class 0x000000/00000000
[ 14.596201] vfio_pci: add [1002:aaf0[ffffffff:ffffffff]] class 0x000000/00000000
[ 21.047516] vfio-pci 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none
[ 21.050760] vfio-pci 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 63.422566] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 63.422574] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 63.434447] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 63.434455] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x28e4
[ 63.434458] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 63.458469] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0
[ 68.516604] vfio-pci 0000:04:00.0: enabling device (0400 -> 0403)
[ 68.516756] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 68.516760] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 68.528986] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 68.528994] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x2a3c
[ 68.528997] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 68.553289] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0
[ 68.557765] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 68.558008] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 68.558182] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1e@0x370
[ 68.606361] vfio-pci 0000:04:00.1: enabling device (0100 -> 0102)
[ 68.742390] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 68.742399] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 68.742554] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 68.742560] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x2a40
[ 68.742562] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 68.766441] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0
[ 1925.654093] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 1925.654102] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 1925.654274] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 1925.654280] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x28dc
[ 1925.654282] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 1925.678492] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0

/etc/modprobe.d/vfio.conf:
# Make vfio-pci a pre-dependency of the usual video modules
softdep amdgpu pre: vfio-pci
softdep radeon pre: vfio-pci
# Have vfio-pci grab the XFX 580 device IDs on boot
options vfio-pci ids=1002:67df,1002:aaf0 disable_vga=1
options vfio-pci ids=1002:67df,1002:aaf0
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidiafb pre: vfio-pci
softdep nvidia_drm pre: vfio-pci
softdep drm pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
softdep snd_hda_codec_hdmi pre: vfio-pci
softdep i915 pre: vfio-pci
softdep snd_hda_codec_hdmi pre: vfio-pci

/etc/modprobe.d/blacklist.conf:
blacklist radeon
blacklist amdgpu
blacklist snd_hda_intel
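The vfio.conf above repeats several options and softdep lines, which makes the file harder to audit without adding anything. A minimal de-duplicated sketch, assuming only the RX 590's two functions (GPU 1002:67df and HDMI audio 1002:aaf0) should be claimed by vfio-pci; write_vfio_conf is a hypothetical helper, and the target path is a parameter so it can be tried on a scratch file first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal modprobe config (to path $1) that lets
# vfio-pci claim the RX 590 before amdgpu, radeon or snd_hda_intel can bind.
write_vfio_conf() {
    local target="$1"
    cat > "$target" <<'EOF'
# Have vfio-pci grab the RX 590 GPU and its HDMI audio function on boot
options vfio-pci ids=1002:67df,1002:aaf0 disable_vga=1
# Make sure vfio-pci is loaded before the drivers that would otherwise bind
softdep amdgpu pre: vfio-pci
softdep radeon pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
EOF
}

# Typical use on the host (as root):
#   write_vfio_conf /etc/modprobe.d/vfio.conf
#   update-initramfs -u -k all
#   reboot
```

Rebuilding the initramfs matters because the GPU driver may already be loaded from there, before the root file system and its /etc/modprobe.d are available.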

root@pve:~# lshw -c display
  *-display
       description: VGA compatible controller
       product: GK106 [GeForce GTX 660]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:05:00.0
       logical name: /dev/fb0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom fb
       configuration: depth=32 driver=nouveau latency=0 resolution=2560,1440
       resources: irq:58 memory:ee000000-eeffffff memory:d8000000-dfffffff memory:e0000000-e1ffffff ioport:a000(size=128) memory:c0000-dffff
  *-display
       description: VGA compatible controller
       product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:04:00.0
       version: e1
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller cap_list rom
       configuration: driver=vfio-pci latency=0
       resources: irq:28 memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:b000(size=256) memory:ef700000-ef73ffff memory:ef740000-ef75ffff
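lshw -c display only lists the VGA function, so it shows driver=vfio-pci for 04:00.0 but says nothing about the HDMI audio function 04:00.1. The sysfs "driver" symlink of each PCI function names the driver currently bound to it. A small sketch that reads it; bound_driver is a hypothetical helper, and its optional second argument (an alternative sysfs root) exists only so the logic can be tested:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the driver bound to a PCI function, or "none".
# $1 = full PCI address (e.g. 0000:04:00.0); $2 = sysfs root (default /sys).
bound_driver() {
    local addr="$1" sysfs="${2:-/sys}"
    local link="$sysfs/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The last path component of the symlink target is the driver name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Typical use on the host:
#   for fn in 0000:04:00.0 0000:04:00.1; do echo "$fn: $(bound_driver "$fn")"; done
```

`lspci -nnk -s 04:00` reports the same information in its "Kernel driver in use" lines; both functions should show vfio-pci before the VM is started.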

From what I can tell, this seems to be the consensus for this type of setup online.

The VM is set up as follows:
[screenshots: 1694653077369.png, 1694653111619.png]

Curiously, when the VM is running I lose the ability to run lshw and lspci; they seem to freeze the SSH and VNC console sessions.
Contents of /dev/dri:
card0 card1 renderD128

I am fairly new to this sysadmin stuff, so any and all help would be appreciated!
If I have dumb questions following your probably very smart answers, that is because I am in fact dumb :rolleyes:
 
What do your IOMMU groups look like when you don't use pcie_acs_override=downstream,multifunction (which breaks the secure isolation of the VM)? Can you move the RX590 to a PCIe slot where it is in an IOMMU group of its own? You probably need to install and activate vendor-reset to make an RX590 reset properly for PCI(e) passthrough. I hope this gets you started.
 
Thanks for the quick response on this thread.

I have already installed the vendor-reset package, and it is working as intended, as shown by the output of dmesg | grep -i vfio.

I will check later today what the IOMMU groups look like without the pcie_acs_override option and will get back to this thread with the findings.
 
After taking out the pcie_acs_override option, the IOMMU groups remain the same. I have removed the option and won't be adding it back.

Added blacklist efifb to /etc/modprobe.d/blacklist.conf.
Added video=efifb:off to /etc/default/grub.

The VM boots with the same error, but now there's more output.
The VM boot gets stuck on:
A start job is running for Detect the available GPUs and deal with any system changes
A start job is running for initialize hardware monitoring sensors.

I've attached the full output of the dmesg command.

The lshw and lspci commands also get stuck, which is why I attached the output of cat /proc/$pid/stack, where $pid is the PID of either lshw or lspci. Maybe someone can tell me whether they see anything that explains why these commands get stuck. I surely don't :))
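Because lshw and lspci hang rather than fail, a useful first check is whether they are blocked inside the kernel, which ps reports as state D (uninterruptible sleep); their kernel stacks can then be collected in one go instead of looking up each PID by hand. A minimal sketch; d_state_procs and dump_stuck_stacks are hypothetical helpers, and reading /proc/PID/stack requires root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "PID STAT COMMAND" lines (as printed by
# `ps -eo pid=,stat=,comm=`) on stdin and print "PID COMMAND" for every
# process whose state starts with D (uninterruptible sleep).
d_state_procs() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Hypothetical helper: print the kernel stack of every process in state D.
dump_stuck_stacks() {
    ps -eo pid=,stat=,comm= | d_state_procs | while read -r pid comm; do
        echo "== $comm ($pid) =="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Typical use as root, while lspci/lshw are hanging:
#   dump_stuck_stacks
```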
 

Attachments

  • cat proc-PID-stack.txt (1.2 KB)
  • dmesg_grep(0000.04.00).txt (6.9 KB)
  • dmesg_output.txt (96.1 KB)
Please show the IOMMU groups using for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done.
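The one-liner above uses shell parameter expansion to pull two things out of each sysfs path such as /sys/kernel/iommu_groups/12/devices/0000:04:00.0: the group number (the component after iommu_groups/) and the PCI address (the last component). The same loop split into small hypothetical helpers might look like this; the final sort by group number is an added assumption for readability, since the original prints in glob order:

```shell
#!/usr/bin/env bash
# Hypothetical helper: group number from .../iommu_groups/<N>/devices/<addr>
group_of() {
    local n="${1#*/iommu_groups/}"   # drop everything up to ".../iommu_groups/"
    echo "${n%%/*}"                    # keep only the part before the next "/"
}

# Hypothetical helper: the PCI address is the last path component.
device_of() {
    echo "${1##*/}"
}

# Same loop as in the post, one step per line, sorted by group number.
list_iommu_groups() {
    local d
    for d in /sys/kernel/iommu_groups/*/devices/*; do
        [ -e "$d" ] || continue   # glob did not match: no IOMMU groups
        printf 'IOMMU group %s ' "$(group_of "$d")"
        lspci -nns "$(device_of "$d")"
    done | sort -V
}
```

For passthrough, the line of interest is the group containing 04:00.0 and 04:00.1; ideally no other devices (apart from PCIe bridges) share it.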
There is no point in using video=vesafb:off and/or efifb:off, since they are not used by Proxmox. Proxmox only needs initcall_blacklist=sysfb_init when the passthrough GPU is used during the boot of the system. What is the output of cat /proc/cmdline?
I don't see the expected vendor-reset messages in dmesg. Do you activate it for the RX590 after each reboot of Proxmox?
It looks like amdgpu is loading at boot, which can work, but it might be easier to blacklist it (or early-bind the GPU to vfio-pci with a softdep to make sure vfio-pci loads before amdgpu).
Can you share the VM configuration file (or the output of qm config <vmid>)?
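On kernels 5.15 and newer, the vendor-reset project's instructions (as I understand them) say the module being loaded is not enough: the card's reset method also has to be switched to device_specific through sysfs, and that setting is lost on every host reboot. A minimal sketch of that step, assuming vendor-reset is installed and loaded (e.g. listed in /etc/modules); use_device_specific_reset is a hypothetical helper, and its optional sysfs-root argument exists only for testing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: select the device-specific reset (provided by
# vendor-reset) for one PCI function.
# $1 = full PCI address (e.g. 0000:04:00.0); $2 = sysfs root (default /sys).
use_device_specific_reset() {
    local addr="$1" sysfs="${2:-/sys}"
    local attr="$sysfs/bus/pci/devices/$addr/reset_method"
    if [ ! -e "$attr" ]; then
        echo "no reset_method attribute for $addr" >&2
        return 1
    fi
    echo device_specific > "$attr"
}

# Typical use as root after each host boot, before starting the VM:
#   use_device_specific_reset 0000:04:00.0
#   cat /sys/bus/pci/devices/0000:04:00.0/reset_method   # should list device_specific
```

Running this from a udev rule or a oneshot systemd service is the usual way to make it survive reboots; which of the two to use is a matter of taste.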
 
Please see the attached iommu-groups.txt for the output.

BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt video=efifb:off video=vesa:off vfio-pci.ids=1002:67df,1002:aaf0 kvm.ignore_msrs=1 initcall_blacklist=sysfb_init


root@pve:~# qm config 400
not enough arguments
qm config <vmid> [OPTIONS]
root@pve:~# qm config 102
acpi: 1
agent: 0
boot: order=scsi0;net0
cores: 4
cpu: host
hostpci0: 0000:04:00,pcie=1
kvm: 1
machine: q35
memory: 16384
meta: creation-qemu=7.1.0,ctime=1677333132
name: Ubuntu
net0: virtio=F2:89:6C:BA:3F:D6,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-102-disk-0,iothread=1,size=160G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=24f1102d-152c-4974-9ade-a34227450d8a
sockets: 1
vmgenid: 5f6cc693-4349-482f-bbc4-6418fa8a1a02
 

Attachments

  • dmesg+grep-vfio.txt (5.2 KB)
  • iommu-groups.txt (11.6 KB)


video=efifb:off video=vesa:off does nothing and can be removed.
The IOMMU groups of the RX590 look good; you don't need pcie_acs_override. vendor-reset is activated. I don't see any problems, but you also filtered the log. Does it work now?
 
I have removed video=efifb:off video=vesa:off, updated GRUB and rebooted; still no joy.

The error message persists in the VM on startup.

The lshw and lspci commands still freeze SSH.

I can't log in through the console anymore either, as it freezes once I enter my password.
 
FWIW, the message also shows on my TrueNAS VM, where I've passed through my HBA.

On that VM, though, it's not really causing any major issues: the HBA still shows up in the VM and is 100% usable.
 
Just a quick update on this.

I have had some time to troubleshoot this over the weekend, and I've learned a couple of things.
Firstly, the issue with the GPU wasn't caused by the pci_hp_register error.

I tested with a newer-generation NVIDIA card (OEM RTX 2060), which worked fine.
This led me to uninstall the amdgpu graphics drivers within the VM; without them, the VM booted fine.

I reinstalled using the newer amdgpu drivers for Ubuntu 22.04.3, but the same issues remained, i.e. the VM would freeze whenever I attempted to log in to it via the GUI, and the lspci and lshw commands would still freeze.

What I did to get around this was blacklist the amdgpu drivers inside the VM and then sign in to the VM normally. Once signed in, I would run sudo modprobe amdgpu, and lo and behold, everything seemed to work okay.
I'm going to be running it like this for the next couple of weeks and may replace the GPU with an NVIDIA card later down the road.
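The workaround described above works because a blacklist line in modprobe.d only stops a module from being loaded automatically through its device aliases; an explicit modprobe amdgpu still loads it. A minimal sketch of the in-VM part, where the helper name blacklist_amdgpu and the file name blacklist-amdgpu.conf are hypothetical choices, and the directory is a parameter so it can be tried on a scratch directory first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop amdgpu from being auto-loaded at boot inside the VM.
# $1 = modprobe.d directory (default /etc/modprobe.d).
blacklist_amdgpu() {
    local dir="${1:-/etc/modprobe.d}"
    # "blacklist" ignores the module's aliases; explicit modprobe still works.
    printf 'blacklist amdgpu\n' > "$dir/blacklist-amdgpu.conf"
}

# Typical use inside the Ubuntu VM (as root):
#   blacklist_amdgpu && update-initramfs -u && reboot
# After logging in:
#   sudo modprobe amdgpu
```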

Thanks again for the help!
 
