AMD RX 550 GPU passthrough issues

Kylian

New Member
Jun 4, 2023
France
Hello,

I've installed Proxmox on my server to run VMs and LXC containers easily. I also planned to create a Windows 11 VM with GPU passthrough so I can use my dedicated GPU to transcode H.264 videos to HEVC. I first followed this guide, and at the end of the Windows 11 installation I tried to install my GPU drivers and the VM crashed. Not only the VM: the entire Proxmox host crashed. I searched the internet, including many threads on the Proxmox forums and on Reddit, but I didn't find any solution.

Here is the hardware/BIOS information:
  • Motherboard: Atermitter x79 ATX socket LGA 2011
  • CPU: Intel Xeon E5-2630 v2 2.6 GHz
  • RAM: 32 GB (4x8GB) ECC registered 1333 MHz (quad channel)
  • GPU: Sapphire RX 550 2GB GDDR5 OC
  • BIOS: AMI UEFI BIOS
  • BIOS mode: UEFI only, legacy/CSM totally disabled
  • Intel virtualization technology: enabled
  • Intel VT-d: enabled

Here is some useful configuration information:
  • PVE version:
    Code:
    root@radix:~# pveversion --verbose
    proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
    pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
    pve-kernel-5.15: 7.3-3
    pve-kernel-5.15.102-1-pve: 5.15.102-1
    ceph-fuse: 15.2.17-pve1
    corosync: 3.1.7-pve1
    criu: 3.15-1+pve-1
    glusterfs-client: 9.2-1
    ifupdown2: 3.1.0-1+pmx3
    ksm-control-daemon: 1.4-1
    libjs-extjs: 7.0.0-1
    libknet1: 1.24-pve2
    libproxmox-acme-perl: 1.4.4
    libproxmox-backup-qemu0: 1.3.1-1
    libproxmox-rs-perl: 0.2.1
    libpve-access-control: 7.4-1
    libpve-apiclient-perl: 3.2-1
    libpve-common-perl: 7.3-3
    libpve-guest-common-perl: 4.2-4
    libpve-http-server-perl: 4.2-1
    libpve-rs-perl: 0.7.5
    libpve-storage-perl: 7.4-2
    libspice-server1: 0.14.3-2.1
    lvm2: 2.03.11-2.1
    lxc-pve: 5.0.2-2
    lxcfs: 5.0.3-pve1
    novnc-pve: 1.4.0-1
    proxmox-backup-client: 2.3.3-1
    proxmox-backup-file-restore: 2.3.3-1
    proxmox-kernel-helper: 7.4-1
    proxmox-mail-forward: 0.1.1-1
    proxmox-mini-journalreader: 1.3-1
    proxmox-widget-toolkit: 3.6.3
    pve-cluster: 7.3-3
    pve-container: 4.4-3
    pve-docs: 7.4-2
    pve-edk2-firmware: 3.20221111-1
    pve-firewall: 4.3-1
    pve-firmware: 3.6-4
    pve-ha-manager: 3.6.0
    pve-i18n: 2.11-1
    pve-qemu-kvm: 7.2.0-8
    pve-xtermjs: 4.16.0-1
    qemu-server: 7.4-2
    smartmontools: 7.2-pve3
    spiceterm: 3.2-2
    swtpm: 0.8.0~bpo11+3
    vncterm: 1.7-1
    zfsutils-linux: 2.1.9-pve1
  • Bootloader kernel command line config file: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"
  • Command dmesg | grep -e DMAR -e IOMMU returned:
    Code:
    root@radix:~# dmesg | grep -e DMAR -e IOMMU
    [    0.015152] ACPI: DMAR 0x00000000BACE4AB8 0000BC (v01 A M I  OEMDMAR  00000001 INTL 00000001)
    [    0.015174] ACPI: Reserving DMAR table memory at [mem 0xbace4ab8-0xbace4b73]
    [    0.079065] DMAR: IOMMU enabled
    [    0.219925] DMAR: Host address width 46
    [    0.219926] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1
    [    0.219932] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
    [    0.219935] DMAR: RMRR base: 0x000000bb747000 end: 0x000000bb755fff
    [    0.219937] DMAR: ATSR flags: 0x0
    [    0.219939] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x0
    [    0.219942] DMAR-IR: IOAPIC id 0 under DRHD base  0xfbffc000 IOMMU 0
    [    0.219944] DMAR-IR: IOAPIC id 2 under DRHD base  0xfbffc000 IOMMU 0
    [    0.219945] DMAR-IR: HPET id 0 under DRHD base 0xfbffc000
    [    0.219946] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
    [    0.220315] DMAR-IR: Enabled IRQ remapping in x2apic mode
    [    0.600369] DMAR: No SATC found
    [    0.600372] DMAR: dmar0: Using Queued invalidation
    [    0.601811] DMAR: Intel(R) Virtualization Technology for Directed I/O
  • Command dmesg | grep 'remapping' returned:
    Code:
    root@radix:~# dmesg | grep 'remapping'
    [    0.219946] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
    [    0.220315] DMAR-IR: Enabled IRQ remapping in x2apic mode
  • Content of the file /etc/modules:
    Code:
    # /etc/modules: kernel modules to load at boot time.
    #
    # This file contains the names of kernel modules that should be loaded
    # at boot time, one per line. Lines beginning with "#" are ignored.
    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd
  • Content of the file /etc/modprobe.d/blacklist.conf:
    Code:
    blacklist amdgpu
    blacklist radeon
    blacklist nouveau
    blacklist nvidia
  • Content of the file /etc/modprobe.d/vfio.conf:
    Code:
    options vfio-pci ids=1002:699f,1002:aae0 disable_vga=1
  • If I do lspci -v again:
    Code:
    03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7) (prog-if 00 [VGA controller])
            Subsystem: Sapphire Technology Limited Lexa PRO [Radeon RX 550]
            Flags: bus master, fast devsel, latency 0, IRQ 255, IOMMU group 25
            Memory at e0000000 (64-bit, prefetchable) [size=256M]
            Memory at f0000000 (64-bit, prefetchable) [size=2M]
            I/O ports at e000 [disabled] [size=256]
            Memory at fbd00000 (32-bit, non-prefetchable) [size=256K]
            Expansion ROM at fbd40000 [disabled] [size=128K]
            Capabilities: [48] Vendor Specific Information: Len=08 <?>
            Capabilities: [50] Power Management version 3
            Capabilities: [58] Express Legacy Endpoint, MSI 00
            Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
            Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
            Capabilities: [150] Advanced Error Reporting
            Capabilities: [200] Physical Resizable BAR
            Capabilities: [270] Secondary PCI Express
            Capabilities: [2b0] Address Translation Service (ATS)
            Capabilities: [2c0] Page Request Interface (PRI)
            Capabilities: [2d0] Process Address Space ID (PASID)
            Capabilities: [320] Latency Tolerance Reporting
            Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
            Capabilities: [370] L1 PM Substates
            Kernel driver in use: vfio-pci
            Kernel modules: amdgpu
    
    03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X]
            Subsystem: Sapphire Technology Limited Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X]
            Flags: fast devsel, IRQ 255, IOMMU group 25
            Memory at fbd60000 (64-bit, non-prefetchable) [disabled] [size=16K]
            Capabilities: [48] Vendor Specific Information: Len=08 <?>
            Capabilities: [50] Power Management version 3
            Capabilities: [58] Express Legacy Endpoint, MSI 00
            Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
            Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
            Capabilities: [150] Advanced Error Reporting
            Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
            Kernel driver in use: vfio-pci
            Kernel modules: snd_hda_intel
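
For anyone comparing against their own host: the lspci output above shows both GPU functions in IOMMU group 25. A minimal sketch for listing every device by group from sysfs is below. The function name and its optional path argument are made up for illustration; the argument exists only so the function can be pointed at a test directory.

```shell
# List PCI devices by IOMMU group, one "group N: address" line per device.
# $1 is a hypothetical override of the sysfs directory (defaults to the real one).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist (IOMMU disabled).
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<address>"
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Calling `list_iommu_groups` with no argument on the host prints the real groups. If the GPU shares a group with unrelated devices, passthrough usually needs a different slot or more work.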

Here is my Windows 11 virtual machine configuration:
Code:
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 6
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-202-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:04:00,pcie=1
ide2: local:iso/ubuntu-23.04-desktop-amd64.iso,media=cdrom,size=4816804K
machine: pc-q35-7.2
memory: 16384
meta: creation-qemu=7.2.0,ctime=1685816875
name: WIN-11
net0: virtio=AE:1A:06:02:97:49,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=70c306af-2628-413f-a244-b6a23053e0e5
sockets: 1
tpmstate0: local-lvm:vm-202-disk-1,size=4M,version=v2.0
virtio0: local-lvm:vm-202-disk-2,iothread=1,size=240G
vmgenid: f35ec065-880d-4d51-a18f-641e419e1a45


What I tried:
  • Install another OS:
    - With Windows 10 it is the same
    - With Ubuntu 23.04, the VM doesn't boot and displays errors about PCI devices
  • Remove all my configurations and follow the official Proxmox wiki
  • Install my GPU in another PCIe port
  • Install the GPU in a physical machine on Windows 11 (the GPU is working, AMD drivers installed successfully)

I've also attached two screenshots of the errors I encountered, on Windows and at Ubuntu boot respectively.

Thanks in advance to those who will help me.
 

Attachments

  • image.png (421.2 KB)
  • Screenshot_2.png (17.5 KB)
Almost identical config to the OP, except for "AMD-Vi: Interrupt remapping enabled". Same blacklists, same VFIO, same kernel switches. (proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve))

Same card(s), same issue. The RX560 (1002:67ef) works but the RX550 (1002:699f) does not. Same physical machine(s): swap the cards back and forth, RX560 works, RX550 doesn't. (WX-4100 doesn't work either, same errors.) Tried an identical server, same issue. Tried kernel 5.19.17-2-pve and kernel 6.2.16-4-bpo11-pve, no love.

The VM hangs the (virtual) PCI bus as soon as anything serious that accesses the card (radeontop/ffmpeg/etc.) is run. (PVE itself is fine, but the VM is toast.)

To be clear, the CARD itself works just fine: put it in an Ubuntu 22 box, same AMD drivers/etc., full HW accel, no issues.

lspci -vvv attached for both cards.
 

Attachments

  • rx550.txt (4.7 KB)
  • rx560.txt (4.7 KB)
Might have found something that helps here?

https://www.reddit.com/r/VFIO/comments/rmthjf/rx550_passthrough_in_proxmox/


Code:
[    1.330277] Console: colour VGA+ 80x25
[    1.788974] pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.788974] pci 0000:01:04.0: vgaarb: setting as boot VGA device
[    1.788974] pci 0000:01:04.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    1.788974] pci 0000:04:00.0: vgaarb: bridge control possible
[    1.788974] pci 0000:01:04.0: vgaarb: bridge control possible
[    1.788974] vgaarb: loaded
[   37.483110] vfio-pci 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none

Looks like the kernel is trying to grab 04:00.0 - (the RX550) during boot?
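
One way to check this from sysfs, as a sketch: PCI display devices expose a boot_vga attribute (1 for the boot VGA device), and the driver symlink shows what is currently bound. The function name and the optional second argument below are hypothetical; the second argument only exists so the function can be run against a test directory.

```shell
# Print boot_vga status and bound driver for one PCI address.
# $1: PCI address such as 0000:04:00.0
# $2: hypothetical override of the sysfs devices directory
pci_vga_status() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    dev="$root/$addr"
    boot_vga="n/a"
    if [ -r "$dev/boot_vga" ]; then
        boot_vga=$(cat "$dev/boot_vga")
    fi
    driver="none"
    if [ -e "$dev/driver" ]; then
        # The driver entry is a symlink to the driver's directory.
        driver=$(basename "$(readlink -f "$dev/driver")")
    fi
    printf '%s boot_vga=%s driver=%s\n' "$addr" "$boot_vga" "$driver"
}
```

With the log above, `pci_vga_status 0000:04:00.0` would be expected to report boot_vga=0 and vfio-pci as the driver if the host really left the card alone; that expectation is an inference from the vgaarb lines, not something verified here.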

...digging...


And more here, maybe: https://forum.proxmox.com/threads/gpu-passthrough-issues-after-upgrade-to-7-2.109051
 
Here's the complaint INSIDE the VM when I run ffmpeg, for example (the build that comes with Jellyfin, which has AMD support compiled in). The VM then has to be powered off, because it hangs on PCI when trying to shut down.

Code:
[   74.958805] BUG: kernel NULL pointer dereference, address: 00000000000000d8
[   74.960993] #PF: supervisor read access in kernel mode
[   74.963124] #PF: error_code(0x0000) - not-present page
[   74.965185] PGD 0 P4D 0
[   74.967217] Oops: 0000 [#1] SMP NOPTI
[   74.969211] CPU: 3 PID: 878 Comm: ffmpeg Tainted: G           OE     5.15.0-76-generic #83-Ubuntu
[   74.970755] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[   74.971586] RIP: 0010:amdgpu_device_baco_exit+0x8a/0xc0 [amdgpu]
[   74.972835] Code: 00 48 8b 80 80 00 00 00 48 85 c0 74 0d be 01 00 00 00 4c 89 ef e8 46 90 52 f6 f6 83 20 a1 01 00 08 74 1b 48 8b 83 00 92 00 00 <48> 8b 80 d8 00 00 00 48 85 c0 74 08 4c 89 ef e8 22 90 52 f6 5b 44
[   74.974627] RSP: 0018:ffff9a03c2593948 EFLAGS: 00010202
[   74.975511] RAX: 0000000000000000 RBX: ffff8e01a9a80010 RCX: 0000000000000000
[   74.976401] RDX: 0000000000000000 RSI: 00000000000005d0 RDI: ffff8e01a9a87fb8
[   74.977434] RBP: ffff9a03c2593968 R08: 0000000000000000 R09: 0000000000000000
[   74.978487] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   74.979593] R13: ffff8e01a9a80000 R14: 0000000000000000 R15: ffff8e0081f091b4
[   74.980572] FS:  00007f6fdee08b80(0000) GS:ffff8e00bed80000(0000) knlGS:0000000000000000
[   74.981485] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.982423] CR2: 00000000000000d8 CR3: 0000000124f0e000 CR4: 00000000000006e0
[   74.983341] Call Trace:
[   74.984239]  <TASK>
[   74.985122]  amdgpu_pmops_runtime_resume+0xe3/0x100 [amdgpu]
[   74.986320]  pci_pm_runtime_resume+0xb5/0xd0
[   74.987215]  ? pci_pm_freeze_noirq+0x110/0x110
[   74.988099]  __rpm_callback+0x4d/0x130
[   74.988981]  ? pci_pm_freeze_noirq+0x110/0x110
[   74.989860]  rpm_callback+0x67/0x70
[   74.990931]  ? pci_pm_freeze_noirq+0x110/0x110
[   74.991934]  rpm_resume+0x519/0x7c0
[   74.992821]  __pm_runtime_resume+0x52/0x90
[   74.993717]  amdgpu_driver_open_kms+0x58/0x250 [amdgpu]
[   74.994912]  drm_file_alloc+0x19e/0x270 [drm]
[   74.995860]  drm_open+0xd6/0x250 [drm]
[   74.996782]  drm_stub_open+0xba/0x140 [drm]
[   74.997705]  chrdev_open+0xf7/0x250
[   74.998613]  ? cdev_dynamic_release+0x90/0x90
[   74.999500]  do_dentry_open+0x16a/0x3f0
[   75.000377]  vfs_open+0x2d/0x40
[   75.001247]  do_open+0x20d/0x470
[   75.002268]  path_openat+0x112/0x2b0
[   75.003275]  do_filp_open+0xb2/0x160
[   75.004297]  ? __check_object_size+0x1d/0x30
[   75.005267]  do_sys_openat2+0x9f/0x160
[   75.006137]  __x64_sys_openat+0x55/0x90
[   75.006988]  do_syscall_64+0x5c/0xc0
[   75.007835]  ? do_user_addr_fault+0x1e7/0x670
[   75.008678]  ? exit_to_user_mode_prepare+0x37/0xb0
[   75.009516]  ? irqentry_exit_to_user_mode+0x9/0x20
[   75.010390]  ? irqentry_exit+0x1d/0x30
[   75.011228]  ? exc_page_fault+0x89/0x170
[   75.012046]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   75.012855] RIP: 0033:0x7f6fe600f6eb
[   75.013660] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25
[   75.015422] RSP: 002b:00007ffc82adfe30 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[   75.016312] RAX: ffffffffffffffda RBX: 00005623f0164c40 RCX: 00007f6fe600f6eb
[   75.017183] RDX: 0000000000000002 RSI: 00007ffc82ae1cac RDI: 00000000ffffff9c
[   75.018035] RBP: 00007ffc82ae1cac R08: 00005623f0164c40 R09: 00005623f0164c20
[   75.019009] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[   75.019939] R13: 00007ffc82ae1cac R14: 0000000000000028 R15: 00007ffc82ae1cac
[   75.020729]  </TASK>
[   75.021501] Modules linked in: ceph libceph fscache netfs snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds snd soundcore serio_raw qemu_fw_cfg mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc sch_fq_codel scsi_dh_alua binfmt_misc ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) bochs iommu_v2 drm_vram_helper amddrm_buddy(OE) drm_ttm_helper amd_sched(OE) amdkcl(OE) ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops hid_generic cec ahci psmouse libahci rc_core usbhid i2c_i801 lpc_ich virtio_net i2c_smbus net_failover virtio_scsi drm hid failover
[   75.027959] CR2: 00000000000000d8
[   75.029251] ---[ end trace a44baeea890d0453 ]---
[   75.030546] RIP: 0010:amdgpu_device_baco_exit+0x8a/0xc0 [amdgpu]
[   75.031997] Code: 00 48 8b 80 80 00 00 00 48 85 c0 74 0d be 01 00 00 00 4c 89 ef e8 46 90 52 f6 f6 83 20 a1 01 00 08 74 1b 48 8b 83 00 92 00 00 <48> 8b 80 d8 00 00 00 48 85 c0 74 08 4c 89 ef e8 22 90 52 f6 5b 44
[   75.034161] RSP: 0018:ffff9a03c2593948 EFLAGS: 00010202
[   75.035362] RAX: 0000000000000000 RBX: ffff8e01a9a80010 RCX: 0000000000000000
[   75.036697] RDX: 0000000000000000 RSI: 00000000000005d0 RDI: ffff8e01a9a87fb8
[   75.037819] RBP: ffff9a03c2593968 R08: 0000000000000000 R09: 0000000000000000
[   75.038992] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   75.040088] R13: ffff8e01a9a80000 R14: 0000000000000000 R15: ffff8e0081f091b4
[   75.041366] FS:  00007f6fdee08b80(0000) GS:ffff8e00bed80000(0000) knlGS:0000000000000000
[   75.042735] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   75.043992] CR2: 00000000000000d8 CR3: 0000000124f0e000 CR4: 00000000000006e0
 
RX550 probably does not reset properly (even if it claims FLR). You'll need to install vendor-reset and activate it for that GPU. There is more information about it on this forum.

EDIT: Do you see POLARIS12 reset messages when starting the VM? You probably did everything right but sometimes people forget to activate it for newer kernels.
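
A rough sketch of the activation step, assuming a 5.15 or newer host kernel, the vendor-reset module already built and loaded, and the GPU at 0000:04:00.0 (substitute your own address). On those kernels the reset method is chosen through the device's reset_method file in sysfs. The function name and the optional second argument are hypothetical, added so the logic can be tried against a test directory.

```shell
# Select the device-specific reset method for one PCI device.
# $1: PCI address such as 0000:04:00.0
# $2: hypothetical override of the sysfs devices directory
set_device_specific_reset() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    file="$root/$addr/reset_method"
    if [ ! -w "$file" ]; then
        echo "no writable reset_method for $addr" >&2
        return 1
    fi
    echo device_specific > "$file"
    # Read the value back so the caller can see what is now active.
    cat "$file"
}
```

Run it as root on the host, for example `set_device_specific_reset 0000:04:00.0`, before starting the VM. Sysfs settings do not survive a reboot, so this has to be reapplied at boot by whatever mechanism you prefer.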
 
Status Update:

It appears to be some sort of interaction between the kernel, KVM, Ubuntu, and the AMD drivers.

Pulled spare hardware
* installed fresh 7.4 pmx
* tried both 5.15 and 5.19 kernel
* installed both the rx560 & rx550 in same server, vendor-reset, etc
* Ubuntu 22 guest, AMD 5.5 drivers
* Swapped a WX4100 (same gen as RX550) - same result.

RX560 - fully functional, zero issues.
RX550 - AMD driver crashes, PCI bus hangs, non functional.
WX4100 - AMD driver crashes, PCI bus hangs, non functional

Code:
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100] [1002:67e3]
05:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]


Jul 20 01:07:07 pmx0 kernel: [40047.711811] vfio-pci 0000:05:00.0: enabling device (0400 -> 0403)
Jul 20 01:07:07 pmx0 kernel: [40047.712160] vfio-pci 0000:05:00.0: AMD_POLARIS11: version 1.1
Jul 20 01:07:07 pmx0 kernel: [40047.712168] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing pre-reset
Jul 20 01:07:07 pmx0 kernel: [40047.731613] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing reset
Jul 20 01:07:07 pmx0 kernel: [40047.731625] vfio-pci 0000:05:00.0: AMD_POLARIS11: CLOCK_CNTL: 0x0, PC: 0x2880
Jul 20 01:07:07 pmx0 kernel: [40047.731630] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing post-reset
Jul 20 01:07:07 pmx0 kernel: [40047.771522] vfio-pci 0000:05:00.0: AMD_POLARIS11: reset result = 0
Jul 20 01:07:07 pmx0 kernel: [40047.771759] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
Jul 20 01:07:07 pmx0 kernel: [40047.771778] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
Jul 20 01:07:07 pmx0 kernel: [40047.771792] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x1e@0x370
Jul 20 01:07:08 pmx0 kernel: [40047.995074] vfio-pci 0000:05:00.0: AMD_POLARIS11: version 1.1
Jul 20 01:07:08 pmx0 kernel: [40047.995088] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing pre-reset
Jul 20 01:07:08 pmx0 kernel: [40047.995337] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing reset
Jul 20 01:07:08 pmx0 kernel: [40047.995346] vfio-pci 0000:05:00.0: AMD_POLARIS11: CLOCK_CNTL: 0x0, PC: 0x2ac8
Jul 20 01:07:08 pmx0 kernel: [40047.995352] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing post-reset
Jul 20 01:07:08 pmx0 kernel: [40048.034864] vfio-pci 0000:05:00.0: AMD_POLARIS11: reset result = 0
Jul 20 01:08:57 pmx0 kernel: [40157.549687] vfio-pci 0000:05:00.0: AMD_POLARIS11: version 1.1
Jul 20 01:08:57 pmx0 kernel: [40157.549701] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing pre-reset
Jul 20 01:08:57 pmx0 kernel: [40157.549924] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing reset
Jul 20 01:08:57 pmx0 kernel: [40157.549933] vfio-pci 0000:05:00.0: AMD_POLARIS11: CLOCK_CNTL: 0x0, PC: 0x2874
Jul 20 01:08:57 pmx0 kernel: [40157.549937] vfio-pci 0000:05:00.0: AMD_POLARIS11: performing post-reset
Jul 20 01:08:58 pmx0 kernel: [40157.587106] vfio-pci 0000:05:00.0: AMD_POLARIS11: reset result = 0

Two weird things:

1. I took an existing Windows VM and passed the WX4100 through. The drivers installed correctly and the card appears to work, but I didn't have a monitor to attach to the server to be sure. I was able to adjust the fan speed on the card, change the output resolution, etc., and it didn't hang the Windows VM.

2. If I use the card as the primary display and load the AMD drivers in Proxmox, it works fine without KVM/passthrough/etc.

So back to my original statement - something between the kernel, kvm, ubuntu, and the amd drivers is not functional with the rx550 cards.


Left to test:
1. pve8.x
2. 6.2 kernels
 
GOT IT!

Looking at the kernel messages for wx-4100/rx550/rx560 in Ubuntu guest, I only see one primary thing different.

550:
Code:
[   10.409863] amdgpu 0000:06:10.0: amdgpu: Using BACO for runtime pm

Maybe onto something?

* https://patchwork.kernel.org/projec...115165038.56646-16-alexander.deucher@amd.com/
* https://www.reddit.com/r/pop_os/comments/od64a3/how_do_i_add_amdgpurunpm0_in_the_boot_parameter/


So in /etc/default/grub (IN THE GUEST, not on the Proxmox host) I added
Code:
GRUB_CMDLINE_LINUX_DEFAULT="amdgpu.runpm=0"

And voila! Card is happy, no lockups. RX550 happy, WX4100 happy

I don't know why leaving amdgpu.runpm at its default (-1, auto, which enables BACO runtime PM on this card) completely hoses the PCI bus in KVM, but it does. (Maybe tracked/solved in KVM 8.x?)
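
For a guest that already has other options on its kernel command line, here is a minimal sketch of appending the parameter instead of replacing the whole line. It assumes GNU sed and a standard double-quoted GRUB_CMDLINE_LINUX_DEFAULT entry; the function name and the optional file argument are hypothetical.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT if it is not there yet.
# $1: parameter, e.g. amdgpu.runpm=0 (must not contain "/" or "&")
# $2: hypothetical override of the grub defaults file
add_kernel_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    # Do nothing when the parameter is already on the line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" "$file"
}
```

Inside the guest, run `add_kernel_param amdgpu.runpm=0` as root, then `update-grub`, and reboot the guest. After the reboot, `cat /sys/module/amdgpu/parameters/runpm` should print 0.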
 
I created an account to just say THANK YOU! I thought I was going absolutely insane and was ready to trash the damn card. This totally solved my problem.

Thank you!
 
I also want to say thank you so much. I have spent the last few days with no luck and this resolved the issue. I was a little confused about the "Ubuntu guest" part, but for future reference it is just the virtualized Ubuntu instance, not the host.
 
Thank you, dlasher. I'm not using proxmox but had the same problem with an RX550 on an AlmaLinux host / Debian guest setup.

 


Hello!

I'm a bit of a beginner with Proxmox and would like to ask for help.
My configuration is:
OptiPlex 3000 Tower
The machine is running Proxmox 8.0.3
VMs:
pfSense
Debian 12
macOS Ventura
Windows 11

I would like to add one Sapphire Pulse RX550 video card.

I already inserted the card in the machine as a test, but it confused pfSense so badly that it started assigning completely different IPs to my network devices (without internet).

I would like to ask: with the solution you discovered, does the given line have to be set in every VM?
Or is it only needed in PVE?
 
