Proxmox 7.3 Kernel 6.1 RX480 Error 43

djx

Member
Aug 11, 2020
I was previously using Proxmox 6.1 and passing my RX480 through to a Windows guest. It worked smoothly, except that an unexpected guest shutdown left the GPU unusable until the system was fully power-cycled.

I updated to Proxmox 7.3 and the Windows guest stopped working. First there were UEFI issues, so I did a fresh install, and then I noticed the GPU was no longer passing through. After lots of reading in this forum, I found that the previous hacks are no longer recommended. I removed nearly all of the kernel options from GRUB, stopped hard-coding the PCI IDs in the vfio config, and installed vendor-reset. Still no luck.

System Specs:
Host OS: Proxmox 7.3
Guest OS: Windows 10 LTSC
Motherboard: ASUS TUF Gaming X570-Plus (Wi-Fi)
CPU: Ryzen 5950X
GPU: (2) RX 480, (1) RX580

Grub command:
GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1GB hugepages=1 iommu=pt pci=noaer initcall_blacklist=sysfb_init"

vfio.config:
options kvm ignore_msrs=1
softdep amdgpu pre: vfio vfio_pci

After lots of tweaking, here's where I am:

  • Using kernel 6.1 with vendor-reset
    • No modules blacklisted
  • A startup script successfully sets each GPU's reset_method to device_specific
  • /proc/iomem shows the GPUs' memory ranges handed over to vfio-pci
  • lshw shows the devices using driver=vfio-pci after the VM boots
  • Upon booting my host, I see this error: [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
  • My Linux guest (Emby) uses its passed-through video card for transcoding without issue
  • If I use the non-primary GPU, the Windows 10 guest can see it, but it shows error 43.
    • If I disable and re-enable the card, it shows as "working properly", but it does not detect the dummy display (HDMI plug) attached to the card. It also doesn't show up as a graphics card in Task Manager
    • GPU-Z sees the card, and can even read the temperatures and other stats
    • Tried installing the 22.11.2 and 22.5.1 Adrenalin drivers
    • When I reboot the guest, vendor-reset does its thing:
      Code:
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: enabling device (0400 -> 0403)
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: version 1.1
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: performing pre-reset
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: performing reset
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x2a34
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: performing post-reset
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: reset result = 0
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: vfio_ecap_init: hiding ecap 0x1e@0x370
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: version 1.1
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: performing pre-reset
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: performing reset
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x2b4c
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: performing post-reset
      [Sun Jan  8 18:13:54 2023] vfio-pci 0000:0c:00.0: AMD_POLARIS10: reset result = 0
  • If I try to boot the guest VM with the primary GPU (the one Proxmox initially displays on), I get a long warning and the guest fails to start (see the attached warning log)
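To confirm the host-side state from the list above in one go, a small check like the following can help. It is a sketch that assumes the PCI addresses used elsewhere in this thread (0000:0c:00.0 and so on); `SYSFS_PCI` is a hypothetical override that only exists so the function can be tested away from a Proxmox host. It prints which driver is bound to each function and its current `reset_method`, which should read `device_specific` for vendor-reset to be used.

```shell
#!/usr/bin/env bash
# Sketch: show the bound driver and reset method for each GPU PCI function.
# SYSFS_PCI is a hypothetical override used only for testing.
set -euo pipefail

SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

check_gpu() {
    local dev="$1"
    local path="$SYSFS_PCI/$dev" driver="none" method="unknown"
    if [[ ! -d "$path" ]]; then
        echo "$dev: not present"
        return 1
    fi
    # The driver symlink exists only while a driver (vfio-pci, amdgpu, ...) is bound
    if [[ -L "$path/driver" ]]; then
        driver="$(basename "$(readlink "$path/driver")")"
    fi
    # reset_method appeared in Linux 5.15; vendor-reset needs "device_specific" here
    if [[ -r "$path/reset_method" ]]; then
        method="$(<"$path/reset_method")"
    fi
    echo "$dev: driver=$driver reset_method=$method"
}

# Check every PCI address given on the command line
for dev in "$@"; do
    check_gpu "$dev" || true
done
```

Usage on the host, with the addresses from the start script: `./check-gpus.sh 0000:05:00.0 0000:06:00.0 0000:0c:00.0` (the script name is hypothetical). Running it before and after starting a VM shows whether the card moved from amdgpu to vfio-pci.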


It seems like it's very close to working. The card shows up, reboots fine, and Windows can inspect the hardware - it just doesn't use it for rendering or detect any displays on it.

Any help to get this thing finished would be greatly appreciated!
 

Attachments

  • warn_log.txt
Does buying a subscription increase the odds of getting help with this? :)
 
I just changed the VM's machine type from q35-7.1 to q35-7.0, and it suddenly started working. I will experiment with more settings later and report which configurations work, in case it helps someone else in the future.
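Pinning the machine version can also be done from the shell with `qm`; Proxmox stores it in the VM config as `machine: pc-q35-7.0`. The sketch below wraps `qm set <vmid> --machine pc-q35-<version>`; the VMID is a hypothetical example, and when `qm` is not installed the function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: pin a VM to a specific q35 machine version (e.g. 7.0 instead of 7.1).
# Proxmox stores this in the VM config as "machine: pc-q35-<version>".
set -euo pipefail

pin_q35_version() {
    local vmid="$1" version="$2"
    if [[ ! "$vmid" =~ ^[0-9]+$ || ! "$version" =~ ^[0-9]+\.[0-9]+$ ]]; then
        echo "usage: pin_q35_version VMID MAJOR.MINOR" >&2
        return 2
    fi
    local cmd=(qm set "$vmid" --machine "pc-q35-$version")
    if command -v qm >/dev/null 2>&1; then
        "${cmd[@]}"        # on a Proxmox host: apply the change
    else
        echo "${cmd[*]}"   # elsewhere: just print what would run
    fi
}
```

For example, `pin_q35_version 100 7.0`. The new machine type only applies when the VM is fully stopped and started again; a reboot from inside the guest keeps the running QEMU process.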
 
After all my testing, I have confirmed that both the old method and the new method work, as long as you don't use the q35-7.1 machine type.

  • Old Method - Blacklist drivers, assign PCI IDs to vfio, add a bunch of kernel options, cross your fingers. This is what most blog posts and even the Proxmox wiki show.
  • New Method - Let the driver do its thing; the newer kernel should hand the device off to vfio successfully. Use vendor-reset to manage troublesome cards like the RX 480 that don't reset cleanly.
    • I created a shell script that runs at boot (before any VMs start) to set each GPU's reset_method
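For the old method, the IDs handed to vfio-pci are the vendor:device pairs that `lspci -nn` shows in square brackets. The sketch below reads the same values from sysfs for a list of PCI addresses and prints a modprobe.d line; `SYSFS_PCI` is a hypothetical test override.

```shell
#!/usr/bin/env bash
# Sketch for the "old method": build the vfio-pci ids= option from PCI addresses.
# SYSFS_PCI is a hypothetical override used only for testing.
set -euo pipefail

SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

vfio_ids_line() {
    local dev vendor device
    local -a ids=()
    for dev in "$@"; do
        vendor="$(<"$SYSFS_PCI/$dev/vendor")"   # e.g. "0x1002" for AMD
        device="$(<"$SYSFS_PCI/$dev/device")"
        ids+=("${vendor#0x}:${device#0x}")
    done
    # Identical cards share one ID, so de-duplicate before joining with commas
    echo "options vfio-pci ids=$(printf '%s\n' "${ids[@]}" | LC_ALL=C sort -u | paste -sd, -)"
}
```

The output would go into a modprobe.d file whose name ends in `.conf` (for example a hypothetical `/etc/modprobe.d/vfio-ids.conf`), followed by `update-initramfs -u -k all`. Because `ids=` matches every device with that vendor:device pair, identical cards cannot be split between host and guests this way, which is one practical argument for the new method on a host with several Polaris cards.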
In my case, I intend to pass all video cards through to guests, which causes conflicts when the host OS touches a card during boot. The best result I found was adding nomodeset to the kernel options.

The only remaining issue is that the dummy HDMI plug on the primary card (the one the host uses during boot) is not recognized properly. There are errors about the EDID not being read. I tried a few fixes, such as amdgpu.dc=0 to keep the driver from touching the displays; apparently there's a bug with this that causes page faults with vfio. I also tried drm.edid_firmware=edid/1280x1024.bin to force a known-good EDID, but the primary card still won't pass the dummy plug through.

My current solution is to use a virtual display on the guest that has the primary GPU and let the Windows guest add it at boot. It's working smoothly now.

While I would love to figure out why the HDMI isn't working 100% on the primary card, I've already spent lots of time on this and am satisfied with the way it works now.

My settings:
vfio.config
Code:
options kvm ignore_msrs=1
softdep amdgpu pre: vfio vfio_pci

blacklist
none!

grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1GB hugepages=1 iommu=pt pci=noaer nomodeset initcall_blacklist=sysfb_init"
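After changing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, the options only apply after `update-grub` and a reboot (hosts that boot via systemd-boot, such as ZFS on UEFI, use /etc/kernel/cmdline and `proxmox-boot-tool refresh` instead). To confirm the running kernel actually received them, compare against /proc/cmdline. A minimal sketch with whole-word matching; `CMDLINE_FILE` is a hypothetical override for testing.

```shell
#!/usr/bin/env bash
# Sketch: report which expected kernel options are missing from the running kernel.
# CMDLINE_FILE is a hypothetical override used only for testing.
set -euo pipefail

CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

missing_cmdline_opts() {
    local opt missing=""
    # Pad with spaces so "iommu=pt" cannot match inside "amd_iommu=pt"
    local current=" $(<"$CMDLINE_FILE") "
    for opt in "$@"; do
        [[ "$current" == *" $opt "* ]] || missing+="$opt "
    done
    echo "${missing% }"
    # Exit status 0 only when every option is present
    [[ -z "$missing" ]]
}
```

For the settings above: `missing_cmdline_opts iommu=pt pci=noaer nomodeset initcall_blacklist=sysfb_init && echo "all options active"`.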

modules
Code:
# for fixing GPUs
vendor-reset

# vfio
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
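To check that the modules listed above actually loaded, note that the kernel lists module names with underscores, so `vendor-reset` appears as `vendor_reset` in /proc/modules. A minimal sketch, with `PROC_MODULES` as a hypothetical test override; it assumes these drivers are built as loadable modules (a built-in driver would not appear in /proc/modules). On kernels newer than the 6.1 used here (6.2 and later), `vfio_virqfd` was merged into `vfio` and no longer exists as a separate module.

```shell
#!/usr/bin/env bash
# Sketch: report whether the modules from /etc/modules are loaded.
# PROC_MODULES is a hypothetical override used only for testing.
set -euo pipefail

PROC_MODULES="${PROC_MODULES:-/proc/modules}"

module_loaded() {
    # /proc/modules uses underscores: "vendor-reset" is listed as "vendor_reset"
    local name="${1//-/_}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$PROC_MODULES"
}

report_modules() {
    local mod
    for mod in "$@"; do
        if module_loaded "$mod"; then
            echo "ok      $mod"
        else
            echo "MISSING $mod"
        fi
    done
}
```

Usage: `report_modules vendor-reset vfio vfio_iommu_type1 vfio_pci vfio_virqfd`.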

GPU start script (stolen from another forum)
Code:
#!/bin/bash
# Switch each passed-through GPU to vendor-reset's device-specific reset.
# The vendor-reset module must already be loaded when this runs.
for dev in 0000:05:00.0 0000:06:00.0 0000:0c:00.0; do
    echo device_specific > "/sys/bus/pci/devices/$dev/reset_method"
done
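One way to run a script like this at boot, before the VMs marked "Start at boot" come up, is a oneshot systemd unit ordered after module loading (so vendor-reset is present) and before `pve-guests.service`, the Proxmox service that autostarts guests. The sketch below writes such a unit; the unit name `gpu-reset-method.service` and the script path `/usr/local/bin/gpu-reset-method.sh` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: install a oneshot systemd unit that runs the reset_method script at boot.
# Unit name and script path are hypothetical; adjust to where you saved the script.
set -euo pipefail

write_gpu_reset_unit() {
    local unit_dir="${1:-/etc/systemd/system}"
    local script="${2:-/usr/local/bin/gpu-reset-method.sh}"
    # Ordered after module loading (so vendor-reset is present) and before
    # pve-guests.service, which starts the VMs marked "Start at boot"
    cat > "$unit_dir/gpu-reset-method.service" <<EOF
[Unit]
Description=Set device_specific reset_method on passed-through GPUs
After=systemd-modules-load.service
Before=pve-guests.service

[Service]
Type=oneshot
ExecStart=$script

[Install]
WantedBy=multi-user.target
EOF
}
```

On the host: save the script, `chmod +x` it, call `write_gpu_reset_unit`, then run `systemctl daemon-reload` and `systemctl enable gpu-reset-method.service`.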
 
This is exactly what's happening to me, and I'm at my wits' end.

Proxmox v7.3
Ryzen 5900X
ASRock X570 Taichi
Quadro K4200

The VM boots and sees the GPU, and the Windows VM even outputs video directly through the GPU to a monitor. However, as soon as I open Device Manager, the dreaded Code 43 error appears. GPU-Z sees the GPU's specs, and disabling and re-enabling the GPU clears the error, but the GPU still doesn't show up in Task Manager. Rebooting brings the Code 43 error back. I have tried all of the settings you posted above, but unfortunately, no luck.

Part of me thinks it's driver-related, since I do get video output, but I've tried many different driver versions with no luck. I'm getting ready to give up, but it's frustrating because I feel like I'm 99% of the way there and can't get over the finish line.

Anyone have any suggestions? Really appreciate any input - thank you!
 
I suggest starting a new thread and posting the link here; I'll try to take a look at your setup. Please include the full details of your current configuration:
  1. grub command
  2. current kernel
  3. vfio module options
  4. any blacklisted modules
  5. dmesg output from boot onward, and tag where the guest boot happens
  6. VM guest PCI configuration
I'll do my best to respond and see if I can help.
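Most of the items in that list can be gathered in one pass on the host. A sketch, assuming the VMID is passed as the only argument (100 in the usage line is a hypothetical example); start the VM before running it so the kernel log includes the guest boot, and trim the output before posting.

```shell
#!/usr/bin/env bash
# Sketch: collect the passthrough details requested above into one report.
# Run as root on the Proxmox host. set -e is deliberately off so a single
# failing command does not stop the rest of the report.
set -uo pipefail

section() {
    printf '\n===== %s =====\n' "$1"
}

collect_passthrough_info() {
    local vmid="$1"
    section "kernel command line";  cat /proc/cmdline
    section "running kernel";       uname -r
    section "modprobe.d options";   cat /etc/modprobe.d/*.conf 2>/dev/null
    section "blacklisted modules";  grep -rh '^blacklist' /etc/modprobe.d/ 2>/dev/null
    section "kernel log (this boot)"
    dmesg 2>/dev/null || journalctl -k -b --no-pager 2>/dev/null
    section "VM $vmid configuration"
    qm config "$vmid" 2>/dev/null || echo "(qm not available)"
}
```

Usage: `collect_passthrough_info 100 > passthrough-report.txt`.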

Unfortunately, I didn't get help here, but the VFIO thread on Reddit was helpful.
 
I will definitely do that tomorrow morning with a clear head (spent way too much time on this today) - thank you!!!
 
