[SOLVED] GPU passthrough last step issue, AMD driver doesn't work

a9udn9u

New Member
Nov 9, 2022
GPU is Sapphire Pulse RX 6700 XT

I loosely followed this guide, and everything worked until the last step. Windows detects the GPU and the AMD driver installation appears to succeed without any errors, but the GPU still shows the small exclamation mark in Device Manager. I can confirm graphics acceleration is not working because I only get single-digit FPS in this WebGL test (not the best benchmark, I know :) ).

A few things I've tried:
  • Connected the card to a monitor, because some posts say it can't be left unplugged.
  • Enabled GPU rendering over RDP in Group Policy. (I'm connecting to the VM via RDP.)
  • Tried different driver installations: driver only, minimal, and full install.
Steps I took, nothing unusual:
  1. Grub command line, added: intel_iommu=on iommu=pt initcall_blacklist=sysfb_init
  2. Kernel modules loaded: vfio, vfio_iommu_type1, vfio_pci, vfio_virqfd
  3. Kernel module options:
    1. vfio_iommu_type1 allow_unsafe_interrupts=1
    2. vfio_pci disable_vga=1
    3. vfio_pci enable_sriov=1
    4. vfio_pci ids=8086:4680,1002:73df,1002:ab28
    5. kvm ignore_msrs=1
    6. kvm report_ignored_msrs=0
  4. Blacklisted video drivers: amdgpu, radeon
  5. Enabled SR-IOV in BIOS (tried iGPU passthrough but failed; Alder Lake CPU).
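
For readers reproducing step 3: on a Debian-based host such as Proxmox, module options are usually kept in a file under /etc/modprobe.d/. The sketch below only prints option lines matching the settings listed above so they can be reviewed before being saved. The helper name render_vfio_options and the file name vfio.conf are assumptions, not something taken from the guide, and the device IDs must be replaced with your own.

```shell
#!/bin/sh
# Hypothetical helper: prints modprobe option lines matching step 3 above.
# $1 = comma-separated vendor:device IDs of the devices to bind to vfio-pci.
render_vfio_options() {
    ids="$1"
    if [ -z "$ids" ]; then
        echo "usage: render_vfio_options <vendor:device[,vendor:device...]>" >&2
        return 1
    fi
    printf 'options vfio_iommu_type1 allow_unsafe_interrupts=1\n'
    printf 'options vfio_pci ids=%s disable_vga=1 enable_sriov=1\n' "$ids"
    printf 'options kvm ignore_msrs=1 report_ignored_msrs=0\n'
}

# Example (assumed file name; review the output before saving it):
#   render_vfio_options 8086:4680,1002:73df,1002:ab28 > /etc/modprobe.d/vfio.conf
#   update-initramfs -u -k all   # then reboot the host
```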

Code:
# qm config 3000
balloon: 4096
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-zfs:vm-3000-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:03:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: pc-q35-7.0
memory: 16384
meta: creation-qemu=7.0.0,ctime=1668754724
name: windows
net0: virtio=AA:97:D8:C6:1C:6A,bridge=vmbr0,firewall=1
numa: 1
ostype: win11
scsi0: local-zfs:vm-3000-disk-1,backup=0,cache=writethrough,discard=on,size=60G
scsihw: virtio-scsi-pci
smbios1: uuid=4c9ee8ce-cb9f-479c-8386-07cd616bf476
sockets: 1
tpmstate0: local-zfs:vm-3000-disk-2,size=4M,version=v2.0
vga: none
vmgenid: a7b35231-f456-4539-9e04-441649f01f7a

Screenshot 2022-11-18 at 9.10.24 AM.png

I feel it must be something minor but I'm out of clues... Any ideas?
 
I see two issues with your VM configuration:
  • You enabled ballooning, but that does not work in combination with PCI(e) passthrough.
  • Don't enable Primary GPU (x-vga=1) for AMD GPUs. This is often necessary for NVIDIA GPUs, but AMD GPUs usually work better without it.
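
Both points can be checked against the output of qm config. The sketch below is a hypothetical helper (check_passthrough_config is not a Proxmox tool) that reads a VM configuration on standard input and flags a non-zero balloon value or x-vga=1 on a hostpci line. Whether x-vga=1 should be removed remains a judgment call that depends on the card.

```shell
#!/bin/sh
# Hypothetical lint for `qm config <vmid>` output, read from stdin.
# Prints one warning per finding and returns 1 if anything was flagged.
check_passthrough_config() {
    cfg="$(cat)"
    rc=0
    # A hostpciN line means a PCI(e) device is passed through.
    if printf '%s\n' "$cfg" | grep -q '^hostpci[0-9][0-9]*:'; then
        # "balloon: 0" disables ballooning; a non-zero value enables it.
        if printf '%s\n' "$cfg" | grep -q '^balloon: *[1-9]'; then
            echo "warning: ballooning is enabled together with PCI(e) passthrough"
            rc=1
        fi
        if printf '%s\n' "$cfg" | grep '^hostpci[0-9][0-9]*:' | grep -q 'x-vga=1'; then
            echo "warning: x-vga=1 (Primary GPU) is set on a passed-through device"
            rc=1
        fi
    fi
    return "$rc"
}

# Usage on the Proxmox host:
#   qm config 3000 | check_passthrough_config
# Possible fixes (the hostpci0 value is taken from the config in the first post):
#   qm set 3000 --balloon 0
#   qm set 3000 --hostpci0 0000:03:00,pcie=1
```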

Can you try booting that VM with the latest Ubuntu installer (no need to install Ubuntu, just select Try Ubuntu) to see if the GPU works in passthrough? Some 6700 GPUs don't reset properly and there is no work-around yet as far as I know. If you see the Ubuntu desktop on a display connected to the GPU, then it's probably a Windows driver issue.
 
Thanks for the reply!

Disabling memory ballooning and unchecking Primary GPU didn't change anything, unfortunately.

I actually tried Ubuntu first and saw a similar symptom. lspci detects the GPU without issue. I know the open source radeon driver is installed by default, so I didn't bother getting the AMD driver, but I'm not sure how to verify that it works; xRDP doesn't seem to support hardware acceleration.

Is it possible to force the host to use the iGPU and pass the dGPU through to a VM to work around the reset bug? I did a quick search but didn't find anything except this GitHub project, and it doesn't support 6000-series cards. I tried setting the default display to the iGPU in BIOS, but that didn't fix the issue with Windows. For Ubuntu, I still don't know how to verify it.
 
I actually tried Ubuntu first and saw a similar symptom. lspci detects the GPU without issue. I know the open source radeon driver is installed by default, so I didn't bother getting the AMD driver, but I'm not sure how to verify that it works; xRDP doesn't seem to support hardware acceleration.
Do you see output on a physical display connected to the GPU when running Ubuntu? Then it works.
Is it possible to force the host to use the iGPU and pass the dGPU through to a VM to work around the reset bug?
Passthrough is always easier if Proxmox (and booting the host) does not use the GPU you want to pass through. I don't know how to verify whether it has a reset issue or whether it is something else. Try dumping the ROM to a file when running Proxmox from the integrated graphics (make sure nothing touches the GPU) and use that with the ROM File option.
I did a quick search but didn't find anything except this GitHub project, and it doesn't support 6000-series cards.
vendor-reset is indeed very useful for older AMD GPUs that all have reset issues (and the work-arounds are too big/involved for inside the Linux kernel). The 6000-series is supposed to not have the function level reset issue but some models from some vendors do have reset issues. What is the output of cat /sys/bus/pci/devices/0000:03:00.0/reset_method? Maybe it supports bus also and you could try that instead of flr.
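
To answer that for several devices at once, the same sysfs file can be read in a loop. This is a small sketch: show_reset_methods is a hypothetical helper name, and the optional first argument exists only so the function can be pointed at a test directory instead of /sys.

```shell
#!/bin/sh
# Hypothetical helper: print "<pci address>: <reset methods>" for every PCI
# device that exposes a reset_method file in sysfs.
show_reset_methods() {
    sysfs_root="${1:-/sys}"
    for f in "$sysfs_root"/bus/pci/devices/*/reset_method; do
        [ -r "$f" ] || continue   # also skips the unexpanded glob
        dev="$(basename "$(dirname "$f")")"
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}

# Usage on the host:
#   show_reset_methods | grep '^0000:03:00'
```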
 
Do you see output on a physical display connected to the GPU when running Ubuntu? Then it works.
I can boot into the Ubuntu desktop this way, but the resolution was 800x600. I don't have USB passthrough yet, so I have no keyboard or mouse to do anything this way. I can try later.

There's a slight chance that this is a driver issue. I tried to install the driver from AMD's Ubuntu repo, but the amdgpu-dkms package installation failed with a compilation error; it seems it doesn't like the 5.19 kernel. I also found that my amdgpu and radeon kernel modules were blacklisted, I guess because of the broken AMD driver installation.

Passthrough is always easier if Proxmox (and booting the host) does not use the GPU you want to pass through. I don't know how to verify whether it has a reset issue or whether it is something else. Try dumping the ROM to a file when running Proxmox from the integrated graphics (make sure nothing touches the GPU) and use that with the ROM File option.
I'm not sure what this means, but I'll do some searching.

vendor-reset is indeed very useful for older AMD GPUs that all have reset issues (and the work-arounds are too big/involved for inside the Linux kernel). The 6000-series is supposed to not have the function level reset issue but some models from some vendors do have reset issues. What is the output of cat /sys/bus/pci/devices/0000:03:00.0/reset_method? Maybe it supports bus also and you could try that instead of flr.
That file gives me bus. Again I'm not sure how to switch from flr but I will try to search for information about it.

Thank you @leesteken for all the suggestions!
 
I can boot into the Ubuntu desktop this way, but the resolution was 800x600. I don't have USB passthrough yet, so I have no keyboard or mouse to do anything this way. I can try later.
Still, it looks like the GPU is working in passthrough (at least once, after a reboot of the Proxmox host).
There's a slight chance that this is a driver issue. I tried to install the driver from AMD's Ubuntu repo, but the amdgpu-dkms package installation failed with a compilation error; it seems it doesn't like the 5.19 kernel. I also found that my amdgpu and radeon kernel modules were blacklisted, I guess because of the broken AMD driver installation.
I don't see a reason to install AMD's drivers. The open source drivers that come with (an installed and updated) Ubuntu work fine and are very well supported.
I'm not sure what this means, but I'll do some searching.
It sometimes helps to provide the GPU with its original ROM (firmware/GPU BIOS) after a reset, to more closely mimic a fresh start. See the (old) Proxmox Wiki for more information.
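
A common way to dump the ROM on the host is through the device's rom file in sysfs. The following is a sketch rather than a verified procedure for this particular card: dump_gpu_rom, the output file name, and the optional third argument (a sysfs root, used only for testing) are assumptions; the PCI address is the one from this thread.

```shell
#!/bin/sh
# Hypothetical helper: copy a PCI device's option ROM to a file via sysfs.
# $1 = PCI address, $2 = output file, $3 = sysfs root (defaults to /sys).
dump_gpu_rom() {
    dev="$1"
    out="$2"
    rom="${3:-/sys}/bus/pci/devices/$dev/rom"
    if [ ! -e "$rom" ]; then
        echo "no rom file for $dev" >&2
        return 1
    fi
    echo 1 > "$rom" || return 1   # enable reading the ROM
    cat "$rom" > "$out"
    rc=$?
    echo 0 > "$rom"               # disable reading again
    return "$rc"
}

# Usage as root on the host, with nothing else using the GPU
# (rx6700xt.rom is an assumed file name):
#   dump_gpu_rom 0000:03:00.0 /usr/share/kvm/rx6700xt.rom
#   qm set 3000 --hostpci0 0000:03:00,pcie=1,romfile=rx6700xt.rom
```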
That file gives me bus. Again I'm not sure how to switch from flr but I will try to search for information about it.
If bus is the only choice (after a reboot of the host and not starting a VM with the GPU), that means that flr is not even supported. Looks like this particular GPU does not reset properly. The romfile= might help especially when the GPU is reset by disconnecting/reconnecting it to the PCIe bus.
Thank you @leesteken for all the suggestions!
Thank you for putting the work in. If you have usable output on a physical display, then passthrough is working. Which makes me think that it's a proprietary AMD driver issue on Windows, and I don't know how to fix that.
 
If you have usable output on a physical display, then passthrough is working. Which makes me think that it's a proprietary AMD driver issue on Windows, and I don't know how to fix that.
I can now confirm 100% that passthrough works with a monitor attached, at least in Ubuntu. I deleted the broken driver from the AMD repo and fell back to the default amdgpu driver, and everything seems to work: mesa-utils shows direct rendering, and I'm getting 60 fps in the WebGL test linked in the first post. I also powered off the VM twice to make sure the GPU can wake between reboots, and it worked!

Thank you @leesteken!!! I still don't know how to get it to work in Windows, nor how to enable GPU acceleration over RDP in Linux, but at least I know those are software issues.

BTW, when I checked /sys/bus/pci/devices/0000:03:00.0/reset_method the first time, the Ubuntu VM was already booted. I re-checked it after rebooting the host, and "bus" is indeed the only choice. Does that indicate a problem? I didn't use a ROM file as you suggested.
 
I can now confirm 100% that passthrough works with a monitor attached, at least in Ubuntu. I deleted the broken driver from the AMD repo and fell back to the default amdgpu driver, and everything seems to work: mesa-utils shows direct rendering, and I'm getting 60 fps in the WebGL test linked in the first post. I also powered off the VM twice to make sure the GPU can wake between reboots, and it worked!
If stopping the VM and starting it again works flawlessly, then I don't think that GPU has a reset issue.
Thank you @leesteken!!! I still don't know how to get it to work in Windows, nor how to enable GPU acceleration over RDP in Linux, but at least I know those are software issues.

BTW, when I checked /sys/bus/pci/devices/0000:03:00.0/reset_method the first time, the Ubuntu VM was already booted. I re-checked it after rebooting the host, and "bus" is indeed the only choice. Does that indicate a problem? I didn't use a ROM file as you suggested.
I would have expected flr as well, but I have no experience with the 6000-series myself. Maybe the reset is not perfect and the Linux drivers can handle that but the Windows drivers cannot? The romfile option might help in making the reset more like an actual hardware reboot. I don't have any Windows passthrough experience that can help here, sorry.
 
I don't have any Windows passthrough experience that can help here, sorry.
You've been extremely helpful! I can live with Ubuntu for now, I don't game that much and when I do I mostly play retro games with emulators so Linux is perfectly fine!
 
Just got Windows to work as well. It turned out to be the "4G decoding" BIOS setting: mine was "enabled", and once I changed it to "disabled", everything worked flawlessly. I also disabled Re-Size BAR.

For future reference, in summary:
1) For Ubuntu: delete the AMD first-party driver and use the open source amdgpu driver shipped with Ubuntu.
2) For Windows: disable "4G decoding" in BIOS.

Thank you @leesteken for the help!
 
a9udn9u, I'm having the same problem. I did disable above 4G decoding in my BIOS. You say that you disabled Re-Size BAR. Is that a reference to the ROM-Bar checkmark that I see in the edit dialog for the passed-through PCI device? I've tried that, but I'm not sure if that's what you're referring to.
Most critically, I don't understand the use of the amdgpu open source driver. Did you install that in Proxmox rather than directly in Windows? If you installed it in Windows, can you say more about the process for doing that?
 
Re-Size BAR is a BIOS setting, similar to "4G decoding". It's not a VM setting, nor was it critical in my case since I don't have an NVIDIA GPU.
`amdgpu` is an open source Linux driver for AMD GPUs. You install it only if you want to pass the GPU to a Linux VM; you can't install it on Windows because it's a Linux driver.
 
You indicated above that you "just got Windows to work as well". Was the 4g decoding setting in BIOS the key? I've done that, but my Device Manager screen still looks the same as yours towards the top of this thread. Any other suggestions, given your experience with this? I've done everything else in the steps you took (except the two "kvm" lines in the kernel module options) and added some from the Proxmox Admin Guide. I'll try the "kvm" lines.
 
You indicated above that you "just got Windows to work as well". Was the 4g decoding setting in BIOS the key?
Yes, that's what did it for me.

Any other suggestions, given your experience with this?
Any chance you copied my vfio device IDs? You need to get your own IDs instead.

Other than that I can't think of any, I've posted all the changes I did to make it work, but if your hardware is different, you might need to follow different guides. My setup is 12700K + TUF GAMING Z690 D4 + Sapphire Pulse 6700XT
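
One way to double-check both your own IDs and the bound driver without parsing lspci output is to read them from sysfs. A sketch, with pci_id_and_driver as a hypothetical helper name and the second argument (a sysfs root) present only for testing:

```shell
#!/bin/sh
# Hypothetical helper: print "<vendor>:<device> <driver>" for a PCI address.
# $1 = PCI address, $2 = sysfs root (defaults to /sys).
pci_id_and_driver() {
    d="${2:-/sys}/bus/pci/devices/$1"
    [ -d "$d" ] || { echo "no such device: $1" >&2; return 1; }
    vendor="$(sed 's/^0x//' "$d/vendor")"
    device="$(sed 's/^0x//' "$d/device")"
    if [ -e "$d/driver" ]; then
        # "driver" is a symlink to the directory of the bound driver
        driver="$(basename "$(readlink "$d/driver")")"
    else
        driver="(none)"
    fi
    printf '%s:%s %s\n' "$vendor" "$device" "$driver"
}

# Usage on the host; for a device reserved for passthrough the driver
# should read vfio-pci:
#   pci_id_and_driver 0000:03:00.0
```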
 
No, I used my own device IDs, and the "lspci -nnk" command shows "Kernel driver in use: vfio-pci". I did notice that you had three IDs. I have two, 1002:743f and 1002:ab28, which are close to yours. I also have two devices that are identified as AMD but are PCI bridges, 1002:1478 and 1002:1479. I've tried adding them to the vfio-pci options, but that doesn't change their drivers, so I've taken them back out.

Your solution in BIOS prompted me to change my BIOS primary display setting. The choices are embedded, PCI slot and auto. I'd set that to embedded, but changed it to auto, to see what would happen. Same result.

I used to see the Windows machine through the video card, but using the Windows driver (and seeing only a portion of the screen). I've changed a lot of settings and don't see that anymore, but I'm not sure which setting changed that.

I have an Intel i7-13700K on an ASRock motherboard. That is reasonably close to your setup, I think, so I was optimistic. Too soon. I'm thinking of maybe using the embedded graphics for the virtual Windows machine and the AMD 6400 video card as the primary graphics for Proxmox to see if I can get the AMD driver to work with Proxmox, but I haven't quite given up.

By the way, I did add your kvm lines (5 and 6) to the options, even though I don't know what they do, but that didn't change anything.
 
a9udn9u, I thought you'd be mildly interested in this update, as might others who stumble across this thread.
I did disable "above 4G decoding" in my BIOS. (The motherboard is an ASRock Z790 Taichi.) After a couple more frustrating days, I discovered that the disable wasn't holding. After the BIOS change, the motherboard would reset above 4G decoding to enabled, and the driver continued to be blocked by Windows, which substituted the MS generic driver. That's how things stood when I shut the server down on May 6th.

After I turned it back on May 7th, I checked the driver status again, and the AMD driver was working (AMD 6400). I had not made any changes. The above 4G decoding is still set to enabled. I do have the Windows VM set for automatic updates, including drivers. The update history shows no AMD driver download at that point. The only change I could find was an update to Windows Defender virus protection.

Go figure. Anyway the driver is now working, so I'm now done fiddling with this.
 
I discovered that the disable wasn't holding.
I had the same issue with my ASUS motherboard. I should've mentioned it earlier, but since you said yours is ASRock, I figured the chance was extremely low...
For me, if the option is disabled and saved, it stays disabled. But if I enter BIOS again and change any other option, the 4G decoding option is re-enabled. The solution for me is that every time I change anything in BIOS, I have to re-disable 4G decoding and save it along with the other changes.
It would be really interesting if ASRock motherboards have the exact same issue.
Anyway, glad to hear that you got it sorted out.
 
One final note for anyone who has the same problem. I have an ASRock Z790 Taichi motherboard and an AMD 6400 based video card. I am still unable to disable "above 4G decoding". However, I've discovered that that doesn't matter. The key to making the AMD Radeon RX 6400 work in Windows 11 is to disable C.A.M. (Clever Access Memory) in BIOS.
 
I have the reset service activated, but in dmesg there are several strange lines related to the iGPU and reset:

07:00.0 - iGPU
07:00.1 - audio device for HDMI
03:00.0 - sata adapter in M2 slot

in dmesg:
[ 1.313344] vfio-pci 0000:07:00.0: vgaarb: deactivate vga console
[ 1.313351] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 1.313508] vfio_pci: add [1002:164e[ffffffff:ffffffff]] class 0x000000/00000000
[ 1.336489] vfio_pci: add [1002:1640[ffffffff:ffffffff]] class 0x000000/00000000
[ 1.336558] vfio_pci: add [1b21:1166[ffffffff:ffffffff]] class 0x000000/00000000
[ 1.336563] vfio_pci: add [2116:2116[ffffffff:ffffffff]] class 0x000000/00000000
[ 15.108199] vfio-pci 0000:07:00.0: enabling device (0002 -> 0003)
[ 15.122817] vfio-pci 0000:07:00.1: enabling device (0000 -> 0002)
[ 15.175001] vfio-pci 0000:03:00.0: enabling device (0000 -> 0002)
[ 19.901322] vfio-pci 0000:07:00.0: Unsupported reset method 'device_specific'

The reset_method values are:
/sys/bus/pci/devices/0000:07:00.0/reset_method
bus
/sys/bus/pci/devices/0000:07:00.1/reset_method
pm bus
/sys/bus/pci/devices/0000:03:00.0/reset_method
pm bus

Passthrough of all three devices works in the VM, but the iGPU often does not reset after the VM is stopped. In the Windows 10 guest, the iGPU returns error 43 on the next VM start unless the host is rebooted. All other devices detach from and reattach to the VM without problems. After a host reboot, everything works again.

Why does only the iGPU have "bus" while the others have "pm bus"? How can I debug the device reset? Maybe there is some message showing that a reset was at least attempted.
 
Errors in syslog related to the reset:
Mar 06 23:43:20 pve systemd[1]: Started vreset.service - AMD GPU reset method to 'device_specific'.
Mar 06 23:43:20 pve bash[1219]: /usr/bin/bash: line 1: echo: write error: Invalid argument
Mar 06 23:43:20 pve systemd[1]: vreset.service: Main process exited, code=exited, status=1/FAILURE
Mar 06 23:43:20 pve systemd[1]: vreset.service: Failed with result 'exit-code'.
Mar 06 23:43:20 pve kernel: vfio-pci 0000:07:00.0: Unsupported reset method 'device_specific'
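
The "Invalid argument" write error and the kernel's "Unsupported reset method" message are consistent with each other: the reset_method file for 0000:07:00.0 lists only bus, so device_specific is apparently not offered for this device. A service could check the list before writing to it. This is only a sketch: set_reset_method is a hypothetical helper, its third argument (a sysfs root) exists for testing, and it assumes the file still shows the kernel's default list for the device.

```shell
#!/bin/sh
# Hypothetical helper: select a reset method only if sysfs already lists it.
# $1 = PCI address, $2 = method name, $3 = sysfs root (defaults to /sys).
set_reset_method() {
    f="${3:-/sys}/bus/pci/devices/$1/reset_method"
    [ -r "$f" ] || { echo "$1: no reset_method file" >&2; return 1; }
    for m in $(cat "$f"); do
        if [ "$m" = "$2" ]; then
            echo "$2" > "$f"
            return $?
        fi
    done
    echo "$1: '$2' is not listed (available: $(cat "$f"))" >&2
    return 1
}

# Usage as root on the host:
#   set_reset_method 0000:07:00.0 bus
```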
 
