GPU Passthrough Ryzen 4600G APU

DracoTomes

New Member
Dec 28, 2022
Hi all,

like a couple of others I have been trying to get GPU passthrough working with a Ryzen APU but now I'm stuck.

First things first - I have no idea what I'm doing. I'm fueled by the naivety of the young and 500 open browser tabs so please go easy on me.

I think I have the basic passthrough working but I'm having trouble getting the amdgpu driver to load on the guest system.

My host is:
Ryzen 5 4600G
Asus ROG B550-A Gaming
32GB Samsung M391A4G43MB1-CTDQ

This is my vm.conf file:

Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 6
efidisk0: local:102/vm-102-disk-1.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:0b:00.0,pcie=1
machine: q35
memory: 6144
meta: creation-qemu=6.1.1,ctime=1668423230
name: ornn
net0: virtio=7E:16:4E:A6:67:30,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
parent: install
scsi0: local:102/vm-102-disk-0.qcow2,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=b28abe58-882e-475e-ae88-1460a426df3d
sockets: 1
vmgenid: 4c6b53bd-e771-4011-8997-6a86a3911754

lspci -nnk on the host looks like this:
Code:
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
        Subsystem: ASUSTeK Computer Inc. Renoir [1043:87e1]
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu

So the host successfully leaves amdgpu unused and binds the GPU to vfio-pci.

lspci -nnk on the guest looks like this:
Code:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
        Subsystem: ASUSTeK Computer Inc. Renoir [1043:87e1]
        Kernel modules: amdgpu

So the passthrough looks good to me, except the driver is not getting used.

This is an error I got before enabling pcie on the device. I'm leaving it here in case anyone else gets stuck at the same point:
Code:
[    2.423230] amdgpu 0000:01:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    2.426412] amdgpu 0000:01:00.0: amdgpu: Unable to locate a BIOS ROM
[    2.426417] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[    2.426420] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
[    2.427239] amdgpu: probe of 0000:01:00.0 failed with error -22

I also had to disable Secure Boot on the host as well as on the guest to get this far.

Now I get the following output for sudo dmesg | grep amdgpu:
Code:
[    3.163064] [drm] amdgpu kernel modesetting enabled.
[    3.163086] [drm] amdgpu version: 5.18.13
[    3.163931] amdgpu: CRAT table not found
[    3.163961] amdgpu: Virtual CRAT table created for CPU
[    3.163984] amdgpu: Topology: Add CPU node
[    3.184178] amdgpu: PeerDirect support was initialized successfully
[    3.228494] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    3.229023] amdgpu: ATOM BIOS: 113-RENOIR-034
[    3.230660] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    3.231094] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[    3.231516] amdgpu 0000:01:00.0: amdgpu: MODE2 reset
[    7.514970] amdgpu 0000:01:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000002
[    7.515358] amdgpu 0000:01:00.0: amdgpu: Mode2 reset failed!
[    7.515652] amdgpu 0000:01:00.0: amdgpu: asic reset on init failed
[    7.515908] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[    7.516396] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
[    7.517976] amdgpu: probe of 0000:01:00.0 failed with error -62

From what I've gathered the MODE2 thing seems to be an APU thing. But I've seen people get further than this, so at this point I'm not sure whether to continue troubleshooting the host or the guest.
In this thread they seem to get past the MODE2 reset fine, and the amdgpu driver gets shown as in use: https://gitlab.freedesktop.org/drm/amd/-/issues/2046
I've also seen the driver getting loaded for some other people, but there is conflicting information as to which kernel options are needed on the host, such as nomodeset, whether amd_iommu and other options are needed at all, and so on. For completeness' sake, here are my settings on the host:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off"

My guest is running Ubuntu 22.10 with the 5.19.0-26 kernel. I've also tried 22.04 with the 5.15 kernel before without success.
I've also tried the default amdgpu driver and the one installed using the amdgpu-install script from the AMD website.

Now I'm kind of stuck as to where to troubleshoot... The device seems to pass through ok so I'm suspecting it's an issue with the guest but also some people seemed to get farther than me so it could still be a setting on the host.

Any advice is appreciated.
 
Ok I figured it out, it was a bootloader option. I was missing initcall_blacklist=sysfb_init on the host.
 
Can you write up step by step what you did to get passthrough working? On an Intel iGPU I had no problems at all.
 
I mainly followed the official guides for PCI(e) and PCI.

Edit /etc/default/grub and add the following options to the GRUB_CMDLINE_LINUX_DEFAULT line:
amd_iommu=on initcall_blacklist=sysfb_init video=efifb:off

Run update-grub afterwards
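
To double-check after the reboot that the options actually reached the running kernel, something like the following could be used. It is only a sketch: check_cmdline is a made-up helper name, not a Proxmox tool, and it simply does a whole-word match against the string it is given.

```shell
#!/bin/sh
# check_cmdline CMDLINE PARAM...
# Prints every PARAM that does not appear as a whole word in CMDLINE.
# Returns 1 if at least one parameter is missing, 0 otherwise.
# (check_cmdline is a hypothetical helper name used for illustration.)
check_cmdline() {
    cmdline=$1
    shift
    missing=0
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing: $param"; missing=1 ;;
        esac
    done
    return "$missing"
}

# Usage on the host:
#   check_cmdline "$(cat /proc/cmdline)" \
#       amd_iommu=on initcall_blacklist=sysfb_init video=efifb:off
```

If it prints nothing, all three options are active on the currently booted kernel.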

Add the following to /etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Optionally install vendor-reset as described here.
After that update your initramfs with update-initramfs -u -k all
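
After the next reboot you can check whether the modules from /etc/modules were really loaded. A rough sketch that reads lsmod output; missing_modules is a hypothetical helper name. On newer kernels vfio_virqfd may no longer exist as a separate module, so I would treat a miss on that one with caution.

```shell
#!/bin/sh
# missing_modules NAME...
# Reads `lsmod` output on stdin and prints each NAME that is not loaded.
# Returns 1 if any module is missing.
# (missing_modules is a hypothetical helper name used for illustration.)
missing_modules() {
    # First column of lsmod is the module name; line 1 is the header.
    loaded=$(awk 'NR > 1 { print $1 }')
    status=0
    for mod in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -qxF "$mod"; then
            echo "not loaded: $mod"
            status=1
        fi
    done
    return "$status"
}

# Usage on the host:
#   lsmod | missing_modules vfio vfio_iommu_type1 vfio_pci vfio_virqfd
```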

Run lspci -nn |grep VGA, you should receive an output like this:
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
Take note of the PCI address (bus:device) of the GPU, in this case 0b:00, although this might differ on your machine.
Run lspci -nn | grep 0b:00 with your own address and you should get an output like this:
Code:
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
0b:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
0b:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
0b:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
0b:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
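
Side note: 0b:00 in the output above is the PCI address of the device, not the number of its IOMMU group. If you want the actual group (for example to see what else shares it), it can be read from sysfs. A minimal sketch, where iommu_group_of is a hypothetical helper and the optional second argument exists only so the function can be pointed at a fake tree:

```shell
#!/bin/sh
# iommu_group_of ADDRESS [SYSFS_ROOT]
# Prints the IOMMU group number of a PCI device such as 0000:0b:00.0.
# SYSFS_ROOT defaults to /sys.
# (iommu_group_of is a hypothetical helper name used for illustration.)
iommu_group_of() {
    link="${2:-/sys}/bus/pci/devices/$1/iommu_group"
    if [ ! -L "$link" ]; then
        echo "no IOMMU group for $1 (is the IOMMU enabled?)" >&2
        return 1
    fi
    # The symlink points at .../kernel/iommu_groups/<number>
    basename "$(readlink "$link")"
}

# Usage on the host:
#   iommu_group_of 0000:0b:00.0
```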

Write down the vendor:device IDs in the square brackets at the end of each line, like [1022:1639].

Create a file in /etc/modprobe.d/ called for example blacklist-amd.conf and add the following to it, replacing your IDs:
Code:
blacklist radeon
blacklist amdgpu
options vfio-pci ids=1002:1636,1002:1637,1022:15df,1022:1639,1022:15e3
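
If you don't want to copy the IDs by hand, the ids= value can be derived from the lspci -nn output. This is just a sketch: vfio_ids is a made-up function name, and it assumes every relevant line carries exactly one [vendor:device] pair, as in the output above.

```shell
#!/bin/sh
# vfio_ids
# Reads `lspci -nn` lines on stdin and prints the unique vendor:device IDs
# as one comma-separated string, in order of first appearance.
# (vfio_ids is a hypothetical helper name used for illustration.)
vfio_ids() {
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | awk '!seen[$0]++' \
        | paste -sd, -
}

# Usage on the host (0b:00 is the address from my machine):
#   echo "options vfio-pci ids=$(lspci -nn -s 0b:00 | vfio_ids)"
```

The class codes such as [0300] are skipped because they contain no colon, and the duplicate USB controller ID only ends up in the list once.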

After this reboot your host, enter the BIOS and disable secure boot. This will depend on your motherboard BIOS. For my ASUS board I followed this tutorial.

After this run lspci -nnk | grep -A 2 0b:00.0, substituting your own PCI address, and check that it says Kernel driver in use: vfio-pci. If it says Kernel driver in use: amdgpu, the host is still initializing the GPU and it will not pass through.
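
The same check can be scripted so you don't have to eyeball the output. A small sketch; driver_in_use is a hypothetical helper that only parses text in the lspci -nnk format shown earlier in this thread.

```shell
#!/bin/sh
# driver_in_use
# Reads `lspci -nnk` output for one device on stdin and prints the driver
# named on the "Kernel driver in use:" line. Prints nothing if no driver
# is bound to the device.
# (driver_in_use is a hypothetical helper name used for illustration.)
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage on the host:
#   [ "$(lspci -nnk -s 0b:00.0 | driver_in_use)" = "vfio-pci" ] \
#       && echo "GPU is bound to vfio-pci"
```

The same function works inside the guest, where the expected driver is amdgpu instead.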

If everything looks right you can create a VM. Machine type needs to be q35 and BIOS OVMF.
Pass through your PCI device either using the GUI or with the console command qm set VMID -hostpci0 0000:0b:00.0,pcie=1 (substituting your VM ID and PCI address), and make sure PCI-Express is enabled. I only pass through the 0b:00.0 function; if you need the other functions as well, pass through the whole device.
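
For reference, this is roughly how the command comes together for my VM (ID 102, device 0000:0b:00.0). The sketch only prints the command instead of running it; hostpci_cmd is a made-up helper, and the address check is just a sanity test on the format.

```shell
#!/bin/sh
# hostpci_cmd VMID ADDRESS
# Prints (does not run) the qm command that attaches one PCI function to
# a VM with PCI-Express enabled. Rejects anything that does not look like
# a full PCI address (domain:bus:device.function).
# (hostpci_cmd is a hypothetical helper name used for illustration.)
hostpci_cmd() {
    if ! printf '%s\n' "$2" | grep -qE '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'; then
        echo "unexpected PCI address: $2" >&2
        return 1
    fi
    echo "qm set $1 -hostpci0 $2,pcie=1"
}

# Usage:
#   hostpci_cmd 102 0000:0b:00.0
```

The result matches the hostpci0 line in the vm.conf from the first post.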

I think I also had to disable Secure Boot in the VM by deleting the Platform Keys in the VM BIOS, which you can enter using F2 during boot.

Install your guest OS.
On the guest run lspci -nnk | grep -A 2 VGA, you should see your GPUs like:
Code:
00:01.0 VGA compatible controller [0300]: Device [1234:1111] (rev 02)
        Subsystem: Red Hat, Inc. Device [1af4:1100]
        Kernel driver in use: bochs-drm
--
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
        Subsystem: ASUSTeK Computer Inc. Renoir [1043:87e1]
        Kernel driver in use: amdgpu
If it says Kernel driver in use: amdgpu you should have a working GPU in the VM. If it only says Kernel modules: amdgpu the driver is not loading correctly. Check the output of sudo dmesg | grep amdgpu to see why it fails.
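
When the driver fails, the last line of the dmesg output carries the error code of the failed probe, which makes it easier to compare against other reports (in this thread: -22 for the missing BIOS ROM and -62 for the failed MODE2 reset). A minimal sketch; amdgpu_probe_error is a hypothetical helper name.

```shell
#!/bin/sh
# amdgpu_probe_error
# Reads dmesg text on stdin and prints the error number of the last
# "amdgpu: probe of ... failed with error N" line. Prints nothing if the
# probe did not fail.
# (amdgpu_probe_error is a hypothetical helper name used for illustration.)
amdgpu_probe_error() {
    sed -n 's/.*amdgpu: probe of .* failed with error \(-[0-9][0-9]*\)$/\1/p' \
        | tail -n 1
}

# Usage in the guest:
#   sudo dmesg | amdgpu_probe_error
```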

Your Guest should have at least Kernel 5.11.32.21.40 for the amdgpu module to be included. My guest is on Ubuntu 22.10 with the 5.19.0-26 Kernel but earlier should work too.

You can also check whether everything works successfully by running ls /dev/dri which should show card1 and renderD128
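
As a scripted version of that check, something like this could be used. It is a sketch under the assumption that a render node (renderD*) only appears once a driver with render support has initialized the GPU; dri_nodes is a made-up helper name. In my guest card0 is presumably the emulated display, which is why the passed-through GPU shows up as card1.

```shell
#!/bin/sh
# dri_nodes [DIR]
# Lists the card* and renderD* nodes in DIR (default /dev/dri).
# Returns non-zero if there are no such nodes or no render node.
# (dri_nodes is a hypothetical helper name used for illustration.)
dri_nodes() {
    nodes=$(ls "${1:-/dev/dri}" 2>/dev/null | grep -E '^(card|renderD)[0-9]+$') || return 1
    printf '%s\n' "$nodes"
    printf '%s\n' "$nodes" | grep -q '^renderD'
}

# Usage in the guest:
#   dri_nodes || echo "no render node, check dmesg"
```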

I don't remember anything else you should need to do.
 
Thank you for your tenacity on this.

I'm just about through this rabbit hole, but I'm stuck on one part. Everything looks good on the host, it shows Kernel driver in use: vfio-pci. But as soon as I start up my VM with the PCI device attached, it adds Kernel modules: amdgpu to the output of lspci -nnk | grep -A 2 04:00.0 and the GPU doesn't show up in my guest.

I did try the optional vendor-reset install, but to no avail. My guest is also 22.10, kernel 5.19.0-31.

Any input would be appreciated, thanks.
 
Hm, haven't had that one before.
I think the output of sudo dmesg | grep amdgpu is where you want to look. Maybe you can get information on what initializes the driver.
In theory the driver should not be loaded because of the blacklist in /etc/modprobe.d/.

Did you update your initramfs again?
I noticed they mention it in the PCI(e) Wiki after the modprobe configuration, but I didn't specify it in my post.
For both methods you need to update the initramfs again and reboot after that.
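
To see whether the blacklist file actually made it into the initramfs, you can search the listing that lsinitramfs prints. A rough sketch, assuming the file is called blacklist-amd.conf as in my post and that the listing shows paths relative to the image root; initramfs_has_modprobe_conf is a hypothetical helper name.

```shell
#!/bin/sh
# initramfs_has_modprobe_conf NAME
# Reads an `lsinitramfs` listing on stdin and succeeds if
# etc/modprobe.d/NAME is part of the image. A leading "./" on the listed
# paths is tolerated.
# (initramfs_has_modprobe_conf is a hypothetical helper name.)
initramfs_has_modprobe_conf() {
    sed 's|^\./||' | grep -qxF "etc/modprobe.d/$1"
}

# Usage on the host:
#   lsinitramfs "/boot/initrd.img-$(uname -r)" \
#       | initramfs_has_modprobe_conf blacklist-amd.conf \
#       || echo "blacklist not in initramfs, run update-initramfs -u -k all"
```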
 
Really strange. I thought maybe my Proxmox kernel needed to be updated, tried 5.19 and same thing. Kernel modules: amdgpu just won't go away now, even with it in modprobe.

I did try updating initramfs, same thing.
 
Hello,
I followed all the instructions and mostly had success, but something is broken with the guest driver:

sudo dmesg | grep amdgpu
Code:
[    3.448523] [drm] amdgpu kernel modesetting enabled.
[    3.448529] [drm] amdgpu version: 6.2.4
[    3.449254] amdgpu: CRAT table not found
[    3.449258] amdgpu: Virtual CRAT table created for CPU
[    3.449269] amdgpu: Topology: Add CPU node
[    3.466645] amdgpu: PeerDirect support was initialized successfully
[    3.507238] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    3.507241] amdgpu: ATOM BIOS: 113-RENOIR-034
[    3.507459] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    3.507465] amdgpu 0000:01:00.0: amdgpu: MODE2 reset
[    7.793948] amdgpu 0000:01:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000002
[    7.794012] amdgpu 0000:01:00.0: amdgpu: Mode2 reset failed!
[    7.794041] amdgpu 0000:01:00.0: amdgpu: asic reset on init failed
[    7.794071] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[    7.794366] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing dev

Any ideas?
 
AsRock X300 with a 4650G.
I gave it a shot but couldn't get the GPU recognized in the VM (no PCI device listed). On the host side everything seems fine apart from vendor-reset not working: all the outputs and checks look as expected.
 
