Hi all,
like a couple of others I have been trying to get GPU passthrough working with a Ryzen APU but now I'm stuck.
First things first - I have no idea what I'm doing. I'm fueled by the naivety of the young and 500 open browser tabs so please go easy on me.
I think I have the basic passthrough working but I'm having trouble getting the amdgpu driver to load on the guest system.
My host is:
Ryzen 5 4600G
Asus ROG B550-A Gaming
32GB Samsung M391A4G43MB1-CTDQ
This is my vm.conf file:
Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 6
efidisk0: local:102/vm-102-disk-1.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:0b:00.0,pcie=1
machine: q35
memory: 6144
meta: creation-qemu=6.1.1,ctime=1668423230
name: ornn
net0: virtio=7E:16:4E:A6:67:30,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
parent: install
scsi0: local:102/vm-102-disk-0.qcow2,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=b28abe58-882e-475e-ae88-1460a426df3d
sockets: 1
vmgenid: 4c6b53bd-e771-4011-8997-6a86a3911754
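In case anyone wants to check their own config against mine, here is a rough Python sketch that pulls the hostpci0 entry out of a config in this "key: value" format. The helper names read_conf and parse_hostpci are made up by me for illustration and are not part of any Proxmox tooling. The sample text is just three lines copied from my config above.

```python
def read_conf(text):
    """Parse 'key: value' lines into a dict (later keys overwrite earlier ones)."""
    conf = {}
    for line in text.splitlines():
        key, sep, val = line.partition(": ")
        if sep:  # skip lines that are not in 'key: value' form
            conf[key.strip()] = val.strip()
    return conf


def parse_hostpci(value):
    """Split a value like '0000:0b:00.0,pcie=1' into (address, options)."""
    address, *rest = value.split(",")
    options = {}
    for item in rest:
        key, _, val = item.partition("=")
        options[key] = val
    return address, options


# Sample lines copied from the config above.
sample = """bios: ovmf
hostpci0: 0000:0b:00.0,pcie=1
machine: q35
"""

conf = read_conf(sample)
address, options = parse_hostpci(conf["hostpci0"])
print(address, options)
```

This only splits strings, so it says nothing about whether the options are the right ones for an APU.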
lspci -nnk on the host looks like this:
Code:
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
Subsystem: ASUSTeK Computer Inc. Renoir [1043:87e1]
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
So the host successfully leaves the device alone: vfio-pci is bound to it instead of amdgpu.
lspci -nnk on the guest looks like this:
Code:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
Subsystem: ASUSTeK Computer Inc. Renoir [1043:87e1]
Kernel modules: amdgpu
So the passthrough looks good to me, except the driver is not getting used.
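To compare the two lspci outputs without eyeballing them, something like the following Python sketch could work. The function name drivers_in_use is my own invention, and the sample strings are shortened copies of the host and guest outputs above. It assumes the usual `lspci -nnk` layout where each device starts with its slot address and the "Kernel driver in use" line follows it.

```python
import re

# A device line starts with a slot like '0b:00.0' (optionally with a
# '0000:' domain prefix); detail lines such as 'Subsystem:' do not.
SLOT_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def drivers_in_use(lspci_text):
    """Map each PCI slot to its 'Kernel driver in use' value, or None if absent."""
    result = {}
    slot = None
    for line in lspci_text.splitlines():
        match = SLOT_RE.match(line)
        if match:
            slot = match.group(1)
            result[slot] = None
            continue
        stripped = line.strip()
        if slot and stripped.startswith("Kernel driver in use:"):
            result[slot] = stripped.split(":", 1)[1].strip()
    return result


host = """0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
Subsystem: ASUSTeK Computer Inc. Renoir [1043:87e1]
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
"""

guest = """01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c9)
Subsystem: ASUSTeK Computer Inc. Renoir [1043:87e1]
Kernel modules: amdgpu
"""

host_drivers = drivers_in_use(host)
guest_drivers = drivers_in_use(guest)
print(host_drivers)
print(guest_drivers)
```

With my outputs, the host entry comes back as vfio-pci and the guest entry as None, which matches what I described: the module is available in the guest but nothing is bound.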
This is an error I got before enabling pcie on the device (pcie=1 on the hostpci0 line). I'm leaving it here in case anyone else gets stuck at the same point:
Code:
[ 2.423230] amdgpu 0000:01:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[ 2.426412] amdgpu 0000:01:00.0: amdgpu: Unable to locate a BIOS ROM
[ 2.426417] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[ 2.426420] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
[ 2.427239] amdgpu: probe of 0000:01:00.0 failed with error -22
I also had to disable Secure Boot on the host as well as on the guest to get this far.
Now I get the following output for sudo dmesg | grep amdgpu:
Code:
[ 3.163064] [drm] amdgpu kernel modesetting enabled.
[ 3.163086] [drm] amdgpu version: 5.18.13
[ 3.163931] amdgpu: CRAT table not found
[ 3.163961] amdgpu: Virtual CRAT table created for CPU
[ 3.163984] amdgpu: Topology: Add CPU node
[ 3.184178] amdgpu: PeerDirect support was initialized successfully
[ 3.228494] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 3.229023] amdgpu: ATOM BIOS: 113-RENOIR-034
[ 3.230660] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[ 3.231094] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[ 3.231516] amdgpu 0000:01:00.0: amdgpu: MODE2 reset
[ 7.514970] amdgpu 0000:01:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000002
[ 7.515358] amdgpu 0000:01:00.0: amdgpu: Mode2 reset failed!
[ 7.515652] amdgpu 0000:01:00.0: amdgpu: asic reset on init failed
[ 7.515908] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[ 7.516396] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
[ 7.517976] amdgpu: probe of 0000:01:00.0 failed with error -62
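One thing I noticed while comparing the two logs: the probe failures end with different error codes (-22 before enabling pcie, -62 now). Below is a small Python sketch that pulls those codes out of dmesg text and names them. The table only contains the two Linux errno values seen in this post, and the function name probe_failures is something I made up. I'm hardcoding the values rather than using Python's errno module so the result doesn't depend on the OS the script runs on.

```python
import re

# Linux errno values, limited to the two codes that appear in this post.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    62: ("ETIME", "Timer expired"),
}

PROBE_RE = re.compile(r"probe of (\S+) failed with error -(\d+)")


def probe_failures(dmesg_text):
    """Return (device, code, name, description) for each failed probe line."""
    failures = []
    for match in PROBE_RE.finditer(dmesg_text):
        code = int(match.group(2))
        name, description = LINUX_ERRNO.get(code, ("unknown", "not in this table"))
        failures.append((match.group(1), code, name, description))
    return failures


# The two probe lines from the logs in this post.
log = """[ 2.427239] amdgpu: probe of 0000:01:00.0 failed with error -22
[ 7.517976] amdgpu: probe of 0000:01:00.0 failed with error -62
"""

failures = probe_failures(log)
for device, code, name, description in failures:
    print(f"{device}: -{code} {name} ({description})")
```

A timeout-style error at least seems consistent with the SMU message about not being done with the previous command, but that is my guess and not something I have confirmed.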
From what I've gathered, the MODE2 reset seems to be an APU thing. But I've seen people get further than this, so at this point I'm not sure whether to continue troubleshooting the host or the guest.
In this thread they seem to get past the MODE2 reset fine, and the amdgpu driver gets shown as in use: https://gitlab.freedesktop.org/drm/amd/-/issues/2046
I've also seen the driver getting loaded for some other people, but there is conflicting information as to which kernel options are needed on the host, such as nomodeset, and whether amd_iommu and other options are needed at all. For completeness' sake, here are my settings on the host:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off"
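Since the advice I found disagrees on which options matter, here is a rough Python sketch for splitting a GRUB_CMDLINE_LINUX_DEFAULT line into individual parameters, so two setups can be compared option by option. The function name parse_grub_cmdline is hypothetical, and the list of options checked at the end is just the ones mentioned in this post, not a recommendation.

```python
import shlex


def parse_grub_cmdline(line):
    """Turn a GRUB_CMDLINE_... line into {parameter: value or None}."""
    _, _, value = line.partition("=")
    params = {}
    # shlex.split removes the surrounding quotes; the inner split
    # separates the whitespace-delimited kernel parameters.
    for chunk in shlex.split(value):
        for token in chunk.split():
            key, sep, val = token.partition("=")
            params[key] = val if sep else None
    return params


# The line from my host, copied from above.
line = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off"'

params = parse_grub_cmdline(line)

# Options mentioned in this post; True means present on my host.
present = {name: name in params for name in ("amd_iommu", "nomodeset", "video")}
print(params)
print(present)
```

On my host this shows amd_iommu and video set and nomodeset absent, which is simply a restatement of the line above in a form that is easier to diff.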
My guest is running Ubuntu 22.10 with the 5.19.0-26 kernel. I've also tried 22.04 with the 5.15 kernel before without success.
I've also tried the default amdgpu driver and the one installed using the amdgpu-install script from the AMD website.
Now I'm kind of stuck as to where to troubleshoot. The device seems to pass through OK, so I suspect the issue is in the guest, but some people got further than me, so it could still be a setting on the host.
Any advice is appreciated.