AMD GPU Passthrough - PCIE Atomics not working in Linux (Ubuntu 23.10) guest

crakej

New Member
Oct 16, 2023
Hi

I've spent some time looking into this problem, so I would be grateful for any help!

I have a Dell PowerEdge T430 running PVE 8.1, with an AMD Instinct MI25 (Vega10) installed and configured for passthrough. The guest is Ubuntu 23.10. I'm trying to install Stable Diffusion (as an example), but no matter what I do I keep getting this error:

Code:
john@ubuntutest:~$ sudo dmesg | grep atomic
[    0.499676] DMA: preallocated 2048 KiB GFP_KERNEL pool for atomic allocations
[    0.499949] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.500215] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[   17.430475] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported

This prevents ROCm from running.

I tried editing the PCIe device entry in the VM's conf file to enable atomics, but PVE doesn't accept that:
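For anyone wanting to check for the same symptom without the unrelated DMA "atomic allocations" lines getting in the way, here is a minimal sketch that picks out only the amdgpu message from dmesg text. The function name and the regular expression are my own (hypothetical), and the pattern is based only on the log line format shown above, so other kernel or driver versions may word it differently.

```python
import re

# Matches lines shaped like:
# "[   17.430475] amdgpu 0000:01:00.0: amdgpu: <message>"
# ASSUMPTION: this format is taken from the log excerpt in this post only.
AMDGPU_LINE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+amdgpu\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+amdgpu:\s+"
    r"(?P<msg>.*)$"
)

ERROR_TEXT = "PCIE atomic ops is not supported"


def find_atomics_errors(dmesg_text):
    """Return (timestamp, pci_address) for each amdgpu atomics error line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = AMDGPU_LINE.match(line.strip())
        if match and ERROR_TEXT in match.group("msg"):
            hits.append((float(match.group("time")), match.group("bdf")))
    return hits


# Two lines copied from the output above: only the second is the real error.
SAMPLE = """\
[    0.499676] DMA: preallocated 2048 KiB GFP_KERNEL pool for atomic allocations
[   17.430475] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
"""

if __name__ == "__main__":
    for timestamp, address in find_atomics_errors(SAMPLE):
        print(f"{address}: atomics reported unsupported at {timestamp}s")
```

To run it against a live guest, the saved output of `sudo dmesg` can be passed in as the string instead of `SAMPLE`.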

Code:
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 16
cpu: x86-64-v2-AES
efidisk0: zfs-local:200/vm-200-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:04:00,pcie=1,atomic=1
machine: q35
memory: 16384
meta: creation-qemu=8.1.2,ctime=1700759584
name: Ubuntu2310rocm
net0: virtio=BC:24:11:FC:60:C5,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: LocalStorageZ:vm-200-disk-0,iothread=1,size=50G
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=46780e1f-409b-4513-aba0-3dc8d70c8ba4
sockets: 1
vga: memory=64
vmgenid: c7f4565b-4f00-4b0d-a3c1-335f2a5c6cfb

This results in:
Code:
vm 200 - unable to parse value of 'hostpci0' - format error
atomic: property is not defined in schema and the schema does not allow additional properties
starting serial terminal on interface serial0
BdsDxe: loading Boot0004 "ubuntu" from HD(1,GPT,29436749-2657-4AAD-B865-1379970CC259,0x800,0x219800)/\EFI\ubuntu\shimx64.efi

and there is no GPU in the VM at all!
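The parse error suggests `atomic` simply isn't a property the `hostpci` schema knows about, which would also explain the missing GPU: the whole `hostpci0` line is rejected. As a rough sketch, the line can be split into its options and compared against a list of known keys. The `KNOWN_KEYS` set below is an assumption based on my reading of the `qm.conf` documentation, not something confirmed in this thread, so please check `man qm.conf` on your own PVE version. The function names are hypothetical.

```python
# ASSUMPTION: hostpci option names as I recall them from the qm.conf man page.
# Verify against your own PVE version before relying on this list.
KNOWN_KEYS = {
    "host", "mapping", "pcie", "rombar", "romfile", "x-vga", "mdev",
    "legacy-igd", "vendor-id", "device-id", "sub-vendor-id", "sub-device-id",
}


def parse_hostpci(value):
    """Split e.g. '0000:04:00,pcie=1,atomic=1' into (host, options dict)."""
    host = None
    options = {}
    for part in (p.strip() for p in value.split(",")):
        if not part:
            continue
        if "=" in part:
            key, val = part.split("=", 1)
            options[key] = val
        elif host is None:
            # A bare first element is the PCI address of the device.
            host = part
    # The address may also be written explicitly as host=<address>.
    if "host" in options:
        host = options.pop("host")
    return host, options


def unknown_keys(value, known=KNOWN_KEYS):
    """Return option names that are not in the known-key list."""
    _, options = parse_hostpci(value)
    return sorted(key for key in options if key not in known)


if __name__ == "__main__":
    line = "0000:04:00,pcie=1,atomic=1"  # the value from the conf file above
    print(parse_hostpci(line))
    print("unknown:", unknown_keys(line))
```

With the value from the conf file above, only `atomic` is flagged, which matches the "property is not defined in schema" message.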

My CPUs support atomics, and the MI25 requires it. Atomics are working on the host just fine.

Any ideas? I have tried a few other things, but don't seem to be able to find any concrete info on this. Many thanks!
 
So I'm still stuck with this problem. I thought it might be useful to list the links I'm looking at. The struck-through ones are those that haven't helped so far. The driver loads fine and the card is there, but still no atomics :(

https://forum.proxmox.com/threads/gpu-passthrough-ryzen-4600g-apu.120151/
https://forum.proxmox.com/threads/gpu-pass-through-with-radeon-vii.125733/
https://forum.proxmox.com/threads/gpu-passthrough-radeon-6800xt-and-beyond.86932/#post-606373
https://www.wundertech.net/how-to-set-up-gpu-passthrough-on-proxmox/
https://pve.proxmox.com/wiki/PCI_Passthrough#GPU_passthrough


Now the card passes through, but without atomics. I've been working from these two sources, as it seems atomics passthrough is now supported in PVE, but apparently a kernel edit is still needed to get it working, and I have never edited my kernel before. The card shows as being capable, though.

https://patchew.org/QEMU/20230420153839.167418-1-robin@streamhpc.com/
https://forum.level1techs.com/t/asu...-6900-xt-works-on-host-but-not-in-vm/174245/4
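One way to compare what the host and the guest each report is to look at the `AtomicOpsCap` and `AtomicOpsCtl` fields in `lspci -vvv` output for the GPU and for the root port above it. Below is a small sketch that extracts those flags from saved output. The function name is hypothetical, the sample text is illustrative only (it is not from my system), and the field layout is an assumption based on how recent pciutils versions print DevCap2/DevCtl2, so older versions may differ.

```python
import re

# A flag is a word followed by '+' (set) or '-' (clear), e.g. "64bit+".
FLAG = re.compile(r"(\w+)([+-])(?=\s|$)")


def atomic_ops_flags(lspci_text):
    """Collect AtomicOpsCap / AtomicOpsCtl flags from `lspci -vvv` text.

    Intended for the output of a single device (e.g. `lspci -vvv -s 01:00.0`);
    with several devices in the text, later entries overwrite earlier ones.
    """
    result = {}
    for line in lspci_text.splitlines():
        for field in ("AtomicOpsCap", "AtomicOpsCtl"):
            marker = field + ":"
            if marker in line:
                tail = line.split(marker, 1)[1]
                result[field] = {
                    name: sign == "+" for name, sign in FLAG.findall(tail)
                }
    return result


# ILLUSTRATIVE sample only, not captured from the system in this thread.
SAMPLE = """\
\t\t\t AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
\t\t\t AtomicOpsCtl: ReqEn-
"""

if __name__ == "__main__":
    for field, flags in atomic_ops_flags(SAMPLE).items():
        print(field, flags)
```

Running this on the host output and on the guest output for the same device should show whether the capability or the request-enable bit differs between the two.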

[Attachment: 1701777939070.png]

Code:
BdsDxe: loading Boot0008 "ubuntu" from HD(1,GPT,29F6C2D7-4123-485D-85CF-CABB86F4E4B8,0x800,0x219800)/\EFI\ubuntu\shimx64.efi
BdsDxe: starting Boot0008 "ubuntu" from HD(1,GPT,29F6C2D7-4123-485D-85CF-CABB86F4E4B8,0x800,0x219800)/\EFI\ubuntu\shimx64.efi
error: no suitable video mode found.
[    2.044976] shpchp 0000:05:01.0: pci_hp_register failed with error -16
[    2.046296] shpchp 0000:05:01.0: Slot initialization failed
[    2.050907] shpchp 0000:05:02.0: pci_hp_register failed with error -16
[    2.052202] shpchp 0000:05:02.0: Slot initialization failed
[    2.057595] shpchp 0000:05:03.0: pci_hp_register failed with error -16
[    2.058884] shpchp 0000:05:03.0: Slot initialization failed
[    2.063429] shpchp 0000:05:04.0: pci_hp_register failed with error -16
[    2.064711] shpchp 0000:05:04.0: Slot initialization failed
/dev/sda2: clean, 161029/4128768 files, 5693198/16501504 blocks
[    9.079078] snd_hda_intel 0000:00:1b.0: no codecs found!
[    9.616385] cloud-init[681]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'init-local' at Tue, 05 Dec 2023 12:10:21 +0000. Up 9.58 seconds.
[   11.879001] cloud-init[689]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'init' at Tue, 05 Dec 2023 12:10:24 +0000. Up 11.85 seconds.
[   11.891839] cloud-init[689]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
[   11.894134] cloud-init[689]: ci-info: +---------+------+------------------------------+---------------+--------+-------------------+
[   11.896414] cloud-init[689]: ci-info: |  Device |  Up  |           Address            |      Mask     | Scope  |     Hw-Address    |
[   11.898672] cloud-init[689]: ci-info: +---------+------+------------------------------+---------------+--------+-------------------+
[   11.900913] cloud-init[689]: ci-info: | enp6s18 | True |         192.168.0.53         | 255.255.255.0 | global | 3a:25:4d:6f:9f:cb |
[   11.903171] cloud-init[689]: ci-info: | enp6s18 | True | fe80::3825:4dff:fe6f:9fcb/64 |       .       |  link  | 3a:25:4d:6f:9f:cb |
[   11.905441] cloud-init[689]: ci-info: |    lo   | True |          127.0.0.1           |   255.0.0.0   |  host  |         .         |
[   11.907698] cloud-init[689]: ci-info: |    lo   | True |           ::1/128            |       .       |  host  |         .         |
[   11.909961] cloud-init[689]: ci-info: +---------+------+------------------------------+---------------+--------+-------------------+
[   11.912236] cloud-init[689]: ci-info: +++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++
[   11.914232] cloud-init[689]: ci-info: +-------+---------------+-------------+-----------------+-----------+-------+
[   11.916211] cloud-init[689]: ci-info: | Route |  Destination  |   Gateway   |     Genmask     | Interface | Flags |
[   11.918201] cloud-init[689]: ci-info: +-------+---------------+-------------+-----------------+-----------+-------+
[   11.920227] cloud-init[689]: ci-info: |   0   |    0.0.0.0    | 192.168.0.1 |     0.0.0.0     |  enp6s18  |   UG  |
[   11.922222] cloud-init[689]: ci-info: |   1   |  192.168.0.0  |   0.0.0.0   |  255.255.255.0  |  enp6s18  |   U   |
[   11.924206] cloud-init[689]: ci-info: |   2   |  192.168.0.1  |   0.0.0.0   | 255.255.255.255 |  enp6s18  |   UH  |
[   11.926200] cloud-init[689]: ci-info: |   3   | 194.168.4.100 | 192.168.0.1 | 255.255.255.255 |  enp6s18  |  UGH  |
[   11.928190] cloud-init[689]: ci-info: |   4   | 194.168.8.100 | 192.168.0.1 | 255.255.255.255 |  enp6s18  |  UGH  |
[   11.930193] cloud-init[689]: ci-info: +-------+---------------+-------------+-----------------+-----------+-------+
[   11.932182] cloud-init[689]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   11.933777] cloud-init[689]: ci-info: +-------+-------------+---------+-----------+-------+
[   11.935379] cloud-init[689]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   11.937006] cloud-init[689]: ci-info: +-------+-------------+---------+-----------+-------+
[   11.938630] cloud-init[689]: ci-info: |   1   |  fe80::/64  |    ::   |  enp6s18  |   U   |
[   11.940257] cloud-init[689]: ci-info: |   3   |    local    |    ::   |  enp6s18  |   U   |
[   11.941874] cloud-init[689]: ci-info: |   4   |  multicast  |    ::   |  enp6s18  |   U   |
[   11.943463] cloud-init[689]: ci-info: +-------+-------------+---------+-----------+-------+
[   18.260811] cloud-init[1271]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'modules:config' at Tue, 05 Dec 2023 12:10:30 +0000. Up 18.20 seconds.
[   18.768355] cloud-init[1283]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'modules:final' at Tue, 05 Dec 2023 12:10:31 +0000. Up 18.71 seconds.
[   18.837579] cloud-init[1283]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 finished at Tue, 05 Dec 2023 12:10:31 +0000. Datasource DataSourceNone.  Up 18.83 seconds
[   18.840094] cloud-init[1283]: 2023-12-05 12:10:31,192 - cc_final_message.py[WARNING]: Used fallback datasource

Ubuntu 22.04.3 LTS docker ttyS0

docker login:

Code:
john@docker:~$ sudo dmesg | grep -e amd
[    0.000000] Linux version 5.15.0-89-generic (buildd@bos03-amd64-016) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 2023 (Ubuntu 5.15.0-89.99-generic 5.15.126)
[    3.802091] amdkcl: loading out-of-tree module taints kernel.
[    3.871239] amdkcl: Warning: fail to get symbol __cancel_work, replace it with kcl stub
[    4.213956] [drm] amdgpu kernel modesetting enabled.
[    4.213961] [drm] amdgpu version: 6.2.4
[    4.214175] amdgpu: CRAT table not found
[    4.214180] amdgpu: Virtual CRAT table created for CPU
[    4.214193] amdgpu: Topology: Add CPU node
[    4.230956] amdgpu: PeerDirect support was initialized successfully
[    4.278418] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    4.278423] amdgpu: ATOM BIOS: 113-D0513700-001
[    4.279127] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[    4.279135] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    4.279154] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[    5.338634] amdgpu 0000:01:00.0: amdgpu: MEM ECC is active.
[    5.338639] amdgpu 0000:01:00.0: amdgpu: SRAM ECC is not presented.
[    5.338671] amdgpu 0000:01:00.0: amdgpu: VRAM: 16368M 0x000000F400000000 - 0x000000F7FEFFFFFF (16368M used)
[    5.338674] amdgpu 0000:01:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    5.338677] amdgpu 0000:01:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[    5.338749] [drm] amdgpu: 16368M of VRAM memory ready
[    5.338751] [drm] amdgpu: 7975M of GTT memory ready.
[    5.358147] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega10_smu
[    6.008031] amdgpu: HMM registered 16368MB device memory
[    6.028616] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    6.028635] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    6.028833] amdgpu: Virtual CRAT table created for GPU
[    6.029094] amdgpu: Topology: Add dGPU node [0x6860:0x1002]
[    6.029098] kfd kfd: amdgpu: added device 1002:6860
[    6.029114] amdgpu 0000:01:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 16, active_cu_number 64
[    6.029121] amdgpu 0000:01:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[    6.029124] amdgpu 0000:01:00.0: amdgpu: ring gfx_low uses VM inv eng 1 on hub 0
[    6.029125] amdgpu 0000:01:00.0: amdgpu: ring gfx_high uses VM inv eng 4 on hub 0
[    6.029127] amdgpu 0000:01:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 5 on hub 0
[    6.029129] amdgpu 0000:01:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 6 on hub 0
[    6.029138] amdgpu 0000:01:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 7 on hub 0
[    6.029139] amdgpu 0000:01:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 8 on hub 0
[    6.029141] amdgpu 0000:01:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 9 on hub 0
[    6.029143] amdgpu 0000:01:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 10 on hub 0
[    6.029145] amdgpu 0000:01:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 11 on hub 0
[    6.029146] amdgpu 0000:01:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 12 on hub 0
[    6.029148] amdgpu 0000:01:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 13 on hub 0
[    6.029149] amdgpu 0000:01:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
[    6.029152] amdgpu 0000:01:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 8
[    6.029153] amdgpu 0000:01:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 8
[    6.029155] amdgpu 0000:01:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 8
[    6.029156] amdgpu 0000:01:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 8
[    6.029158] amdgpu 0000:01:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 8
[    6.029159] amdgpu 0000:01:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 8
[    6.029161] amdgpu 0000:01:00.0: amdgpu: ring vce0 uses VM inv eng 9 on hub 8
[    6.029162] amdgpu 0000:01:00.0: amdgpu: ring vce1 uses VM inv eng 10 on hub 8
[    6.029164] amdgpu 0000:01:00.0: amdgpu: ring vce2 uses VM inv eng 11 on hub 8
[    6.031355] amdgpu: legacy kernel without apple_gmux_detect()
[    6.031609] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:01:00.0 on minor 0
john@docker:~$ sudo dmesg | grep -e atomic
[    0.277740] DMA: preallocated 2048 KiB GFP_KERNEL pool for atomic allocations
[    0.278161] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.278576] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    4.279154] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
john@docker:~$
 
Spent a few more hours on this today but didn't get anywhere!

I installed Ubuntu on a USB drive to test the system on bare metal, and it worked as expected, so this isn't a hardware limitation...
 
I've been trying to install the drivers, but with the latest kernel the modules fail to compile.