AMD GPU Passthrough - PCIE Atomics not working in Linux (Ubuntu 23.10) guest

crakej

New Member
Oct 16, 2023
Hi

I've spent some time looking into this problem, so I would be grateful for any help!

I have a Dell PowerEdge T430 running PVE 8.1 with an AMD Instinct MI25 (Vega10) installed and configured for passthrough. The guest is Ubuntu 23.10, and I'm trying to install (for example) Stable Diffusion. No matter what I do, I keep getting this error:

Code:
john@ubuntutest:~$ sudo dmesg | grep atomic
[    0.499676] DMA: preallocated 2048 KiB GFP_KERNEL pool for atomic allocations
[    0.499949] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.500215] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[   17.430475] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported

This prevents ROCm from running.
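Grepping for `atomic` also matches the unrelated DMA pool lines. A minimal Python sketch that picks out only the amdgpu message quoted above and returns the affected PCI address; `devices_without_atomics` is a hypothetical helper, and it assumes the message text is exactly as this amdgpu version prints it:

```python
import re

# The exact amdgpu message from the guest's dmesg output above.
ATOMIC_MSG = "PCIE atomic ops is not supported"

# amdgpu prefixes its messages with the device address,
# e.g. "amdgpu 0000:01:00.0: amdgpu: ..." (domain:bus:device.function).
ADDR_RE = re.compile(r"amdgpu ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):")


def devices_without_atomics(dmesg_text: str) -> list[str]:
    """Return PCI addresses for which amdgpu reported missing PCIe atomics."""
    found: list[str] = []
    for line in dmesg_text.splitlines():
        # Skip everything else, including the "atomic allocations" DMA lines.
        if ATOMIC_MSG not in line:
            continue
        match = ADDR_RE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found
```

Feed it the saved output of `sudo dmesg`; an empty list means amdgpu did not print this message for any device.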

I tried editing the PCIe device entry in the VM's config file to enable atomics, but Proxmox rejects it:

Code:
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 16
cpu: x86-64-v2-AES
efidisk0: zfs-local:200/vm-200-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:04:00,pcie=1,atomic=1
machine: q35
memory: 16384
meta: creation-qemu=8.1.2,ctime=1700759584
name: Ubuntu2310rocm
net0: virtio=BC:24:11:FC:60:C5,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: LocalStorageZ:vm-200-disk-0,iothread=1,size=50G
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=46780e1f-409b-4513-aba0-3dc8d70c8ba4
sockets: 1
vga: memory=64
vmgenid: c7f4565b-4f00-4b0d-a3c1-335f2a5c6cfb

This results in:
Code:
vm 200 - unable to parse value of 'hostpci0' - format error
atomic: property is not defined in schema and the schema does not allow additional properties
starting serial terminal on interface serial0
BdsDxe: loading Boot0004 "ubuntu" from HD(1,GPT,29436749-2657-4AAD-B865-1379970CC259,0x800,0x219800)/\EFI\ubuntu\shimx64.efi

and no GPU in the VM at all!
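The parse error comes from Proxmox validating each `hostpci0` option against a fixed schema. A small sketch of that check; the key list below is my reading of the `hostpci[n]` options in the PVE 8 `qm.conf` documentation and is an assumption to verify with `man qm.conf`, and `unknown_hostpci_options` is a hypothetical helper, not part of Proxmox:

```python
# Option keys accepted by hostpci[n] in PVE 8.x, as read from the qm.conf
# documentation (assumption: verify against `man qm.conf` on your host).
HOSTPCI_KEYS = {
    "host", "mapping", "device-id", "legacy-igd", "mdev", "pcie",
    "rombar", "romfile", "sub-device-id", "sub-vendor-id",
    "vendor-id", "x-vga",
}


def unknown_hostpci_options(value: str) -> list[str]:
    """Return option keys in a hostpci[n] value that the schema would reject.

    Only key names are checked; option values are not validated.
    """
    unknown: list[str] = []
    for i, part in enumerate(value.split(",")):
        if "=" in part:
            key = part.split("=", 1)[0].strip()
        elif i == 0:
            # The first item may omit "host=", e.g. "0000:04:00".
            continue
        else:
            key = part.strip()
        if key not in HOSTPCI_KEYS:
            unknown.append(key)
    return unknown
```

With the `hostpci0` line from the config above it flags `atomic`; none of the listed keys is an AtomicOps switch, which is consistent with the error.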

My CPUs support atomics, and the MI25 requires them. Atomics work fine on the host.

Any ideas? I have tried a few other things, but can't seem to find any concrete info on this. Many thanks!
 
So I'm still stuck with this problem. I thought it might be useful to list the links I'm looking at. The struck-through ones are those that haven't helped so far. The driver loads fine and the card is there, but still no atomics :(

https://forum.proxmox.com/threads/gpu-passthrough-ryzen-4600g-apu.120151/
https://forum.proxmox.com/threads/gpu-pass-through-with-radeon-vii.125733/
https://forum.proxmox.com/threads/gpu-passthrough-radeon-6800xt-and-beyond.86932/#post-606373
https://www.wundertech.net/how-to-set-up-gpu-passthrough-on-proxmox/
https://pve.proxmox.com/wiki/PCI_Passthrough#GPU_passthrough


Now the card passes through, but still without atomics. I've been working from these two sources, since it seems this is now supported in PVE, but apparently a kernel change is still needed to get it working, and I have never modified a kernel before. The card does show as being capable, though.

https://patchew.org/QEMU/20230420153839.167418-1-robin@streamhpc.com/
https://forum.level1techs.com/t/asu...-6900-xt-works-on-host-but-not-in-vm/174245/4


Code:
BdsDxe: loading Boot0008 "ubuntu" from HD(1,GPT,29F6C2D7-4123-485D-85CF-CABB86F4E4B8,0x800,0x219800)/\EFI\ubuntu\shimx64.efi
BdsDxe: starting Boot0008 "ubuntu" from HD(1,GPT,29F6C2D7-4123-485D-85CF-CABB86F4E4B8,0x800,0x219800)/\EFI\ubuntu\shimx64.efi
error: no suitable video mode found.
[    2.044976] shpchp 0000:05:01.0: pci_hp_register failed with error -16
[    2.046296] shpchp 0000:05:01.0: Slot initialization failed
[    2.050907] shpchp 0000:05:02.0: pci_hp_register failed with error -16
[    2.052202] shpchp 0000:05:02.0: Slot initialization failed
[    2.057595] shpchp 0000:05:03.0: pci_hp_register failed with error -16
[    2.058884] shpchp 0000:05:03.0: Slot initialization failed
[    2.063429] shpchp 0000:05:04.0: pci_hp_register failed with error -16
[    2.064711] shpchp 0000:05:04.0: Slot initialization failed
/dev/sda2: clean, 161029/4128768 files, 5693198/16501504 blocks
[    9.079078] snd_hda_intel 0000:00:1b.0: no codecs found!
[    9.616385] cloud-init[681]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'init-local' at Tue, 05 Dec 2023 12:10:21 +0000. Up 9.58 seconds.
[   11.879001] cloud-init[689]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'init' at Tue, 05 Dec 2023 12:10:24 +0000. Up 11.85 seconds.
[   11.891839] cloud-init[689]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
[   11.894134] cloud-init[689]: ci-info: +---------+------+------------------------------+---------------+--------+-------------------+
[   11.896414] cloud-init[689]: ci-info: |  Device |  Up  |           Address            |      Mask     | Scope  |     Hw-Address    |
[   11.898672] cloud-init[689]: ci-info: +---------+------+------------------------------+---------------+--------+-------------------+
[   11.900913] cloud-init[689]: ci-info: | enp6s18 | True |         192.168.0.53         | 255.255.255.0 | global | 3a:25:4d:6f:9f:cb |
[   11.903171] cloud-init[689]: ci-info: | enp6s18 | True | fe80::3825:4dff:fe6f:9fcb/64 |       .       |  link  | 3a:25:4d:6f:9f:cb |
[   11.905441] cloud-init[689]: ci-info: |    lo   | True |          127.0.0.1           |   255.0.0.0   |  host  |         .         |
[   11.907698] cloud-init[689]: ci-info: |    lo   | True |           ::1/128            |       .       |  host  |         .         |
[   11.909961] cloud-init[689]: ci-info: +---------+------+------------------------------+---------------+--------+-------------------+
[   11.912236] cloud-init[689]: ci-info: +++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++
[   11.914232] cloud-init[689]: ci-info: +-------+---------------+-------------+-----------------+-----------+-------+
[   11.916211] cloud-init[689]: ci-info: | Route |  Destination  |   Gateway   |     Genmask     | Interface | Flags |
[   11.918201] cloud-init[689]: ci-info: +-------+---------------+-------------+-----------------+-----------+-------+
[   11.920227] cloud-init[689]: ci-info: |   0   |    0.0.0.0    | 192.168.0.1 |     0.0.0.0     |  enp6s18  |   UG  |
[   11.922222] cloud-init[689]: ci-info: |   1   |  192.168.0.0  |   0.0.0.0   |  255.255.255.0  |  enp6s18  |   U   |
[   11.924206] cloud-init[689]: ci-info: |   2   |  192.168.0.1  |   0.0.0.0   | 255.255.255.255 |  enp6s18  |   UH  |
[   11.926200] cloud-init[689]: ci-info: |   3   | 194.168.4.100 | 192.168.0.1 | 255.255.255.255 |  enp6s18  |  UGH  |
[   11.928190] cloud-init[689]: ci-info: |   4   | 194.168.8.100 | 192.168.0.1 | 255.255.255.255 |  enp6s18  |  UGH  |
[   11.930193] cloud-init[689]: ci-info: +-------+---------------+-------------+-----------------+-----------+-------+
[   11.932182] cloud-init[689]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   11.933777] cloud-init[689]: ci-info: +-------+-------------+---------+-----------+-------+
[   11.935379] cloud-init[689]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   11.937006] cloud-init[689]: ci-info: +-------+-------------+---------+-----------+-------+
[   11.938630] cloud-init[689]: ci-info: |   1   |  fe80::/64  |    ::   |  enp6s18  |   U   |
[   11.940257] cloud-init[689]: ci-info: |   3   |    local    |    ::   |  enp6s18  |   U   |
[   11.941874] cloud-init[689]: ci-info: |   4   |  multicast  |    ::   |  enp6s18  |   U   |
[   11.943463] cloud-init[689]: ci-info: +-------+-------------+---------+-----------+-------+
[   18.260811] cloud-init[1271]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'modules:config' at Tue, 05 Dec 2023 12:10:30 +0000. Up 18.20 seconds.
[   18.768355] cloud-init[1283]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'modules:final' at Tue, 05 Dec 2023 12:10:31 +0000. Up 18.71 seconds.
[   18.837579] cloud-init[1283]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 finished at Tue, 05 Dec 2023 12:10:31 +0000. Datasource DataSourceNone.  Up 18.83 seconds
[   18.840094] cloud-init[1283]: 2023-12-05 12:10:31,192 - cc_final_message.py[WARNING]: Used fallback datasource

Ubuntu 22.04.3 LTS docker ttyS0

docker login:

Code:
john@docker:~$ sudo dmesg | grep -e amd
[    0.000000] Linux version 5.15.0-89-generic (buildd@bos03-amd64-016) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 2023 (Ubuntu 5.15.0-89.99-generic 5.15.126)
[    3.802091] amdkcl: loading out-of-tree module taints kernel.
[    3.871239] amdkcl: Warning: fail to get symbol __cancel_work, replace it with kcl stub
[    4.213956] [drm] amdgpu kernel modesetting enabled.
[    4.213961] [drm] amdgpu version: 6.2.4
[    4.214175] amdgpu: CRAT table not found
[    4.214180] amdgpu: Virtual CRAT table created for CPU
[    4.214193] amdgpu: Topology: Add CPU node
[    4.230956] amdgpu: PeerDirect support was initialized successfully
[    4.278418] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    4.278423] amdgpu: ATOM BIOS: 113-D0513700-001
[    4.279127] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[    4.279135] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    4.279154] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[    5.338634] amdgpu 0000:01:00.0: amdgpu: MEM ECC is active.
[    5.338639] amdgpu 0000:01:00.0: amdgpu: SRAM ECC is not presented.
[    5.338671] amdgpu 0000:01:00.0: amdgpu: VRAM: 16368M 0x000000F400000000 - 0x000000F7FEFFFFFF (16368M used)
[    5.338674] amdgpu 0000:01:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    5.338677] amdgpu 0000:01:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[    5.338749] [drm] amdgpu: 16368M of VRAM memory ready
[    5.338751] [drm] amdgpu: 7975M of GTT memory ready.
[    5.358147] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega10_smu
[    6.008031] amdgpu: HMM registered 16368MB device memory
[    6.028616] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    6.028635] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    6.028833] amdgpu: Virtual CRAT table created for GPU
[    6.029094] amdgpu: Topology: Add dGPU node [0x6860:0x1002]
[    6.029098] kfd kfd: amdgpu: added device 1002:6860
[    6.029114] amdgpu 0000:01:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 16, active_cu_number 64
[    6.029121] amdgpu 0000:01:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[    6.029124] amdgpu 0000:01:00.0: amdgpu: ring gfx_low uses VM inv eng 1 on hub 0
[    6.029125] amdgpu 0000:01:00.0: amdgpu: ring gfx_high uses VM inv eng 4 on hub 0
[    6.029127] amdgpu 0000:01:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 5 on hub 0
[    6.029129] amdgpu 0000:01:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 6 on hub 0
[    6.029138] amdgpu 0000:01:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 7 on hub 0
[    6.029139] amdgpu 0000:01:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 8 on hub 0
[    6.029141] amdgpu 0000:01:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 9 on hub 0
[    6.029143] amdgpu 0000:01:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 10 on hub 0
[    6.029145] amdgpu 0000:01:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 11 on hub 0
[    6.029146] amdgpu 0000:01:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 12 on hub 0
[    6.029148] amdgpu 0000:01:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 13 on hub 0
[    6.029149] amdgpu 0000:01:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
[    6.029152] amdgpu 0000:01:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 8
[    6.029153] amdgpu 0000:01:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 8
[    6.029155] amdgpu 0000:01:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 8
[    6.029156] amdgpu 0000:01:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 8
[    6.029158] amdgpu 0000:01:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 8
[    6.029159] amdgpu 0000:01:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 8
[    6.029161] amdgpu 0000:01:00.0: amdgpu: ring vce0 uses VM inv eng 9 on hub 8
[    6.029162] amdgpu 0000:01:00.0: amdgpu: ring vce1 uses VM inv eng 10 on hub 8
[    6.029164] amdgpu 0000:01:00.0: amdgpu: ring vce2 uses VM inv eng 11 on hub 8
[    6.031355] amdgpu: legacy kernel without apple_gmux_detect()
[    6.031609] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:01:00.0 on minor 0
john@docker:~$ sudo dmesg | grep -e atomic
[    0.277740] DMA: preallocated 2048 KiB GFP_KERNEL pool for atomic allocations
[    0.278161] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.278576] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    4.279154] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
john@docker:~$
 
I spent a few more hours on this today but haven't got anywhere!

I installed Ubuntu on a USB drive to test the system on bare metal, and atomics worked as expected, so this isn't a hardware limitation...
 
I've been trying to install the drivers, but the modules fail to compile against the latest kernel.
 