Help to check AMD xdna-driver installation

joebnb

New Member
Jun 3, 2024
Intel and AMD have released CPUs with an integrated NPU, named Core Ultra and Ryzen AI.

I'm trying to use the GPU and NPU of AMD's new-generation CPUs in a VM, but I've run into some problems. I set up PCI passthrough and used lspci to check the devices. I also sent some debug logs to the AMD driver repo; their reply is linked below. Please help me figure out whether this can be done, or what I should do.

Issue logs: https://github.com/amd/xdna-driver/issues/168
 
I'm not sure what you expect from us. A developer of the driver gave you a clear answer in both the issues that you created, and it clearly states that this use case is not supported yet. Specifically because AMD does not ship NPUs on their server level CPUs yet.

I'll just link to the second issue you created later here too, as it makes it fairly obvious why this use case has not been a priority yet: https://github.com/amd/xdna-driver/issues/178
 
PASID and IOMMU seem more "edge". I'm not sure why my VM does not support PASID; maybe it's a hardware problem, or maybe it's PVE, so I'm asking here for an explanation. I think the view in that issue is from the system driver side, and here I may get some suggestions from the VM side.
 
Not sure what you mean by “edge”. Anyway, the problem isn't about your hardware or host support for IOMMU/PASID, but rather that your VM does not seem to provide that. At least that's what I gather from the GitHub Issue [1].

You can try to enable vIOMMU (which emulates IOMMU for your VM), as outlined here [2]. If you want any further help, please provide a more detailed description of what you need and also post the config of the VM in question.

[1]: https://github.com/amd/xdna-driver/issues/168#issuecomment-2240080903
[2]: https://pve.proxmox.com/wiki/PCI(e)_Passthrough#qm_pci_viommu
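
One way to see from inside the guest whether a passed-through device advertises PASID at all is to look at its capability list in verbose `lspci` output. The following is a minimal sketch, assuming the text comes from something like `sudo lspci -vv -s 06:1b.0` in the VM; both sample strings are hypothetical illustrations of that output style and are not taken from this thread, and `advertises_pasid` is a made-up helper name.

```python
import re

# Hypothetical sample in the style of `lspci -vv`; offsets and values are made up.
SAMPLE_WITH_PASID = """\
06:1b.0 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
\tCapabilities: [100 v1] Process Address Space ID (PASID)
\t\tPASIDCap: Exec- Priv-, Max PASID Width: 10
\t\tPASIDCtl: Enable- Exec- Priv-
"""

# Hypothetical sample of a device whose visible capabilities do not include PASID.
SAMPLE_WITHOUT_PASID = """\
06:1b.0 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
\tCapabilities: [64] Express (v2) Endpoint, MSI 00
"""


def advertises_pasid(lspci_vv_text: str) -> bool:
    """Return True if any 'Capabilities:' line in the text mentions PASID."""
    pattern = re.compile(r"Capabilities:.*\bPASID\b")
    return any(pattern.search(line) for line in lspci_vv_text.splitlines())


if __name__ == "__main__":
    print("with PASID:   ", advertises_pasid(SAMPLE_WITH_PASID))
    print("without PASID:", advertises_pasid(SAMPLE_WITHOUT_PASID))
```

If the capability is missing in the guest but present for the same device on the host, that would point at the virtual platform rather than the hardware, which is what the GitHub comment in [1] suggests.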
 
Thank you for the explanation. It seems the IOMMU configuration is now in place, but there still appear to be problems. Could you please provide a more detailed analysis and help with the remaining issues?

The VM error is:
Code:
sudo dmesg | grep -E "amd|gpu|xdna|xrt|error|fail|err|critical|not"
[    0.042419] Speculative Return Stack Overflow: IBPB-extending microcode not applied!
[    0.112229] acpi PNP0A08:00: _OSC: platform does not support [PCIeHotplug LTR DPC]
[    0.284560] pci_bus 0000:05: extended config space not accessible
[    0.354014] pci_bus 0000:06: extended config space not accessible
[    0.500833] pci_bus 0000:07: extended config space not accessible
[    0.508083] pci_bus 0000:08: extended config space not accessible
[    0.515212] pci_bus 0000:09: extended config space not accessible
[    0.787358] shpchp 0000:05:01.0: pci_hp_register failed with error -16
[    0.789463] shpchp 0000:05:02.0: pci_hp_register failed with error -16
[    0.791535] shpchp 0000:05:03.0: pci_hp_register failed with error -16
[    0.794462] shpchp 0000:05:04.0: pci_hp_register failed with error -16
[    0.878849] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    0.878939] platform eisa.0: EISA: Cannot allocate resource for mainboard
[    0.878942] platform eisa.0: Cannot allocate resource for EISA slot 1
[    0.878946] platform eisa.0: Cannot allocate resource for EISA slot 2
[    0.878949] platform eisa.0: Cannot allocate resource for EISA slot 3
[    0.878952] platform eisa.0: Cannot allocate resource for EISA slot 4
[    0.878955] platform eisa.0: Cannot allocate resource for EISA slot 5
[    0.878958] platform eisa.0: Cannot allocate resource for EISA slot 6
[    0.878962] platform eisa.0: Cannot allocate resource for EISA slot 7
[    0.878965] platform eisa.0: Cannot allocate resource for EISA slot 8
[    0.878971] amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled
[    2.412862] amdxdna: loading out-of-tree module taints kernel.
[    2.412870] amdxdna: module verification failed: signature and/or required key missing - tainting kernel
[    2.429239] i2c i2c-0: Memory type 0x07 not supported yet, not instantiating SPD
[    2.454007] amdxdna 0000:06:1b.0: aie2_init: Enable PASID failed, ret -19
[    2.455980] amdxdna 0000:06:1b.0: amdxdna_probe: Hardware init failed, ret -19
[    2.505107] kvm_amd: TSC scaling supported
[    2.505112] kvm_amd: Nested Virtualization enabled
[    2.505114] kvm_amd: Nested Paging enabled
[    2.505117] kvm_amd: LBR virtualization supported
[    2.505123] kvm_amd: Virtual VMLOAD VMSAVE supported
[    2.505124] kvm_amd: Virtual GIF supported
[    2.505124] kvm_amd: Virtual NMI enabled
[    3.952671] [drm] amdgpu kernel modesetting enabled.
[    3.952823] amdgpu: Virtual CRAT table created for CPU
[    3.952838] amdgpu: Topology: Add CPU node
[    3.963399] amdgpu 0000:06:10.0: ROM [??? 0x00000000 flags 0x20000000]: can't assign; bogus alignment
[    3.968594] amdgpu 0000:06:10.0: amdgpu: Unable to locate a BIOS ROM
[    3.968939] amdgpu 0000:06:10.0: amdgpu: Fatal error during GPU init
[    3.969258] amdgpu 0000:06:10.0: amdgpu: amdgpu: finishing device.
[    3.970220] amdgpu 0000:06:10.0: probe with driver amdgpu failed with error -22
[  140.276049] amdxdna 0000:06:1b.0: aie2_init: Enable PASID failed, ret -19
[  140.278123] amdxdna 0000:06:1b.0: amdxdna_probe: Hardware init failed, ret -19

From the logs above, I can filter out this line:
[ 2.412870] amdxdna: module verification failed: signature and/or required key missing - tainting kernel
See: https://github.com/amd/xdna-driver/issues/14#issuecomment-1939359655
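
To make the return codes in the log easier to read, here is a small sketch that pulls out the driver failure lines and maps each negative code to its symbolic errno name. The four log lines are copied from the dmesg output above; `LINE_RE` and `decode_failures` are hypothetical names for illustration, and the mapping assumes Linux errno numbering.

```python
import errno
import os
import re

# Lines copied from the dmesg output in this post.
LOG = """\
[    0.787358] shpchp 0000:05:01.0: pci_hp_register failed with error -16
[    2.454007] amdxdna 0000:06:1b.0: aie2_init: Enable PASID failed, ret -19
[    2.455980] amdxdna 0000:06:1b.0: amdxdna_probe: Hardware init failed, ret -19
[    3.970220] amdgpu 0000:06:10.0: probe with driver amdgpu failed with error -22
"""

# Captures driver name, PCI address, message, and the trailing negative code.
LINE_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<driver>\S+)\s+(?P<bdf>[0-9a-f:.]+):\s+"
    r"(?P<msg>.*?)(?:ret|error)\s+-(?P<code>\d+)$"
)


def decode_failures(log: str):
    """Return (driver, pci_address, errno_name, description) for each failure line."""
    results = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a "failed with a return code" line
        code = int(match.group("code"))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((match.group("driver"), match.group("bdf"), name, os.strerror(code)))
    return results


if __name__ == "__main__":
    for driver, bdf, name, text in decode_failures(LOG):
        print(f"{driver:8} {bdf}  {name:7} {text}")
```

Read this way, the amdxdna probe fails with ENODEV when enabling PASID, which fits the earlier explanation that the VM does not seem to provide PASID, while the amdgpu failure (EINVAL after "Unable to locate a BIOS ROM") looks like a separate problem.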


This is my VM configuration:

Code:
root@pve:/etc/pve/qemu-server# cat 102.conf
bios: ovmf
boot: order=scsi0;net0
cores: 8
cpu: host
efidisk0: local-lvm:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:c6:00.0
hostpci1: 0000:c6:00.1
hostpci2: 0000:c7:00.1
machine: q35,viommu=intel
memory: 8192
meta: creation-qemu=9.0.0,ctime=1721230408
name: UbuntuAI
net0: virtio=BC:24:11:11:B4:D0,bridge=vmbr0
numa: 1
ostype: l26
scsi0: local-lvm:vm-102-disk-1,iothread=1,size=240G,ssd=1
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=52892409-f8ed-4921-a5ca-2a73622451e1
sockets: 1
vmgenid: 4da0d14a-c5ef-466a-913a-a5115511e56e
root@pve:/etc/pve/qemu-server#
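
For anyone comparing configs, a short sketch of how the passthrough-relevant parts of such a file could be summarized. The `CONF` lines are copied from the config above; `parse_conf` and `summarize_passthrough` are hypothetical helpers, and the parsing assumes the simple `key: value` layout shown here rather than the full Proxmox config syntax.

```python
# Subset of the VM config shown above (copied verbatim).
CONF = """\
bios: ovmf
cpu: host
hostpci0: 0000:c6:00.0
hostpci1: 0000:c6:00.1
hostpci2: 0000:c7:00.1
machine: q35,viommu=intel
"""


def parse_conf(text: str) -> dict:
    """Split each 'key: value' line at the first colon."""
    conf = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def summarize_passthrough(text: str) -> dict:
    """Report the vIOMMU setting (if any) and the host PCI addresses passed through."""
    conf = parse_conf(text)
    viommu = None
    for option in conf.get("machine", "").split(","):
        name, sep, value = option.partition("=")
        if name == "viommu" and sep:
            viommu = value
    hostpci = {
        key: value.split(",")[0]  # drop any trailing options after the address
        for key, value in sorted(conf.items())
        if key.startswith("hostpci")
    }
    return {"viommu": viommu, "hostpci": hostpci}


if __name__ == "__main__":
    print(summarize_passthrough(CONF))
```

For this config the summary shows an emulated Intel vIOMMU and three passed-through functions: the GPU (c6:00.0), its audio device (c6:00.1), and the IPU/NPU (c7:00.1).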

These are the host PCI devices:
Code:
root@pve:/etc/pve/qemu-server# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e8
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 14e9
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14ea
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14ed
00:01.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14ed
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14ea
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14ee
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14ee
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14ee
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14ea
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14ea
00:04.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14ea
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14eb
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14eb
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14eb
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14f7
01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. NV2 NVMe SSD SM2267XT (DRAM-less) (rev 03)
02:00.0 USB controller: ASMedia Technology Inc. ASM2142/ASM3142 USB 3.1 Host Controller
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
04:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8852BE PCIe 802.11ax Wireless Network Controller
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
c6:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix3 (rev c5)
c6:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
c6:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 19h (Model 74h) CCP/PSP 3.0 Device
c6:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b9
c6:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15ba
c6:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
c7:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 14ec
c7:00.1 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
c8:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 14ec
c8:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15c0
c8:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15c1
c8:00.5 USB controller: Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller
c8:00.6 USB controller: Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller
root@pve:/etc/pve/qemu-server#

These are the VM PCI devices:
Code:
lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:02.0 Unclassified device [00ff]: Red Hat, Inc. Device 1057 (rev 01)
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1c.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1c.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1c.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
05:01.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
05:02.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
05:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
05:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
06:03.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
06:10.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix3 (rev c5)
06:11.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
06:12.0 Ethernet controller: Red Hat, Inc. Virtio network device
06:1b.0 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
09:01.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
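
To line up the two listings, the sketch below pairs each passed-through host device with the guest address that carries the same description. The sample lines are copied from the host and VM `lspci` output above; matching on the description string is only a heuristic chosen for illustration (it would break with two identical devices), and `parse_lspci` and `map_host_to_guest` are hypothetical helper names.

```python
# Passed-through devices as listed on the host (copied from the host lspci output).
HOST = """\
c6:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix3 (rev c5)
c6:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
c7:00.1 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
"""

# Relevant lines from the VM lspci output.
GUEST = """\
06:03.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
06:10.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix3 (rev c5)
06:11.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
06:1b.0 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
"""


def parse_lspci(text: str) -> dict:
    """Map PCI address -> description for plain `lspci` output lines."""
    devices = {}
    for line in text.splitlines():
        address, _, description = line.partition(" ")
        if description:
            devices[address] = description
    return devices


def map_host_to_guest(host_text: str, guest_text: str) -> dict:
    """Pair host and guest addresses that share an identical description."""
    guest_by_description = {
        description: address for address, description in parse_lspci(guest_text).items()
    }
    return {
        address: guest_by_description.get(description)
        for address, description in parse_lspci(host_text).items()
    }


if __name__ == "__main__":
    for host_addr, guest_addr in map_host_to_guest(HOST, GUEST).items():
        print(f"host {host_addr} -> guest {guest_addr}")
```

This makes it easier to relate the guest addresses in the dmesg errors (06:10.0 for amdgpu, 06:1b.0 for amdxdna) back to the host devices c6:00.0 and c7:00.1.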
 
