I have an AMD RX 550 GPU in a Thunderbolt dock that has been tested and works perfectly with my Minisforum MS-01 (Intel 12900H) under Windows 11 on bare metal.
I have an Erying 13900H D5 motherboard with 64 GB RAM that has been working flawlessly with Proxmox for the past 3 months.
I have upgraded Proxmox VE to the latest version, 8.2.4.
When I try to pass through the AMD GPU in the Thunderbolt dock, it is recognized and I can start Windows 11 and use it for about 5 to 15 seconds, but then the screen and the cursor freeze. I observed the following on the console screen and in the journal:
Jul 18 12:40:58 pve kernel: thunderbolt 0-1: device disconnected
Jul 18 12:40:58 pve kernel: pcieport 0000:0a:00.0: ready 1023ms after resume
Jul 18 12:40:58 pve kernel: pcieport 0000:00:07.0: PME: Spurious native interrupt!
Jul 18 12:41:00 pve kernel: pci_bus 0000:20: Allocating resources
Jul 18 12:41:04 pve kernel: thunderbolt 0-1: new device found, vendor=0x8086 device=0x2
Jul 18 12:41:04 pve kernel: thunderbolt 0-1: Intel Tamales Module 2
Jul 18 12:41:24 pve kernel: thunderbolt 0-1: device disconnected
Jul 18 12:41:24 pve kernel: pcieport 0000:00:07.0: PME: Spurious native interrupt!
Jul 18 12:41:25 pve kernel: pci_bus 0000:20: Allocating resources
Jul 18 12:41:30 pve kernel: thunderbolt 0-1: new device found, vendor=0x8086 device=0x2
Jul 18 12:41:30 pve kernel: thunderbolt 0-1: Intel Tamales Module 2
Jul 18 12:41:50 pve kernel: thunderbolt 0-1: device disconnected
Jul 18 12:41:50 pve kernel: pcieport 0000:00:07.0: PME: Spurious native interrupt!
Jul 18 12:41:51 pve kernel: pci_bus 0000:20: Allocating resources
Jul 18 12:41:56 pve kernel: thunderbolt 0-1: new device found, vendor=0x8086 device=0x2
Jul 18 12:41:56 pve kernel: thunderbolt 0-1: Intel Tamales Module 2
Jul 18 12:42:16 pve kernel: thunderbolt 0-1: device disconnected
Jul 18 12:42:16 pve kernel: pcieport 0000:0a:00.0: ready 1023ms after resume
Jul 18 12:42:16 pve kernel: pcieport 0000:00:07.0: PME: Spurious native interrupt!
Jul 18 12:42:17 pve kernel: pci_bus 0000:20: Allocating resources
Jul 18 12:42:22 pve kernel: thunderbolt 0-1: new device found, vendor=0x8086 device=0x2
Jul 18 12:42:22 pve kernel: thunderbolt 0-1: Intel Tamales Module 2
I don't even need to start my W11 virtual machine; the host keeps cycling like this until I disconnect the Thunderbolt GPU.
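The timing of the loop can be pulled out of the journal with a short script. This is only a minimal sketch, assuming the short journal format shown above; the function name `disconnect_intervals` and the `year` parameter are hypothetical (the log lines carry no year, so one has to be supplied just to parse the timestamps).

```python
from datetime import datetime

def disconnect_intervals(journal_text, year=2024):
    """Return the seconds between consecutive Thunderbolt disconnect events."""
    stamps = []
    for line in journal_text.splitlines():
        if "thunderbolt" in line and "device disconnected" in line:
            # Short journal format: "Jul 18 12:40:58 host kernel: ..."
            month, day, clock = line.split()[:3]
            stamps.append(
                datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
            )
    # Pair each event with the next one and take the difference.
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# A few lines copied from the journal excerpt above.
sample = """\
Jul 18 12:40:58 pve kernel: thunderbolt 0-1: device disconnected
Jul 18 12:41:04 pve kernel: thunderbolt 0-1: new device found, vendor=0x8086 device=0x2
Jul 18 12:41:24 pve kernel: thunderbolt 0-1: device disconnected
Jul 18 12:41:50 pve kernel: thunderbolt 0-1: device disconnected
Jul 18 12:42:16 pve kernel: thunderbolt 0-1: device disconnected
"""

print(disconnect_intervals(sample))  # [26.0, 26.0, 26.0]
```

On the excerpt above the disconnects are exactly 26 seconds apart, which looks more like a fixed timeout or power-management cycle than a random link drop.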
I followed all the instructions for blacklisting. I have some more outputs below; if someone can make something of them or suggest things to try, I would be very thankful.
# cat /etc/modules =>
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
thunderbolt
# cat pve-blacklist.conf =>
Code:
blacklist amdgpu
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
# cat /etc/modprobe.d/vfio.conf =>
Code:
options vfio-pci ids=1002:699f,1002:aae0,8086:15ef,8086:15f0 disable_vga=0
# 0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] [1002:699f] (rev c7)
# 0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]
# 0a:00.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] [8086:15ef] (rev 06)
# 20:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge DD 2018] [8086:15f0] (rev 06)
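Listing IDs in vfio.conf does not by itself prove that the devices ended up on vfio-pci. One way to check is to read the `driver` symlink that sysfs exposes for each PCI device. The following is a minimal sketch, assuming the four addresses from the comments above with the `0000:` domain prefix seen in the journal; the helper name `bound_driver` is hypothetical.

```python
import os

# PCI addresses taken from the lspci comments in vfio.conf.
DEVICES = ["0000:0c:00.0", "0000:0c:00.1", "0000:0a:00.0", "0000:20:00.0"]

def bound_driver(address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, address, "driver")
    if not os.path.islink(link):
        # Device is absent (e.g. dock disconnected) or has no driver bound.
        return None
    # The symlink points at the driver directory, e.g. .../drivers/vfio-pci
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    for address in DEVICES:
        print(address, bound_driver(address) or "no driver bound / device absent")
```

Run on the Proxmox host while the dock is connected; each passed-through function would be expected to report `vfio-pci` if the binding worked.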
Outputs:
pvesh get /nodes/pve/hardware/pci --pci-class-blacklist "" => output in attachment
dmesg | grep 'remapping' =>
Code:
[ 0.166898] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.167778] DMAR-IR: Enabled IRQ remapping in x2apic mode
dmesg | grep -e DMAR -e IOMMU =>
Code:
[ 0.012916] ACPI: DMAR 0x0000000030E64000 000088 (v02 INTEL EDK2 00000002 01000013)
[ 0.012937] ACPI: Reserving DMAR table memory at [mem 0x30e64000-0x30e64087]
[ 0.166879] DMAR: Host address width 39
[ 0.166880] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.166885] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 29a00f0505e
[ 0.166888] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.166891] DMAR: dmar1: reg_base_addr fed91000 ver 5:0 cap d2008c40660462 ecap f050da
[ 0.166893] DMAR: RMRR base: 0x0000003b000000 end: 0x0000003f7fffff
[ 0.166896] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.166897] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.166898] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.167778] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.474430] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[ 1.676788] DMAR: No ATSR found
[ 1.676789] DMAR: No SATC found
[ 1.676789] DMAR: IOMMU feature fl1gp_support inconsistent
[ 1.676790] DMAR: IOMMU feature pgsel_inv inconsistent
[ 1.676791] DMAR: IOMMU feature nwfs inconsistent
[ 1.676792] DMAR: IOMMU feature dit inconsistent
[ 1.676792] DMAR: IOMMU feature sc_support inconsistent
[ 1.676793] DMAR: IOMMU feature dev_iotlb_support inconsistent
[ 1.676794] DMAR: dmar0: Using Queued invalidation
[ 1.676796] DMAR: dmar1: Using Queued invalidation
[ 1.679016] DMAR: Intel(R) Virtualization Technology for Directed I/O
What bothers me in this output:
pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
I cannot find "iommu=on" in the output, but according to this article:
https://vfio.blogspot.com/2016/09/intel-iommu-enabled-it-doesnt-mean-what.html
I should not look for that, but rather for the last line in my output, which says: DMAR: Intel(R) Virtualization Technology for Directed I/O
Question: Can someone confirm that my IOMMU is actually active?
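Besides reading dmesg, a common cross-check is whether the kernel has populated `/sys/kernel/iommu_groups`; if that directory contains numbered groups with devices in them, the IOMMU is in use. Below is a minimal sketch of that check; the function name `list_iommu_groups` is hypothetical, and the script only reads sysfs.

```python
import os

def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        # Directory missing: the kernel exposes no IOMMU groups at all.
        return groups
    for name in os.listdir(root):
        if not name.isdigit():
            continue
        dev_dir = os.path.join(root, name, "devices")
        groups[int(name)] = sorted(os.listdir(dev_dir)) if os.path.isdir(dev_dir) else []
    return groups

if __name__ == "__main__":
    groups = list_iommu_groups()
    if not groups:
        print("No IOMMU groups found; the IOMMU does not appear to be active.")
    for number in sorted(groups):
        print(number, " ".join(groups[number]))
```

The listing also shows which devices share a group with the GPU (0c:00.0 and 0c:00.1 here), which matters because all devices in one group normally have to be passed through together.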
cat /proc/modules | grep pci =>
Code:
snd_sof_pci_intel_tgl 12288 0 - Live 0xffffffffc1466000
snd_sof_intel_hda_common 208896 1 snd_sof_pci_intel_tgl, Live 0xffffffffc1a55000
snd_sof_pci 24576 2 snd_sof_pci_intel_tgl,snd_sof_intel_hda_common, Live 0xffffffffc1a76000
snd_sof 360448 3 snd_sof_intel_hda_common,snd_sof_intel_hda,snd_sof_pci, Live 0xffffffffc19f9000
snd_soc_acpi_intel_match 102400 2 snd_sof_pci_intel_tgl,snd_sof_intel_hda_common, Live 0xffffffffc128d000
vfio_pci 16384 1 - Live 0xffffffffc0d7d000
vfio_pci_core 86016 1 vfio_pci, Live 0xffffffffc0dfa000
irqbypass 12288 3 kvm,vfio_pci_core, Live 0xffffffffc0df2000
vfio 69632 7 vfio_pci,vfio_pci_core,vfio_iommu_type1, Live 0xffffffffc0dc0000
xhci_pci 24576 0 - Live 0xffffffffc02ec000
xhci_pci_renesas 16384 1 xhci_pci, Live 0xffffffffc0279000
xhci_hcd 364544 1 xhci_pci, Live 0xffffffffc041c000
intel_lpss_pci 24576 0 - Live 0xffffffffc04ed000
spi_intel_pci 12288 0 - Live 0xffffffffc03bd000
intel_lpss 12288 1 intel_lpss_pci, Live 0xffffffffc02e6000
spi_intel 32768 1 spi_intel_pci, Live 0xffffffffc02d7000
Some things I already tried:
1. I also blacklisted the Thunderbolt controller.
2. I tried a tip that said to update /etc/kernel/cmdline and add pcie_aspm=off to disable Active State Power Management. To no avail.
3. Still on my to-do list: I am waiting for the OnexGPU I ordered. I will test it first on my Minisforum MS-01 while it is still running bare-metal W11, before converting that machine to a Proxmox server as well and testing again. Maybe it's a motherboard or BIOS problem.
4, 5, 6... Any thoughts, anyone?
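Regarding item 2: an edit to /etc/kernel/cmdline only takes effect after the boot entries are regenerated (with `proxmox-boot-tool refresh` on systemd-boot installs) and the host is rebooted, so it is worth confirming that the running kernel really sees the parameter. Here is a minimal sketch that parses `/proc/cmdline`; the function name `parse_cmdline` is hypothetical.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {parameter: value} dictionary.

    Parameters without '=' (such as 'quiet') are stored with the value True.
    """
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params

if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        running = parse_cmdline(handle.read())
    # Parameters mentioned in this thread; None means "not on the command line".
    for name in ("pcie_aspm", "intel_iommu", "iommu"):
        print(name, "=", running.get(name))
```

If `pcie_aspm` prints `None` after the reboot, the setting never reached the kernel and the earlier test with it would not have been conclusive.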