Hello,
since I am new here, a very brief introduction first:
I have been running a server based on Citrix Xen for several years. However, prices have now been raised sharply and licenses are no longer being renewed, so I have switched to Proxmox. I already had this combination working on Xen, as I need the GPU for running simulations.
I am grateful for any help.
Regards, Ludwig
Problem:
GPU passthrough is not possible; error code IOTLB_INV_TIMEOUT.
Is this caused by the known AMD reset bug? If so, is there already a fix for it?
To be honest, I am at a loss as to what to try next.
Hardware:
Ryzen 1600X
ASROCK X470D4U
PCIe_Slot_6: LSI Logic / Symbios Logic SAS2008
PCIe_Slot_5: Intel i350T4
PCIe_Slot_4: GPU AMD RX570
Software:
PVE Kernel Version: 5.4.65-1
VM_102: Win10_2004
Code:
### VENDOR ID AMD RX570 ###
1002:67df
1002:aaf0
### /etc/default/grub ###
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
also tried, without success: iommu=soft
### /etc/modprobe.d/vfio.conf ###
options vfio-pci ids=1002:67df,1002:aaf0 disable_vga=1 disable_idle_d3=1
### LOG ###
Oct 17 14:15:30 pve pvedaemon[5290]: start VM 102: UPID:pve:000014AA:0000402F:5F8AE062:qmstart:102:root@pam:
Oct 17 14:15:30 pve pvedaemon[2972]: <root@pam> starting task UPID:pve:000014AA:0000402F:5F8AE062:qmstart:102:root@pam:
Oct 17 14:15:30 pve systemd[1]: Created slice qemu.slice.
Oct 17 14:15:30 pve systemd[1]: Started 102.scope.
Oct 17 14:15:31 pve kernel: vfio-pci 0000:2c:00.0: enabling device (0000 -> 0003)
Oct 17 14:15:31 pve kernel: vfio-pci 0000:2c:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
Oct 17 14:15:31 pve kernel: vfio-pci 0000:2c:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
Oct 17 14:15:31 pve kernel: vfio-pci 0000:2c:00.0: vfio_ecap_init: hiding ecap 0x1e@0x370
Oct 17 14:15:31 pve kernel: vfio-pci 0000:2c:00.1: enabling device (0000 -> 0002)
Oct 17 14:15:32 pve kernel: vfio-pci 0000:2c:00.1: vfio_bar_restore: reset recovery - restoring BARs
Oct 17 14:15:32 pve kernel: vfio-pci 0000:2c:00.0: vfio_bar_restore: reset recovery - restoring BARs
Oct 17 14:15:32 pve kernel: pcieport 0000:00:03.2: AER: Uncorrected (Non-Fatal) error received: 0000:00:00.0
Oct 17 14:15:32 pve kernel: pcieport 0000:00:03.2: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Oct 17 14:15:32 pve kernel: pcieport 0000:00:03.2: AER: device [1022:1453] error status/mask=00200000/04400000
Oct 17 14:15:32 pve kernel: pcieport 0000:00:03.2: AER: [21] ACSViol (First)
Oct 17 14:15:32 pve kernel: pcieport 0000:00:03.2: AER: Device recovery successful
Oct 17 14:15:33 pve kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 14:15:33 pve kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 14:15:33 pve kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 14:15:33 pve kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 14:15:33 pve QEMU[5307]: kvm: vfio_err_notifier_handler(0000:2c:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Oct 17 14:15:33 pve QEMU[5307]: kvm: vfio_err_notifier_handler(0000:2c:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
Oct 17 14:15:33 pve pvedaemon[2972]: <root@pam> end task UPID:pve:000014AA:0000402F:5F8AE062:qmstart:102:root@pam: OK
Oct 17 14:15:33 pve kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 14:15:33 pve kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 14:15:33 pve kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 14:15:33 pve kernel: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=2c:00.0 address=0x7fb59ec90] <---
### LSPCI ###
root@pve:~# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43d0 (rev 01)
03:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller (rev 01)
03:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge (rev 01)
20:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
20:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
20:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
20:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
20:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
20:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
21:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
22:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
23:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
24:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
25:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
2b:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03) <---
2c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev ef) <---
2c:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
2d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
2d:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
2d:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller
2e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
2e:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
2e:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
### IOMMU GROUPS ###
root@pve:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:01:00.3
/sys/kernel/iommu_groups/7/devices/0000:00:04.0
/sys/kernel/iommu_groups/25/devices/0000:2e:00.2
/sys/kernel/iommu_groups/15/devices/0000:01:00.1
/sys/kernel/iommu_groups/5/devices/0000:00:03.1
/sys/kernel/iommu_groups/23/devices/0000:2d:00.3
/sys/kernel/iommu_groups/13/devices/0000:00:18.3
/sys/kernel/iommu_groups/13/devices/0000:00:18.1
/sys/kernel/iommu_groups/13/devices/0000:00:18.6
/sys/kernel/iommu_groups/13/devices/0000:00:18.4
/sys/kernel/iommu_groups/13/devices/0000:00:18.2
/sys/kernel/iommu_groups/13/devices/0000:00:18.0
/sys/kernel/iommu_groups/13/devices/0000:00:18.7
/sys/kernel/iommu_groups/13/devices/0000:00:18.5
/sys/kernel/iommu_groups/3/devices/0000:00:02.0
/sys/kernel/iommu_groups/21/devices/0000:2d:00.0
/sys/kernel/iommu_groups/11/devices/0000:00:08.1
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/18/devices/0000:03:00.0
/sys/kernel/iommu_groups/18/devices/0000:20:00.0
/sys/kernel/iommu_groups/18/devices/0000:20:03.0
/sys/kernel/iommu_groups/18/devices/0000:25:00.0
/sys/kernel/iommu_groups/18/devices/0000:24:00.0
/sys/kernel/iommu_groups/18/devices/0000:20:02.0
/sys/kernel/iommu_groups/18/devices/0000:03:00.1
/sys/kernel/iommu_groups/18/devices/0000:23:00.0
/sys/kernel/iommu_groups/18/devices/0000:20:08.0
/sys/kernel/iommu_groups/18/devices/0000:22:00.0
/sys/kernel/iommu_groups/18/devices/0000:20:01.0
/sys/kernel/iommu_groups/18/devices/0000:20:04.0
/sys/kernel/iommu_groups/18/devices/0000:21:00.0
/sys/kernel/iommu_groups/18/devices/0000:03:00.2
/sys/kernel/iommu_groups/8/devices/0000:00:07.0
/sys/kernel/iommu_groups/26/devices/0000:2e:00.3
/sys/kernel/iommu_groups/16/devices/0000:01:00.2
/sys/kernel/iommu_groups/6/devices/0000:00:03.2
/sys/kernel/iommu_groups/24/devices/0000:2e:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:03.0
/sys/kernel/iommu_groups/22/devices/0000:2d:00.2
/sys/kernel/iommu_groups/12/devices/0000:00:14.3
/sys/kernel/iommu_groups/12/devices/0000:00:14.0
/sys/kernel/iommu_groups/2/devices/0000:00:01.3
/sys/kernel/iommu_groups/20/devices/0000:2c:00.1 <---
/sys/kernel/iommu_groups/20/devices/0000:2c:00.0 <---
/sys/kernel/iommu_groups/10/devices/0000:00:08.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/19/devices/0000:2b:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:07.1
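The raw `find` output above is unsorted, which makes it hard to see which devices share a group. Below is a minimal Python sketch that regroups such output by group number. It assumes the listing is available as lines of text; the function names `group_iommu_devices` and `format_groups` are made up for illustration and are not part of any Proxmox tooling.

```python
import re
from collections import defaultdict

# Matches sysfs entries such as /sys/kernel/iommu_groups/20/devices/0000:2c:00.1
# (trailing annotations like " <---" are ignored).
_ENTRY = re.compile(r"/sys/kernel/iommu_groups/(\d+)/devices/([0-9a-fA-F:.]+)")


def group_iommu_devices(lines):
    """Return {group number: sorted PCI addresses} from `find` output lines."""
    groups = defaultdict(list)
    for line in lines:
        match = _ENTRY.search(line)
        if match:  # skip shell prompts, blank lines, and other non-entries
            groups[int(match.group(1))].append(match.group(2))
    # Sort by group number so the result reads top to bottom.
    return {number: sorted(devices) for number, devices in sorted(groups.items())}


def format_groups(groups):
    """Render one line per IOMMU group."""
    return "\n".join(
        f"group {number:>2}: {' '.join(devices)}"
        for number, devices in groups.items()
    )


if __name__ == "__main__":
    # A few lines copied from the listing above, as sample input.
    sample = [
        "/sys/kernel/iommu_groups/6/devices/0000:00:03.2",
        "/sys/kernel/iommu_groups/20/devices/0000:2c:00.1 <---",
        "/sys/kernel/iommu_groups/20/devices/0000:2c:00.0 <---",
        "/sys/kernel/iommu_groups/19/devices/0000:2b:00.0",
    ]
    print(format_groups(group_iommu_devices(sample)))
```

Fed the full listing, this shows group 20 containing only 0000:2c:00.0 and 0000:2c:00.1, i.e. the GPU and its HDMI audio function have a group to themselves.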