I would like to pass through the iGPU of my 5650GE. Without the ACS override my IOMMU groups are messed up, so I used the ACS override patch and the groups look pretty good now. However, when I start a VM that has the GPU attached, my PVE host crashes. There is so much information floating around the net, and I have tried so many tips, that I don't know what the best way to make progress is. Maybe you could help me by telling me which information you need. Thanks!
PVE kernel version
Code:
root@pve:~# uname -a
Linux pve 5.15.60-1-pve #1 SMP PVE 5.15.60-1 (Mon, 19 Sep 2022 17:53:17 +0200) x86_64 GNU/Linux
IOMMU groups
Code:
root@pve:~# ./script.sh
IOMMU Group 0:
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 1:
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 10:
02:00.1 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 1a)
IOMMU Group 11:
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev db)
IOMMU Group 12:
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
IOMMU Group 13:
03:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
IOMMU Group 14:
03:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU Group 15:
03:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU Group 16:
03:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor [1022:15e2] (rev 01)
IOMMU Group 2:
00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU Group 3:
00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU Group 4:
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 5:
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
IOMMU Group 6:
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 7:
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166a]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166b]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166c]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166d]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166e]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:166f]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1670]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1671]
IOMMU Group 8:
01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black SN750 / PC SN730 NVMe SSD [15b7:5006]
IOMMU Group 9:
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device [10ec:816e] (rev 1a)
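The `script.sh` used above is not shown in this post. For anyone who wants to reproduce the listing, a minimal sketch of such a script could look like the following. It assumes the usual sysfs layout under `/sys/kernel/iommu_groups` and that `lspci` (from pciutils) is installed. The function name `list_iommu_groups` and its optional root argument are my own and are only there to make the function easy to try on a test directory.

```shell
#!/bin/sh
# Sketch only: list every IOMMU group and the PCI devices it contains.
# Assumption: groups live in /sys/kernel/iommu_groups/<n>/devices/<pci-address>.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"   # optional override, hypothetical helper argument
    for group_dir in "$root"/*; do
        [ -d "$group_dir" ] || continue
        echo "IOMMU Group ${group_dir##*/}:"
        for dev in "$group_dir"/devices/*; do
            [ -e "$dev" ] || continue
            addr="${dev##*/}"
            # lspci -nn adds the [class] and [vendor:device] IDs, -s selects one address
            desc="$(lspci -nns "$addr" 2>/dev/null || true)"
            # Fall back to the bare address if lspci is missing or prints nothing
            echo "${desc:-$addr}"
        done
    done
}

list_iommu_groups "$@"
```

Groups are printed in glob order (0, 1, 10, 11, ...), which is the same ordering as in the listing above.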
GRUB_CMDLINE_LINUX_DEFAULT
I read that on AMD systems the IOMMU is enabled by default, so I did not add a separate option to turn it on.
Code:
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet pcie_acs_override=downstream,multifunction iommu=pt video=vesafb:off video=efifb:off video=simplefb:off"
/etc/modprobe.d/pve-blacklist.conf
Code:
/etc/modprobe.d/pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
blacklist amdgpu
/etc/modprobe.d/vfio.conf
I read that these options are no longer needed, so I commented them out. Is that right?
Code:
#options vfio-pci ids=1002:1638,1002:1637 disable_vga=1
Host VGA adapter
It seems that no kernel driver is in use for the adapter. If I do not comment out the vfio config, the vfio-pci driver is in use.
Code:
root@pve:~# lspci -nnk | grep -i VGA -A2
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev db)
Subsystem: Lenovo Device [17aa:32e4]
Kernel modules: amdgpu
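Since the dmesg output further down involves all functions 03:00.0 to 03:00.5 and not only the VGA function, it might help to report the bound driver for each of them. A small sketch, assuming the standard sysfs `driver` symlink under `/sys/bus/pci/devices`, follows. The helper name `bound_driver` and its optional second argument are hypothetical and not part of any tool.

```shell
#!/bin/sh
# Sketch only: print the kernel driver currently bound to one PCI function.
# Assumption: /sys/bus/pci/devices/<address>/driver is a symlink to the driver directory.
bound_driver() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"   # optional override for trying this on a test directory
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        drv="$(readlink "$link")"
        echo "$addr: ${drv##*/}"
    else
        echo "$addr: no driver bound"
    fi
}

# The six functions of device 03:00 that appear in the IOMMU listing
for fn in 0 1 2 3 4 5; do
    bound_driver "0000:03:00.$fn"
done
```

Running this once before starting the VM and once after it has been started should show which functions get switched to vfio-pci.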
When I start a VM with the graphics adapter attached, dmesg looks like this before the host freezes. Sometimes it does not freeze, but the system ends up in an unreliable state and needs a reboot to work properly again.
Code:
[ 1079.051614] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 1079.071857] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 1079.095706] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 1079.316202] xhci_hcd 0000:03:00.3: remove, state 4
[ 1079.316210] usb usb2: USB disconnect, device number 1
[ 1079.316378] xhci_hcd 0000:03:00.3: USB bus 2 deregistered
[ 1079.316383] xhci_hcd 0000:03:00.3: remove, state 1
[ 1079.316386] usb usb1: USB disconnect, device number 1
[ 1079.316387] usb 1-1: USB disconnect, device number 2
[ 1079.330514] usb 1-2: USB disconnect, device number 3
[ 1079.634870] xhci_hcd 0000:03:00.3: USB bus 1 deregistered
[ 1079.748066] xhci_hcd 0000:03:00.4: remove, state 4
[ 1079.748077] usb usb4: USB disconnect, device number 1
[ 1079.748080] usb 4-1: USB disconnect, device number 2
[ 1079.763506] xhci_hcd 0000:03:00.4: USB bus 4 deregistered
[ 1079.763518] xhci_hcd 0000:03:00.4: remove, state 4
[ 1079.763522] usb usb3: USB disconnect, device number 1
[ 1079.763524] usb 3-1: USB disconnect, device number 2
[ 1079.779814] xhci_hcd 0000:03:00.4: USB bus 3 deregistered
[ 1079.807426] vfio-pci 0000:03:00.4: refused to change power state from D0 to D3hot
[ 1080.519560] device tap101i0 entered promiscuous mode
[ 1080.543210] vmbr0: port 2(fwpr101p0) entered blocking state
[ 1080.543214] vmbr0: port 2(fwpr101p0) entered disabled state
[ 1080.543295] device fwpr101p0 entered promiscuous mode
[ 1080.543350] vmbr0: port 2(fwpr101p0) entered blocking state
[ 1080.543352] vmbr0: port 2(fwpr101p0) entered forwarding state
[ 1080.548049] fwbr101i0: port 1(fwln101i0) entered blocking state
[ 1080.548052] fwbr101i0: port 1(fwln101i0) entered disabled state
[ 1080.548103] device fwln101i0 entered promiscuous mode
[ 1080.548136] fwbr101i0: port 1(fwln101i0) entered blocking state
[ 1080.548138] fwbr101i0: port 1(fwln101i0) entered forwarding state
[ 1080.552218] fwbr101i0: port 2(tap101i0) entered blocking state
[ 1080.552220] fwbr101i0: port 2(tap101i0) entered disabled state
[ 1080.552261] fwbr101i0: port 2(tap101i0) entered blocking state
[ 1080.552262] fwbr101i0: port 2(tap101i0) entered forwarding state
[ 1081.993212] vfio-pci 0000:03:00.0: enabling device (0002 -> 0003)
[ 1081.993425] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 1081.993430] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 1081.993432] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x25@0x400
[ 1081.993433] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
[ 1081.993434] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
[ 1082.056825] vfio-pci 0000:03:00.3: enabling device (0000 -> 0002)
[ 1082.112561] vfio-pci 0000:03:00.4: enabling device (0000 -> 0002)
[ 1083.787148] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1083.803167] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1083.835150] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1083.867188] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1083.883145] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1083.899144] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.261625] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.261847] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.262068] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.262288] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.262507] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.262725] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.296692] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.296908] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.297125] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.297342] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.297768] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.297991] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.325181] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.325398] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.325603] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.325824] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.326041] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.326268] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.364414] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.365669] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.365686] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.365707] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.365927] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.365942] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.365962] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366257] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366291] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366311] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366515] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366529] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366549] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366754] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366768] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366788] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.366993] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.367008] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.367210] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.367226] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.367240] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.367255] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.367269] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.367284] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.407099] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.407407] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.407705] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.407997] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.408292] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.408583] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.430819] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.431224] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.431243] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.431603] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.431624] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.433497] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.433519] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.433892] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.433910] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.434259] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.434276] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.434621] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.463470] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.463830] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.464182] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.464533] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.464877] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.465223] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.608562] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.609004] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.609435] vfio-pci 0000:03:00.2: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.609870] vfio-pci 0000:03:00.3: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.610321] vfio-pci 0000:03:00.4: vfio_bar_restore: reset recovery - restoring BARs
[ 1086.610758] vfio-pci 0000:03:00.5: vfio_bar_restore: reset recovery - restoring BARs
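The log above is long and repetitive, so a per-device count of the `vfio_bar_restore` messages may be easier to compare between attempts. This is only a sketch: it reads dmesg-style lines from stdin and assumes the message format shown above. The function name `count_bar_restores` is made up for this example.

```shell
#!/bin/sh
# Sketch only: count "vfio_bar_restore" lines per PCI address from dmesg-style input.
count_bar_restores() {
    awk '
        /vfio_bar_restore/ {
            # Find the PCI address (domain:bus:slot.function) followed by a colon
            if (match($0, /[0-9a-f]+:[0-9a-f]+:[0-9a-f]+\.[0-7]:/)) {
                addr = substr($0, RSTART, RLENGTH - 1)   # drop the trailing colon
                count[addr]++
            }
        }
        END {
            for (a in count) print a, count[a]
        }
    ' | sort
}

# Typical use on the host (needs permission to read the kernel log):
#   dmesg | count_bar_restores
```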
So what can I do to support debugging? Thanks!