PVE 8.1.3 host on a Minisforum UM780 XTX (AMD Ryzen 7 7840HS with Radeon 780M Graphics) with a fairly standard iGPU passthrough setup. The guest is Manjaro/KDE with BIOS OVMF, Display none, Machine q35, and hostpci 0000:c5:00.0,pcie=1 (plus rombar and All Functions).

If I set Display to SPICE and disable the hostpci device, the guest VM works fine. As soon as I enable the iGPU PCI device, though, I get the kernel crash below, which forces the host machine to reboot. I've run into lots of errors trying to get iGPU passthrough working on half a dozen machines now, but I've never seen a total kernel crash like this before. Any suggestions?
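As I understand it, ticking All Functions makes PVE pass through the whole slot (0000:c5:00, with no function number) rather than only the .0 VGA function. Every other function in that slot therefore loses its host driver when the VM starts. The journal is consistent with that: xhci_hcd releases 0000:c5:00.3 and 0000:c5:00.4, and the warning fires while snd_pci_ps is being unbound. The minimal Python sketch below lists each function in the slot and the host driver currently bound to it by reading the `driver` symlink under `/sys/bus/pci/devices`. The helper name `slot_functions` is hypothetical, and the slot address is taken from this post.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def slot_functions(slot: str, sysfs_root: str = "/sys/bus/pci/devices") -> Dict[str, Optional[str]]:
    """Map each PCI function in `slot` (e.g. "0000:c5:00") to the host driver
    currently bound to it, or None if no driver is bound."""
    functions: Dict[str, Optional[str]] = {}
    for dev in sorted(Path(sysfs_root).glob(f"{slot}.*")):
        link = dev / "driver"
        # sysfs exposes the bound driver as a symlink named "driver" that
        # points into .../bus/pci/drivers/<name>; no link means unbound.
        functions[dev.name] = os.path.basename(os.readlink(link)) if link.is_symlink() else None
    return functions


if __name__ == "__main__":
    # Slot of the Radeon 780M in this post. Assumption: with All Functions
    # ticked, PVE detaches every .N function listed here at VM start.
    for addr, driver in slot_functions("0000:c5:00").items():
        print(f"{addr}: {driver or '(no driver)'}")
```

Run it on the host with the VM stopped to see which host drivers (USB, audio/ACP and so on) All Functions would take away.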
Host journal from starting VM 102:
Jan 07 15:16:57 pve5 pvedaemon[2490]: start VM 102: UPID:pve5:000009BA:000027A2:659A33C9:qmstart:102:root@pam:
Jan 07 15:16:57 pve5 pvedaemon[1249]: <root@pam> starting task UPID:pve5:000009BA:000027A2:659A33C9:qmstart:102:root@pam:
Jan 07 15:16:57 pve5 kernel: xhci_hcd 0000:c5:00.3: remove, state 4
Jan 07 15:16:57 pve5 kernel: usb usb2: USB disconnect, device number 1
Jan 07 15:16:57 pve5 kernel: usb 2-1: USB disconnect, device number 2
Jan 07 15:16:57 pve5 kernel: usb 2-2: USB disconnect, device number 3
Jan 07 15:16:57 pve5 kernel: xhci_hcd 0000:c5:00.3: USB bus 2 deregistered
Jan 07 15:16:57 pve5 kernel: xhci_hcd 0000:c5:00.3: remove, state 1
Jan 07 15:16:57 pve5 kernel: usb usb1: USB disconnect, device number 1
Jan 07 15:16:57 pve5 kernel: usb 1-1: USB disconnect, device number 2
Jan 07 15:16:57 pve5 kernel: usb 1-1.1: USB disconnect, device number 4
Jan 07 15:16:57 pve5 kernel: usb 1-2: USB disconnect, device number 3
Jan 07 15:16:57 pve5 kernel: usb 1-5: USB disconnect, device number 5
Jan 07 15:16:57 pve5 kernel: xhci_hcd 0000:c5:00.3: USB bus 1 deregistered
Jan 07 15:16:57 pve5 systemd[1]: Starting systemd-rfkill.service - Load/Save RF Kill Switch Status...
Jan 07 15:16:57 pve5 systemd[1]: Stopped target bluetooth.target - Bluetooth Support.
Jan 07 15:16:57 pve5 systemd[1]: Started systemd-rfkill.service - Load/Save RF Kill Switch Status.
Jan 07 15:16:58 pve5 kernel: xhci_hcd 0000:c5:00.4: remove, state 4
Jan 07 15:16:58 pve5 kernel: usb usb4: USB disconnect, device number 1
Jan 07 15:16:58 pve5 kernel: xhci_hcd 0000:c5:00.4: USB bus 4 deregistered
Jan 07 15:16:58 pve5 kernel: xhci_hcd 0000:c5:00.4: remove, state 4
Jan 07 15:16:58 pve5 kernel: usb usb3: USB disconnect, device number 1
Jan 07 15:16:58 pve5 kernel: xhci_hcd 0000:c5:00.4: USB bus 3 deregistered
Jan 07 15:16:58 pve5 systemd[1]: Stopped target sound.target - Sound Card.
Jan 07 15:16:58 pve5 kernel: ------------[ cut here ]------------
Jan 07 15:16:58 pve5 kernel: remove_proc_entry: removing non-empty directory 'irq/111', leaking at least 'ACP_PCI_IRQ'
Jan 07 15:16:58 pve5 kernel: WARNING: CPU: 13 PID: 2490 at fs/proc/generic.c:717 remove_proc_entry+0x1b4/0x1e0
Jan 07 15:16:58 pve5 kernel: Modules linked in: ceph libceph fscache netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables nvme_fabrics softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink snd_sof_amd_rembrandt snd_sof_amd_renoir intel_rapl_msr snd_sof_amd_acp intel_rapl_common snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_hda_codec_realtek snd_soc_core snd_hda_codec_generic edac_mce_amd snd_compress ledtrig_audio kvm_amd ac97_bus snd_hda_intel snd_pcm_dmaengine kvm snd_intel_dspcfg snd_pci_ps btusb crct10dif_pclmul snd_intel_sdw_acpi snd_rpl_pci_acp6x btrtl polyval_clmulni iwlmvm snd_hda_codec snd_acp_pci btbcm polyval_generic snd_hda_core snd_pci_acp6x mac80211 btintel ghash_clmulni_intel snd_hwdep snd_pci_acp5x libarc4 btmtk vhost_net aesni_intel snd_pcm snd_rn_pci_acp3x vhost bluetooth iwlwifi crypto_simd snd_timer snd_acp_config vhost_iotlb cryptd ecdh_generic cfg80211 snd snd_soc_acpi tap input_leds rapl
Jan 07 15:16:58 pve5 kernel: pcspkr ecc k10temp ccp snd_pci_acp3x soundcore amd_pmc serio_raw mac_hid vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 hid_generic usbkbd usbhid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb xhci_pci nvme xhci_pci_renesas i2c_hid_acpi r8169 nvme_core video thunderbolt xhci_hcd i2c_hid crc32_pclmul i2c_piix4 realtek nvme_common wmi hid
Jan 07 15:16:58 pve5 kernel: CPU: 13 PID: 2490 Comm: task UPID:pve5: Tainted: P O 6.5.11-7-pve #1
Jan 07 15:16:58 pve5 kernel: Hardware name: Micro Computer (HK) Tech Limited Venus series/F7BSD, BIOS 1.04 11/15/2023
Jan 07 15:16:58 pve5 kernel: RIP: 0010:remove_proc_entry+0x1b4/0x1e0
Jan 07 15:16:58 pve5 kernel: Code: 90 78 ff ff ff 48 0f 45 c2 49 8b 57 f0 48 89 f1 48 c7 c6 40 1b 65 a7 48 8b 92 a0 00 00 00 4c 8b 80 a0 00 00 00 e8 0c 0d bc ff <0f> 0b e9 64 ff ff ff 49 8b 77 18 48 c7 c7 18 5c b7 a7 e8 f5 0c bc
Jan 07 15:16:58 pve5 kernel: RSP: 0018:ffffb67499dbba68 EFLAGS: 00010246
Jan 07 15:16:58 pve5 kernel: RAX: 0000000000000000 RBX: ffff9e07c6a12a80 RCX: 0000000000000000
Jan 07 15:16:58 pve5 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan 07 15:16:58 pve5 kernel: RBP: ffffb67499dbbab0 R08: 0000000000000000 R09: 0000000000000000
Jan 07 15:16:58 pve5 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9e07c6a12b00
Jan 07 15:16:58 pve5 kernel: R13: ffffb67499dbbac6 R14: ffffb67499dbbac6 R15: ffff9e07c6a12b08
Jan 07 15:16:58 pve5 kernel: FS: 00007f0547c11b80(0000) GS:ffff9e1e40340000(0000) knlGS:0000000000000000
Jan 07 15:16:58 pve5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 07 15:16:58 pve5 kernel: CR2: 0000562f4b41a6f0 CR3: 0000000179e16000 CR4: 0000000000750ee0
Jan 07 15:16:58 pve5 kernel: PKRU: 55555554
Jan 07 15:16:58 pve5 kernel: Call Trace:
Jan 07 15:16:58 pve5 kernel: <TASK>
Jan 07 15:16:58 pve5 kernel: ? show_regs+0x6d/0x80
Jan 07 15:16:58 pve5 kernel: ? __warn+0x89/0x160
Jan 07 15:16:58 pve5 kernel: ? remove_proc_entry+0x1b4/0x1e0
Jan 07 15:16:58 pve5 kernel: ? report_bug+0x17e/0x1b0
Jan 07 15:16:58 pve5 kernel: ? handle_bug+0x46/0x90
Jan 07 15:16:58 pve5 kernel: ? exc_invalid_op+0x18/0x80
Jan 07 15:16:58 pve5 kernel: ? asm_exc_invalid_op+0x1b/0x20
Jan 07 15:16:58 pve5 kernel: ? remove_proc_entry+0x1b4/0x1e0
Jan 07 15:16:58 pve5 kernel: ? remove_proc_entry+0x1b4/0x1e0
Jan 07 15:16:58 pve5 kernel: unregister_irq_proc+0xf2/0x120
Jan 07 15:16:58 pve5 kernel: free_desc+0x41/0xe0
Jan 07 15:16:58 pve5 kernel: ? srso_alias_return_thunk+0x5/0x7f
Jan 07 15:16:58 pve5 kernel: ? __kmem_cache_free+0x306/0x350
Jan 07 15:16:58 pve5 kernel: irq_free_descs+0x52/0x80
Jan 07 15:16:58 pve5 kernel: irq_domain_free_irqs+0x150/0x1c0
Jan 07 15:16:58 pve5 kernel: mp_unmap_irq+0x8e/0x90
Jan 07 15:16:58 pve5 kernel: acpi_unregister_gsi_ioapic+0x2e/0x50
Jan 07 15:16:58 pve5 kernel: acpi_unregister_gsi+0x17/0x30
Jan 07 15:16:58 pve5 kernel: acpi_pci_irq_disable+0x7b/0xd0
Jan 07 15:16:58 pve5 kernel: pcibios_disable_device+0x20/0x40
Jan 07 15:16:58 pve5 kernel: do_pci_disable_device+0x45/0x90
Jan 07 15:16:58 pve5 kernel: pci_disable_device+0xd3/0xf0
Jan 07 15:16:58 pve5 kernel: snd_acp63_remove+0x95/0xd0 [snd_pci_ps]
Jan 07 15:16:58 pve5 kernel: pci_device_remove+0x36/0xb0
Jan 07 15:16:58 pve5 kernel: device_remove+0x40/0x80
Jan 07 15:16:58 pve5 kernel: device_release_driver_internal+0x20b/0x270
Jan 07 15:16:58 pve5 kernel: ? bus_find_device+0xb8/0xf0
Jan 07 15:16:58 pve5 kernel: device_driver_detach+0x14/0x20
Jan 07 15:16:58 pve5 kernel: unbind_store+0xac/0xc0
Jan 07 15:16:58 pve5 kernel: drv_attr_store+0x21/0x50
Jan 07 15:16:58 pve5 kernel: sysfs_kf_write+0x3b/0x60
Jan 07 15:16:58 pve5 kernel: kernfs_fop_write_iter+0x130/0x210
Jan 07 15:16:58 pve5 kernel: vfs_write+0x251/0x440
Jan 07 15:16:58 pve5 kernel: ksys_write+0x73/0x100
Jan 07 15:16:58 pve5 kernel: __x64_sys_write+0x19/0x30
Jan 07 15:16:58 pve5 kernel: do_syscall_64+0x58/0x90
Jan 07 15:16:58 pve5 kernel: ? srso_alias_return_thunk+0x5/0x7f
Jan 07 15:16:58 pve5 kernel: ? do_syscall_64+0x67/0x90
Jan 07 15:16:58 pve5 kernel: ? syscall_exit_to_user_mode+0x37/0x60
Jan 07 15:16:58 pve5 kernel: ? srso_alias_return_thunk+0x5/0x7f
Jan 07 15:16:58 pve5 kernel: ? do_syscall_64+0x67/0x90
Jan 07 15:16:58 pve5 kernel: ? srso_alias_return_thunk+0x5/0x7f
Jan 07 15:16:58 pve5 kernel: ? do_syscall_64+0x67/0x90
Jan 07 15:16:58 pve5 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Jan 07 15:16:58 pve5 kernel: RIP: 0033:0x7f0547d47140
Jan 07 15:16:58 pve5 kernel: Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
Jan 07 15:16:58 pve5 kernel: RSP: 002b:00007fff3df76928 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Jan 07 15:16:58 pve5 kernel: RAX: ffffffffffffffda RBX: 0000562f4ae792a0 RCX: 00007f0547d47140
Jan 07 15:16:58 pve5 kernel: RDX: 000000000000000c RSI: 0000562f52de0730 RDI: 000000000000000d
Jan 07 15:16:58 pve5 kernel: RBP: 0000562f52de0730 R08: 0000000000000000 R09: 0000562f52dbb7d0
Jan 07 15:16:58 pve5 kernel: R10: 0000562f4f98e400 R11: 0000000000000202 R12: 000000000000000c
Jan 07 15:16:58 pve5 kernel: R13: 0000562f4ae792a0 R14: 000000000000000d R15: 0000562f52dddd40
Jan 07 15:16:58 pve5 kernel: </TASK>
Jan 07 15:16:58 pve5 kernel: ---[ end trace 0000000000000000 ]---
Jan 07 15:16:58 pve5 systemd[1]: Created slice qemu.slice - Slice /qemu.
Jan 07 15:16:58 pve5 systemd[1]: Started 102.scope.
Jan 07 15:16:58 pve5 kernel: tap102i0: entered promiscuous mode
Jan 07 15:16:58 pve5 kernel: vmbr0: port 2(tap102i0) entered blocking state
Jan 07 15:16:58 pve5 kernel: vmbr0: port 2(tap102i0) entered disabled state
Jan 07 15:16:58 pve5 kernel: tap102i0: entered allmulticast mode
Jan 07 15:16:58 pve5 kernel: vmbr0: port 2(tap102i0) entered blocking state
Jan 07 15:16:58 pve5 kernel: vmbr0: port 2(tap102i0) entered forwarding state
Jan 07 15:16:59 pve5 kernel: vfio-pci 0000:c5:00.0: enabling device (0002 -> 0003)
Jan 07 15:16:59 pve5 kernel: vfio-pci 0000:c5:00.1: enabling device (0000 -> 0002)
Jan 07 15:17:00 pve5 pvedaemon[1249]: <root@pam> end task UPID:pve5:000009BA:000027A2:659A33C9:qmstart:102:root@pam: OK
client_loop: send disconnect: Broken pipe
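The Python sketch below pulls the relevant parts out of a long journal like this one. The script name `trace_summary.py` and its helper functions are hypothetical. It reports three things:

- which PCI functions host drivers released
- which ones vfio-pci then enabled
- the call-trace frames in caller-first order

It skips frames prefixed with `?`, which the kernel prints for return addresses it found on the stack but could not confirm as part of the call chain. The regexes assume the `journalctl` line format shown above.

```python
import re
import sys
from typing import List, Tuple

# PCI address such as 0000:c5:00.3 (domain:bus:device.function)
ADDR = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"
# e.g. "kernel: xhci_hcd 0000:c5:00.3: remove, state 4"
REMOVE_RE = re.compile(rf"kernel:\s+(\S+)\s+({ADDR}): remove\b")
# e.g. "kernel: vfio-pci 0000:c5:00.0: enabling device (0002 -> 0003)"
VFIO_RE = re.compile(rf"kernel:\s+vfio-pci\s+({ADDR}): enabling device")
# e.g. "kernel: snd_acp63_remove+0x95/0xd0 [snd_pci_ps]"; a leading "? " marks an unreliable frame
FRAME_RE = re.compile(r"kernel:\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$")


def detached_functions(log: str) -> List[Tuple[str, str]]:
    """(driver, address) pairs from host-driver 'remove' lines, deduplicated, in log order."""
    return list(dict.fromkeys(REMOVE_RE.findall(log)))


def vfio_enabled(log: str) -> List[str]:
    """Addresses that vfio-pci enabled, in log order."""
    return list(dict.fromkeys(VFIO_RE.findall(log)))


def reliable_frames(log: str) -> List[str]:
    """Call-trace frames not prefixed with '?', outermost caller first."""
    frames: List[str] = []
    for line in log.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group(1):
            name, module = m.group(2), m.group(3)
            frames.append(f"{name} [{module}]" if module else name)
    # The kernel prints the innermost frame first; reverse to get call order.
    return frames[::-1]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    for driver, addr in detached_functions(text):
        print(f"released by host: {addr} ({driver})")
    for addr in vfio_enabled(text):
        print(f"enabled by vfio-pci: {addr}")
    print("call chain (caller first):")
    for frame in reliable_frames(text):
        print(f"  {frame}")
```

To use it, save the kernel messages from the previous boot with `journalctl -b -1 -k > journal.txt`, then run `python3 trace_summary.py journal.txt`. This assumes the journal is persistent across the reboot. On the log above, the chain it prints runs from entry_SYSCALL_64_after_hwframe through unbind_store and snd_acp63_remove [snd_pci_ps] down to unregister_irq_proc.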