Device: Beelink EQ59 N5105 (two NICs)
Proxmox version: 7.3-4
I installed the latest 6.1 kernel on Proxmox and passed one NIC through as a PCIe device to an OpenWrt virtual machine. The OpenWrt VM worked for about 6 days, then froze suddenly yesterday. The Proxmox web interface showed "internal-error" for the VM. I opened its console but could not type anything. I stopped the VM and started it again, and it failed to start.
Yesterday I upgraded Proxmox, removed all 6.1 and 5.19 kernels, and rebooted. Proxmox was then running the 5.15 kernel, and the VM ran correctly again. Today the VM froze again and Proxmox showed "internal-error". I stopped and started the VM, and it failed to start again.
Code:
root@pve ~# lspci
00:00.0 Host bridge: Intel Corporation Device 4e24
00:02.0 VGA compatible controller: Intel Corporation Device 4e61 (rev 01)
00:04.0 Signal processing controller: Intel Corporation Device 4e03
00:14.0 USB controller: Intel Corporation Device 4ded (rev 01)
00:14.2 RAM memory: Intel Corporation Device 4def (rev 01)
00:15.0 Serial bus controller [0c80]: Intel Corporation Device 4de8 (rev 01)
00:15.1 Serial bus controller [0c80]: Intel Corporation Device 4de9 (rev 01)
00:15.2 Serial bus controller [0c80]: Intel Corporation Device 4dea (rev 01)
00:15.3 Serial bus controller [0c80]: Intel Corporation Device 4deb (rev 01)
00:16.0 Communication controller: Intel Corporation Device 4de0 (rev 01)
00:17.0 SATA controller: Intel Corporation Device 4dd3 (rev 01)
00:19.0 Serial bus controller [0c80]: Intel Corporation Device 4dc5 (rev 01)
00:19.1 Serial bus controller [0c80]: Intel Corporation Device 4dc6 (rev 01)
00:1c.0 PCI bridge: Intel Corporation Device 4dbc (rev 01)
00:1c.5 PCI bridge: Intel Corporation Device 4dbd (rev 01)
00:1c.6 PCI bridge: Intel Corporation Device 4dbe (rev 01)
00:1e.0 Communication controller: Intel Corporation Device 4da8 (rev 01)
00:1e.3 Serial bus controller [0c80]: Intel Corporation Device 4dab (rev 01)
00:1f.0 ISA bridge: Intel Corporation Device 4d87 (rev 01)
00:1f.3 Audio device: Intel Corporation Device 4dc8 (rev 01)
00:1f.4 SMBus: Intel Corporation Device 4da3 (rev 01)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device 4da4 (rev 01)
01:00.0 Network controller: Intel Corporation Wireless 3165 (rev 81)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev ff)
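Note that 03:00.0 (the NIC passed through to the VM) is listed with "(rev ff)". A revision of ff usually means the device's config space reads back as all ones, i.e. the device is no longer answering on the bus. A minimal Python sketch for flagging such devices in saved `lspci` output follows; the function name `find_unresponsive` is made up for illustration, and it assumes the plain `lspci` format shown above.

```python
import re

# Plain `lspci` lines look like "<slot> <description> (rev xx)".
# Only lines whose revision is literally "ff" are of interest here.
REV_FF = re.compile(r"^(\S+)\s.*\(rev ff\)\s*$")

def find_unresponsive(lspci_text):
    """Return the slots whose revision reads as ff (config space likely unreadable)."""
    slots = []
    for line in lspci_text.splitlines():
        match = REV_FF.match(line.strip())
        if match:
            slots.append(match.group(1))
    return slots

# Two lines copied from the listing above.
sample = """\
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev ff)
"""

print(find_unresponsive(sample))  # ['03:00.0']
```

On the host, the same check could be fed from `subprocess.run(["lspci"], capture_output=True, text=True).stdout` instead of the pasted sample.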
Error message:
Code:
kvm: ../hw/pci/pci.c:1562: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1
VM config (excerpt):
Code:
bios: ovmf
boot: order=scsi0
cores: 3
cpu: host
efidisk0: local-lvm:vm-101-disk-2,efitype=4m,size=4M
hostpci0: 0000:03:00,pcie=1
machine: q35
memory: 2048
meta: creation-qemu=6.2.0,ctime=1668389097
name: OpenWrt
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-101-disk-0,size=6164M
scsihw: virtio-scsi-pci
smbios1: uuid=0f9be60c-d319-4afb-a0b5-f9c3621b9b3f
sockets: 1
startup: order=1
unused0: local-lvm:vm-101-disk-1
GRUB config (excerpt):
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
/etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Journal output (duplicate lines removed):
Code:
-- Journal begins at Sun 2022-11-13 23:39:21 CST, ends at Tue 2022-12-27 15:32:25 CST. --
Dec 27 15:32:11 pve kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
# the line above is repeated 927 times
Dec 27 15:32:11 pve kernel: vfio-pci 0000:03:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: DPC: Data Link Layer Link Active not set in 1000 msec
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: AER: subordinate device reset failed
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: AER: device recovery failed
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: DPC: containment event, status:0x1f01 source:0x0000
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: DPC: unmasked uncorrectable error detected
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: device [8086:4dbe] error status/mask=00100000/00000000
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: [20] UnsupReq (First)
Dec 27 15:32:11 pve kernel: pcieport 0000:00:1c.6: AER: TLP Header: 34000000 03000010 00000000 90039003
Dec 27 15:32:12 pve pvestatd[1222]: storage 'hi' is not online
Dec 27 15:32:12 pve kernel: vfio-pci 0000:03:00.0: invalid power transition (from D3cold to D3hot)
Dec 27 15:32:12 pve pvedaemon[259909]: VM 101 qmp command failed - VM 101 not running
Dec 27 15:32:12 pve systemd[1]: 101.scope: Succeeded.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit 101.scope has successfully entered the 'dead' state.
Dec 27 15:32:12 pve pvedaemon[269302]: start failed: QEMU exited with code 1
Dec 27 15:32:12 pve pvedaemon[265545]: <root@pam> end task UPID:pve:00041BF6:0068A1CD:63AA9F78:qmstart:101:root@pam: start failed: QEMU exited with code 1
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: DPC: Data Link Layer Link Active not set in 1000 msec
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: AER: subordinate device reset failed
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: AER: device recovery failed
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: DPC: containment event, status:0x1f01 source:0x0000
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: DPC: unmasked uncorrectable error detected
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: device [8086:4dbe] error status/mask=00100000/00000000
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: [20] UnsupReq (First)
Dec 27 15:32:14 pve kernel: pcieport 0000:00:1c.6: AER: TLP Header: 34000000 03000010 00000000 90039003
Dec 27 15:32:15 pve pvestatd[1222]: storage 'we' is not online
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: DPC: Data Link Layer Link Active not set in 1000 msec
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: AER: subordinate device reset failed
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: AER: device recovery failed
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: DPC: containment event, status:0x1f01 source:0x0000
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: DPC: unmasked uncorrectable error detected
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: device [8086:4dbe] error status/mask=00100000/00000000
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: [20] UnsupReq (First)
Dec 27 15:32:16 pve kernel: pcieport 0000:00:1c.6: AER: TLP Header: 34000000 03000010 00000000 90039003
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: DPC: Data Link Layer Link Active not set in 1000 msec
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: AER: subordinate device reset failed
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: AER: device recovery failed
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: DPC: containment event, status:0x1f01 source:0x0000
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: DPC: unmasked uncorrectable error detected
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: device [8086:4dbe] error status/mask=00100000/00000000
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: [20] UnsupReq (First)
Dec 27 15:32:19 pve kernel: pcieport 0000:00:1c.6: AER: TLP Header: 34000000 03000010 00000000 90039003
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: DPC: Data Link Layer Link Active not set in 1000 msec
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: AER: subordinate device reset failed
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: AER: device recovery failed
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: DPC: containment event, status:0x1f01 source:0x0000
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: DPC: unmasked uncorrectable error detected
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: device [8086:4dbe] error status/mask=00100000/00000000
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: [20] UnsupReq (First)
Dec 27 15:32:21 pve kernel: pcieport 0000:00:1c.6: AER: TLP Header: 34000000 03000010 00000000 90039003
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: DPC: Data Link Layer Link Active not set in 1000 msec
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: AER: subordinate device reset failed
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: AER: device recovery failed
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: DPC: containment event, status:0x1f01 source:0x0000
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: DPC: unmasked uncorrectable error detected
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: device [8086:4dbe] error status/mask=00100000/00000000
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: [20] UnsupReq (First)
Dec 27 15:32:23 pve kernel: pcieport 0000:00:1c.6: AER: TLP Header: 34000000 03000010 00000000 90039003
Dec 27 15:32:24 pve pvestatd[1222]: storage 'hi' is not online
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: DPC: Data Link Layer Link Active not set in 1000 msec
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: AER: subordinate device reset failed
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: AER: device recovery failed
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: DPC: containment event, status:0x1f01 source:0x0000
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: DPC: unmasked uncorrectable error detected
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: device [8086:4dbe] error status/mask=00100000/00000000
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: [20] UnsupReq (First)
Dec 27 15:32:25 pve kernel: pcieport 0000:00:1c.6: AER: TLP Header: 34000000 03000010 00000000 90039003
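The "error status/mask=00100000/00000000" field in the AER lines is a bitmask. Only bit 20 is set, which matches the "[20] UnsupReq" line the kernel prints right after it. A minimal Python sketch for decoding such a value is below. The bit names are an assumption based on my reading of the PCIe AER Uncorrectable Error Status register as the Linux kernel labels it, only a subset of bits is listed, and `decode_uncor_status` is a name made up for this example.

```python
# Subset of AER uncorrectable error status bits, using the short names
# the kernel prints in its log lines (assumed mapping, not exhaustive).
UNCOR_BITS = {
    4: "DLP",
    5: "SDES",
    12: "TLP",
    13: "FCP",
    14: "CmpltTO",
    15: "CmpltAbrt",
    16: "UnxCmplt",
    17: "RxOF",
    18: "MalfTLP",
    19: "ECRC",
    20: "UnsupReq",
    21: "ACSViol",
}

def decode_uncor_status(status_hex, mask_hex="0"):
    """Return (bit, name) pairs for bits set in status and not set in mask."""
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    active = status & ~mask
    return [
        (bit, UNCOR_BITS.get(bit, "unknown"))
        for bit in range(32)
        if (active >> bit) & 1
    ]

# Values taken from the "error status/mask=" field in the journal above.
print(decode_uncor_status("00100000", "00000000"))  # [(20, 'UnsupReq')]
```

This only restates what the kernel already logged; it is meant as a quick way to read other status values if the error changes after a BIOS or kernel update.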