Hello!
Proxmox 8.0.4
Platform: Intel NUC Extreme 12, i9, 64 GB RAM, 2x M.2 2TB drives
Filesystem: ZFS
This is a new setup and had been working well, but a few days ago the problems started. There has been no hardware or software change other than updating to 8.0.4.
Proxmox hosts one virtual machine, an Ubuntu Server 22.04 guest with the latest updates, running a Plex Server.
I have the integrated GPU passed through to the Plex VM and it works great. (Output of some commands is included below.)
However, at least once a day the server freezes and becomes completely unresponsive: there is not even a blinking cursor on the screen. I have to force a power cycle to get it to come back up.
I have gone through the Proxmox syslog and maybe it's an interrupt issue? Not sure.
Here are the last few lines of the syslog before it was rebooted. I have also included the output of some other commands and text files of interest.
Code:
Nov 10 17:08:41 pve postfix/smtp[323186]: 9B46B1B2D2: to=<michael@aghy.net>, relay=none, delay=401125, delays=401035/0.01/90/0, dsn=4.4.1, status=deferred (connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:08:42 pve postfix/smtp[323183]: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out
Nov 10 17:08:42 pve postfix/smtp[323183]: 6A68C18258: to=<michael@aghy.net>, relay=none, delay=139307, delays=139216/0.01/91/0, dsn=4.4.1, status=deferred (connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:08:42 pve postfix/error[324819]: AFB8A182D8: to=<michael@aghy.net>, relay=none, delay=142799, delays=142708/91/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:08:42 pve postfix/error[324820]: A0D681C37B: to=<michael@aghy.net>, relay=none, delay=225737, delays=225646/91/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:08:42 pve postfix/error[324819]: 72EC41B703: to=<michael@aghy.net>, relay=none, delay=398621, delays=398530/91/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:08:42 pve postfix/error[324820]: 79A2E1C088: to=<michael@aghy.net>, relay=none, delay=304096, delays=304005/91/0/0, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:08:42 pve postfix/smtp[323187]: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out
Nov 10 17:08:42 pve postfix/smtp[323187]: 950C01BA58: to=<michael@aghy.net>, relay=none, delay=312145, delays=312054/0.02/91/0, dsn=4.4.1, status=deferred (connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:08:43 pve postfix/smtp[323181]: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out
Nov 10 17:08:43 pve postfix/smtp[323181]: 811641BDCA: to=<michael@aghy.net>, relay=none, delay=312326, delays=312234/0.01/92/0, dsn=4.4.1, status=deferred (connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
Nov 10 17:10:31 pve pvedaemon[2246]: <root@pam> successful auth for user 'root@pam'
Nov 10 17:12:11 pve postfix/qmgr[2185]: B8247183D3: from=<michael@aghy.net>, size=26449, nrcpt=1 (queue active)
Nov 10 17:12:11 pve postfix/error[328075]: B8247183D3: to=<michael@aghy.net>, relay=none, delay=52965, delays=52965/0.01/0/0.01, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to alt2.aspmx.l.google.com[64.233.184.26]:25: Connection timed out)
-- Reboot --
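Almost everything in that excerpt is postfix retrying undeliverable mail, which is probably unrelated to the freeze. A minimal sketch for looking at the tail of the previous boot without the mail noise follows. The helper name `last_non_mail_lines` is made up for illustration, and the usage line assumes the journal is persistent so that the previous boot is still available.

```shell
# Drop postfix lines from syslog text on stdin and keep the last N lines (default 20).
# The function name is hypothetical; it only wraps grep and tail.
last_non_mail_lines() {
    grep -v ' postfix/' | tail -n "${1:-20}"
}

# Usage on the host (assumes a persistent journal, so "-b -1" is the boot that froze):
#   journalctl -b -1 --no-pager | last_non_mail_lines 50
```

If the filtered tail still ends in routine messages, the host most likely hung without logging anything, which would itself be a useful data point.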
Code:
root@pve:~# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers [8086:4660] (rev 02)
00:02.0 VGA compatible controller [0300]: Intel Corporation AlderLake-S GT1 [8086:4680] (rev 0c)
00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 02)
00:08.0 System peripheral [0880]: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f] (rev 02)
00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
00:14.3 Network controller [0280]: Intel Corporation Alder Lake-S PCH CNVi WiFi [8086:7af0] (rev 11)
00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac4] (rev 11)
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:7abe] (rev 11)
00:1d.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 [8086:7ab0] (rev 11)
00:1d.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 [8086:7ab4] (rev 11)
00:1f.0 ISA bridge [0601]: Intel Corporation Z690 Chipset LPC/eSPI Controller [8086:7a84] (rev 11)
00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
02:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
03:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
03:01.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
03:02.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
03:03.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
04:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020] [8086:1137]
38:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020] [8086:1138]
6c:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-LM [8086:15f2] (rev 03)
6d:00.0 Ethernet controller [0200]: Aquantia Corp. AQC113C NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] [1d6a:14c0] (rev 03)
6e:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
Code:
root@pve:/etc/modprobe.d# cat intel-uhd-passthru.conf
install i915 /bin/false
options vfio-pci ids=8086:4680
Code:
root@pve:/etc/modprobe.d# cat pve-blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
blacklist i915
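To confirm that these two files have the intended effect (i915 kept away from the iGPU and vfio-pci bound to it), the driver actually in use can be read from `lspci -nnk`. This is only a sketch: the helper name `driver_in_use` is made up, and the device address 00:02.0 is taken from the lspci listing above.

```shell
# Print the kernel driver bound to a PCI device, given `lspci -nnk -s <addr>` output on stdin.
# The function name is hypothetical; it extracts the "Kernel driver in use:" field.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on the host:
#   lspci -nnk -s 00:02.0 | driver_in_use
```

With the configuration above I would expect this to print vfio-pci while the VM is running; an empty result would mean no driver is currently bound.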
Syslog from Proxmox when the VM is started:
Code:
Nov 12 10:30:47 pve systemd[1]: Started 100.scope.
Nov 12 10:30:47 pve kernel: device tap100i0 entered promiscuous mode
Nov 12 10:30:47 pve kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 12 10:30:47 pve kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Nov 12 10:30:47 pve kernel: device fwpr100p0 entered promiscuous mode
Nov 12 10:30:47 pve kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 12 10:30:47 pve kernel: vmbr0: port 2(fwpr100p0) entered forwarding state
Nov 12 10:30:47 pve kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 12 10:30:47 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Nov 12 10:30:47 pve kernel: device fwln100i0 entered promiscuous mode
Nov 12 10:30:47 pve kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 12 10:30:47 pve kernel: fwbr100i0: port 1(fwln100i0) entered forwarding state
Nov 12 10:30:47 pve kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 12 10:30:47 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Nov 12 10:30:47 pve kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 12 10:30:47 pve kernel: fwbr100i0: port 2(tap100i0) entered forwarding state
Nov 12 10:30:48 pve kernel: vfio-pci 0000:00:02.0: vfio_ecap_init: hiding ecap 0x1b@0x100
Nov 12 10:30:49 pve pvedaemon[2616]: <root@pam> end task UPID:pve:0025B663:003747EA:6550EFA6:qmstart:100:root@pam: OK
Nov 12 10:30:49 pve kernel: hrtimer: interrupt took 9444153 ns
Nov 12 10:30:49 pve kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.240 msecs
Nov 12 10:30:49 pve kernel: perf: interrupt took too long (8386 > 3923), lowering kernel.perf_event_max_sample_rate to 23750
Nov 12 10:30:49 pve kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.890 msecs
Nov 12 10:30:49 pve kernel: perf: interrupt took too long (13054 > 10482), lowering kernel.perf_event_max_sample_rate to 15250
Nov 12 10:30:50 pve QEMU[2471534]: kvm: vfio-pci: Cannot read device rom at 0000:00:02.0
Nov 12 10:30:50 pve QEMU[2471534]: Device option ROM contents are probably invalid (check dmesg).
Nov 12 10:30:50 pve QEMU[2471534]: Skip option ROM probe with rombar=0, or load from file with romfile=
Nov 12 10:30:50 pve kernel: vfio-pci 0000:00:02.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x851d
Nov 12 10:30:52 pve pvedaemon[2615]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Nov 12 10:30:53 pve kernel: vfio-pci 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem