[SOLVED] PVE 8.0.3 freezes on NUC11, two identical nodes, problem follows only one installation

Deleted member 205422 · Nov 3, 2023

I currently have two identical Intel NUC11PAHi3 running as nodes for testing, no VMs, LXCs only. This is my first experience with Proxmox and also with this forum. Judging from what I found, I am afraid this will be a lonely post that no one replies to (similar to my situation):
> https://forum.proxmox.com/threads/issue-with-proxmox-network-getting-disabled.128580/

My syslog before a freeze (Stopping User Manager for UID 0) looks about the same as in the thread below. There is no NFS in play, however, and besides the network being unresponsive, the system as a whole is dead: blank screen, and it needs a cold power cycle, after which it starts up just fine.
> https://forum.proxmox.com/threads/proxmox-keeps-freezing-out-of-options.131750/

So my symptoms are very much like here, but unlike there I get no KSM related message nor page faults in dmesg.
> https://forum.proxmox.com/threads/t...d-try-booting-with-the-irqpoll-option.111425/

In fact there's nothing more suspicious in dmesg than what I found in this post above and interestingly, there's the same kind of devices on IRQ16, namely NVMe SSD, Audio device, Tiger Lake-LP Thunderbolt 4 NHI. I figured this is not the issue as the dmesg messages are early on after boot and they happen on both machines only one of which freezes.
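For anyone wanting to check which devices share an interrupt line on their own machine, here is a minimal Python sketch that parses text in the layout of /proc/interrupts and reports IRQs with more than one handler. The function name `shared_irqs` and the sample text are made up for illustration (they are not taken from either node), and the parser assumes the common layout of one count per CPU followed by the controller name, the hardware IRQ field, and a comma-separated handler list.

```python
# Hypothetical sample in the usual /proc/interrupts layout (not real output
# from the NUCs in this thread).
SAMPLE = """\
           CPU0       CPU1
  0:         10          0   IO-APIC    2-edge      timer
 16:        120          0   IO-APIC   16-fasteoi   i801_smbus, snd_hda_intel:card0
NMI:          0          0   Non-maskable interrupts
"""


def shared_irqs(interrupts_text):
    """Return {irq_number: [handler, ...]} for IRQs with more than one handler."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    # The header row has one column per CPU (CPU0, CPU1, ...).
    n_cpus = len(lines[0].split())
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Skip summary rows such as NMI/LOC, which have no numeric IRQ.
        if not sep or not label.isdigit():
            continue
        # Drop the per-CPU counters; what remains is assumed to be
        # controller name, hardware IRQ/type, then the handler names.
        tail = rest.split()[n_cpus:]
        if len(tail) < 3:
            continue
        handlers = [h.strip() for h in " ".join(tail[2:]).split(",") if h.strip()]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared
```

On a live system the input would come from reading /proc/interrupts, for example `shared_irqs(open("/proc/interrupts").read())`. Running it on both nodes and comparing the results is one way to see whether the IRQ sharing really is identical.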

I have pretty much excluded hardware as the culprit: I swapped both the NVMe SSD and the RAM between the two nodes, and the freezing moved to the machine holding the specific PVE install. I then MEMTEST-ed the RAM module with nothing found; nevertheless I exchanged it for another identical module and the freezes persist. The last remaining potential hardware issue would be the NVMe SSD, which is used but reports perfect health with about 1% on the usage indicator and is a good old trusty Kingston KC2000, so not even running hot. It was running without any issues under plain Debian until it became part of these testing nodes.

The PVE installs are, as far as I am concerned, identical: there are no VMs, just a few LXCs running, and both are fully upgraded via APT.

Before I try to reinstall the same from scratch or finally exchange the SSD (which will not prove anything if it was installation-specific), I plan to move around the CTs a bit. The last thing I suspect is there's Docker inside some of the CTs running on the offending node. But they run nested and unprivileged. So if that was indeed the reason PVE is freezing I am afraid I am done with the testing of PVE.

To avoid any NUC and old kernel issues, I had put max_cstate=1 in the kernel boot options on both nodes early on. I do not think it is causing anything, though the node that freezes is indeed the less busy one. The UEFI firmware is up to date on both NUCs. As there's nothing in the logs it's really hard to troubleshoot, but plain Debian runs just fine on these NUCs. Also perhaps worth mentioning: the two nodes are set up as a cluster. I understand a 2-node cluster without a q-device and with equal votes for each node is almost useless, but for testing this was good enough and it certainly should not be freezing the host.
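Since the whole question is why two supposedly identical installs behave differently, one cheap check is to diff the kernel command lines of both nodes. Below is a minimal Python sketch, assuming the contents of /proc/cmdline were copied from each node into strings. The helper names and the two sample command lines are hypothetical, and `intel_idle.max_cstate` is only an assumed full spelling of the max_cstate=1 option mentioned above. Quoted values containing spaces are not handled.

```python
def parse_cmdline(text):
    """Map each kernel parameter to its value; bare flags map to True."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def diff_cmdlines(cmdline_a, cmdline_b):
    """Return {param: (value_on_a, value_on_b)} for parameters that differ.

    A parameter missing on one side shows up as None on that side.
    """
    a, b = parse_cmdline(cmdline_a), parse_cmdline(cmdline_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical command lines for illustration only.
NODE_A = "root=/dev/mapper/pve-root ro quiet intel_idle.max_cstate=1"
NODE_B = "root=/dev/mapper/pve-root ro quiet intel_idle.max_cstate=1 mitigations=off"

print(diff_cmdlines(NODE_A, NODE_B))
```

An empty result means both nodes booted with the same options, which at least rules out a boot parameter as the difference between the installs.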

EDIT: Found this post below and will give a try to mitigations=off, but again this would not explain why only one node freezes.
> https://forum.proxmox.com/threads/o...ly-freezing-pre-and-post-pve8-upgrade.129990/

Thanks for any help, or I hope this helps someone else googling like me in the future.

Deleted member 205422 · Nov 3, 2023

The dmesg irq message is now gone after disabling the HD Audio (and other PVE irrelevant) devices in UEFI/BIOS, no freezes with uptime 20h for now, but still would not explain why they were not happening on the other of the two identical configurations.

Deleted member 205422 · Nov 7, 2023

So for anyone googling in the future, this was indeed the HD Audio device hogging the IRQ on NUC11.
