Proxmox random crashes without logs

Utility6693

New Member
Oct 22, 2025
Hi all,

I hope someone here has an answer, because I am at my wit's end. My Proxmox instance keeps crashing; it can take minutes or hours, but it will eventually go down.
I have to manually power-cycle the device, as it is completely unresponsive: neither the host nor any VMs or LXCs are reachable, and attaching an HDMI cable just shows a black screen.
There are no errors or other issues to be found in the log files.

What I'm running:
- An LXC with Unbound
- An LXC with an ad blocker
- A VM with Home Assistant
- A VM with some Docker containers

The hardware:
- An Intel NUC NUC11TNHi5 (11th-gen i5, 4 cores, 1 socket)
- 32 GB of memory
- 2 x 1 TB Samsung SSDs (the second one is mostly for backups)

Until last week I had been running version 8.0.14 without any problems for ~2 years, but only with the NUMA flag turned on in my VMs' CPU properties; after running into these issues originally, I eventually found that this fixed it for me.

Last week I decided to turn on the pve-no-subscription repo and update to 8.4. The random freezes started right away. After a day or two I updated everything to 9.0, with no success there either.

My system info (ran the 6.8 kernel earlier today):
Code:
proxmox-ve: 9.0.0 (running kernel: 6.17.1-1-pve)
pve-manager: 9.0.11 (running version: 9.0.11/3bf5476b8a4699e2)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.1-1-pve-signed: 6.17.1-1
proxmox-kernel-6.17: 6.17.1-1
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.16-1
proxmox-backup-file-restore: 4.0.16-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.0
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-widget-toolkit: 5.0.6
pve-cluster: 9.0.6
pve-container: 6.0.13
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-2
pve-ha-manager: 5.0.5
pve-i18n: 3.6.1
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.23
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

I am pretty sure it's a software issue, since it was running fine before.

A section of the log leading up to a crash in the morning, after the system had been idle all night:
Code:
-- Boot 63811b0cc80a461da990354de5b7b4b9 --
Oct 22 08:20:49 nuc systemd[1]: Finished man-db.service - Daily man-db regeneration.
Oct 22 08:20:49 nuc systemd[1]: man-db.service: Deactivated successfully.
Oct 22 08:20:48 nuc systemd[1]: Starting man-db.service - Daily man-db regeneration...
Oct 22 08:17:01 nuc CRON[202247]: pam_unix(cron:session): session closed for user root
Oct 22 08:17:01 nuc CRON[202250]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 08:17:01 nuc CRON[202247]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 07:17:01 nuc CRON[184536]: pam_unix(cron:session): session closed for user root
Oct 22 07:17:01 nuc CRON[184538]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 07:17:01 nuc CRON[184536]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 06:48:49 nuc systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Oct 22 06:48:49 nuc systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Oct 22 06:48:48 nuc systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Oct 22 06:25:01 nuc CRON[169080]: pam_unix(cron:session): session closed for user root
Oct 22 06:25:01 nuc CRON[169082]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Oct 22 06:25:01 nuc CRON[169080]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 06:17:01 nuc CRON[166768]: pam_unix(cron:session): session closed for user root
Oct 22 06:17:01 nuc CRON[166770]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 06:17:01 nuc CRON[166768]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 05:17:01 nuc CRON[149175]: pam_unix(cron:session): session closed for user root
Oct 22 05:17:01 nuc CRON[149177]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 05:17:01 nuc CRON[149175]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 04:17:01 nuc CRON[131535]: pam_unix(cron:session): session closed for user root
Oct 22 04:17:01 nuc CRON[131537]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 04:17:01 nuc CRON[131535]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 04:15:51 nuc systemd[1]: pve-daily-update.service: Consumed 2.509s CPU time, 327M memory peak.
Oct 22 04:15:51 nuc systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Oct 22 04:15:51 nuc systemd[1]: pve-daily-update.service: Deactivated successfully.
Oct 22 04:15:51 nuc pveupdate[131172]: <root@pam> end task UPID:nuc:0002006B:0027B272:68F83E55:aptupdate::root@pam: OK
Oct 22 04:15:50 nuc pveupdate[131179]: update new package list: /var/lib/pve-manager/pkgupdates
Oct 22 04:15:49 nuc pveupdate[131172]: <root@pam> starting task UPID:nuc:0002006B:0027B272:68F83E55:aptupdate::root@pam:
Oct 22 04:15:48 nuc systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Oct 22 03:54:49 nuc systemd[1]: Finished apt-daily.service - Daily apt download activities.
Oct 22 03:54:49 nuc systemd[1]: apt-daily.service: Deactivated successfully.
Oct 22 03:54:48 nuc systemd[1]: Starting apt-daily.service - Daily apt download activities...
Oct 22 03:40:35 nuc chronyd[984]: Source 45.138.55.61 replaced with 178.215.228.24 (2.debian.pool.ntp.org)
Oct 22 03:17:01 nuc CRON[113742]: pam_unix(cron:session): session closed for user root
Oct 22 03:17:01 nuc CRON[113744]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 03:17:01 nuc CRON[113742]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 03:10:01 nuc CRON[111660]: pam_unix(cron:session): session closed for user root
Oct 22 03:10:01 nuc CRON[111662]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Oct 22 03:10:01 nuc CRON[111660]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 03:01:40 nuc kernel: vmbr0: port 1(enp88s0) entered forwarding state
Oct 22 03:01:40 nuc kernel: vmbr0: port 1(enp88s0) entered blocking state
Oct 22 03:01:40 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 22 03:00:18 nuc kernel: vmbr0: port 1(enp88s0) entered disabled state
Oct 22 03:00:18 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Down
Oct 22 02:17:01 nuc CRON[96248]: pam_unix(cron:session): session closed for user root
Oct 22 02:17:01 nuc CRON[96250]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)

What I have done to try and fix it:
- Updated the BIOS to the latest version.
- Turned off various BIOS settings related to power management, and onboard devices I don't use (WiFi, BT).
- Changed the CPU type for the VMs from host to kvm64, and to the default x86-64-v2-AES.
- Switched from the stable kernel to the newest 6.14 kernel.
- Limited the resource usage of my Docker containers to below the VM's RAM, just in case.
- Turned off VM ballooning to see if it makes a difference.
- Turned off one VM at a time; it still crashed in both cases.
- Had an external device monitor it with dmesg --follow and journalctl -f; no extra information.
- Ran a memory test (2 passes); all good.
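Since journalctl -f and dmesg --follow run in userspace and can miss output from a hard freeze, netconsole (a mainline kernel module) may still get a panic trace out over UDP before the box dies. A minimal sketch; the IPs are placeholders and the interface name enp88s0 is taken from the log above:

```shell
# Sender (the crashing host): stream kernel messages over UDP.
# Format: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# (omitted ports/MAC fall back to defaults: 6665, 6666, broadcast)
modprobe netconsole netconsole=@192.168.1.10/enp88s0,@192.168.1.50/

# Receiver (any other machine on the LAN):
nc -u -l 6666   # netcat-openbsd syntax
```

This only helps if the kernel still manages to print a panic before hanging; a freeze below the kernel (firmware/hardware) will stay silent either way.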

The system can crash at any time, but it seems to happen mostly while idle, which is what it is doing most of the time.
It's not running out of memory: almost half the RAM is not provisioned to VMs or LXCs. The maximum CPU load is also only ~50%, and that's during boot.

I'm at a loss, hope someone has an idea.
 
Hello,
You wrote:

> I am pretty sure it's a software issue, since it was running fine before.

Maybe. Or maybe not. Hardware gets old, too.
Anyway, finding information about your problem is the way to solve it.
Therefore, I would recommend configuring journald so that you keep information from previous boots.
 
> Hello,
> You wrote:
>
> > I am pretty sure it's a software issue, since it was running fine before.
>
> Maybe. Or maybe not. Hardware gets old, too.
> Anyway, finding information about your problem is the way to solve it.
> Therefore, I would recommend configuring journald so that you keep information from previous boots.
Is there anything more I can do to keep logs?

I had an external system watching journalctl and dmesg live in a tmux-session, no extra info there either.
 
> but only if my VMs had the NUMA flag turned on
This is surprising, to say the least, since that NUC11TNHi5 is single-socket.

> My system info (ran the 6.8 kernel earlier today):
So what is this?
> proxmox-ve: 9.0.0 (running kernel: 6.17.1-1-pve)
The 6.17 kernel is currently opt-in only.

I guess you have been doing major testing/messing with your system to get it working (& more so, historically over the years).

What I would try if I were you:
1. Ensure you have full (restorable) backups of all VMs & LXCs (& their documented configs, storage setup etc.)
2. Remove the original drive(s) from the NUC, & install a fresh/clean/new drive (at least for testing).
3. Install a fresh PVE 9.
4. Test (with time) for stability before adding/restoring any VMs or LXCs.
5. If the above succeeds, try adding/restoring the VMs & LXCs one by one & test for stability.

Possible suspects:
1. Power issue/PSU on that mini PC.
2. Thermals on that mini PC.
3. Storage issue. It appears that your host, VMs & LXCs all live on a single disk; is that an NVMe or a SATA SSD (I believe your NUC has both)? It may be worth trying a different disk for comparison.
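The thermal and storage suspects above can be given a quick once-over from the shell; the device paths are assumptions (adjust to whatever lsblk shows on your box):

```shell
apt install -y smartmontools lm-sensors

# SMART health, error log and drive temperature
smartctl -a /dev/nvme0    # if the Samsung is NVMe (assumed path)
smartctl -a /dev/sda      # if it's SATA (assumed path)

# CPU/package temperatures (run sensors-detect once first if empty)
sensors
```

A drive that is healthy here can still misbehave under load, but reallocated sectors, media errors, or a package temperature near throttling limits would be a strong lead.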
 
Hello,

Getting more information from your systems may help you find out where the failure comes from.

> Is there anything more I can do to keep logs?
>
> I had an external system watching journalctl and dmesg live in a tmux session; no extra info there either.

If possible, using an external syslog server anywhere else on the LAN would ensure the information survives a crash.
Once journald keeps track of previous boots' messages and/or errors, you can query it with:
journalctl -b -1 -p 3 (messages from the previous boot, filtered to errors).
Installing the linux-crashdump package may help you keep track of previous crashes, as well as using systemd-coredump to capture crash data.
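The external syslog server idea can be sketched with rsyslog, which ships on PVE hosts; the receiver IP 192.168.1.50 is a placeholder:

```shell
# On the Proxmox host: forward all messages to a remote syslog server
# ("@@" = TCP, a single "@" would be UDP)
cat >/etc/rsyslog.d/90-remote.conf <<'EOF'
*.* @@192.168.1.50:514
EOF
systemctl restart rsyslog

# On the receiving machine, enable the TCP listener in its rsyslog config:
#   module(load="imtcp")
#   input(type="imtcp" port="514")
```

As with netconsole, this only captures what the host manages to emit before it locks up, but it rules out "the log was lost with the disk/journal".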
 
This is quite an interesting thread, because I've seen the same behavior for months (PVE 8 and now 9) on 2 of the 3 NUCs running my (homelab) cluster. They were crashing randomly, sometimes 1 or 2 times a month. It has only gotten worse over the last few days; they're now crashing almost every day, or even multiple times a day.

So, to summarize the hardware setup of my cluster a bit:
  • One (quite old) NUC7i5BNH. I won't talk about this one because it never crashes.
  • Two strictly identical NUC12WSHi7
    • RAM: 64 GiB (2 x 32 GiB SODIMM DDR4 Synchronous 3200 MHz, vendor: G Skill Intl)
    • Boot SSD: KINGSTON SNV2S250G
    • Data SSD (Ceph OSD): KINGSTON SEDC600
When one of the NUC12WSHi7s crashes, its fan starts spinning at max speed and the screen goes blank. I need to power it off/on to restart it (I had to plug them into HomeKit-compatible power sockets to be able to restart them remotely when away from home). I tried (kernel) remote syslog, but nothing...

Edit: I never had to turn the NUMA flag on.
 
Hi,
Don't know if it's related, but I upgraded to kernel 6.14.11-4-pve and got all kinds of trouble: reboot loops when I rebooted. It eventually became stable, but would then reboot after about a day with no logs. I rolled back to 6.14.11-3-pve and it is once again totally stable.

Thanks,

Martin.
 
I got a kernel panic on my side with kernel 6.14.11-4-pve on every boot. Error message: "Kernel Panic: VFS: Unable to mount root fs on unknown-block(0,0)". Rolling back to 6.14.11-3-pve fixed the issue for me.
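For anyone wanting to make a rollback like this stick, proxmox-boot-tool can pin the known-good kernel so it stays the default across updates (version string as reported above):

```shell
proxmox-boot-tool kernel list              # show installed and pinned kernels
proxmox-boot-tool kernel pin 6.14.11-3-pve # always boot this one
# later, to return to the newest installed kernel:
proxmox-boot-tool kernel unpin
```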
 
> I got a kernel panic on my side with kernel 6.14.11-4-pve on every boot. Error message: "Kernel Panic: VFS: Unable to mount root fs on unknown-block(0,0)". Rolling back to 6.14.11-3-pve fixed the issue for me.
Sorry for the late reply, but agreed 100%: there is an issue with that kernel. I admit it's the first time I remember one, and I've been using Proxmox for a while now.
 
I've been setting up a cluster for a week with an Intel NUC (formerly used as a dedicated Roon ROCK). The other 2 nodes are AMD and seemingly fine. I've found multiple issues. I have tried both of the last couple of kernels, which appear fine (apart from the crashes). Every machine has simple ext4 for the OS disk (SSD). The NUC has a second data SSD (ZFS; I found that ZFS on the OS disk causes all sorts of niggles, mostly related to booting with additional ZFS disks).
The NUC is doing less each time; it was just running the Proxmox backups, but those fail every time, so I'll probably have to relegate it to just a quorum node next. What I have found so far, in case it's useful:

Microcode - lots of posted issues made me think this was the solution. Unfortunately a red herring, as it is primarily v8-related; the fix is already applied automatically in v9 ("non-free-firmware" in the Debian repository).

E1000e network card - the network dies randomly, typically under load. A community script seems to fix that (the network is now stable for me).

No attached monitor - dies randomly; attach a cable or dongle, or set the "nomodeset" kernel parameter.

Thought I'd add the above for anyone else, and I'm watching to see if any further fixes are found...
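For reference, a sketch of adding "nomodeset" on a GRUB-booted host (assumes the stock GRUB_CMDLINE_LINUX_DEFAULT="quiet" line; systemd-boot/ZFS installs edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):

```shell
# Append nomodeset to the default kernel command line
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="quiet"/GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"/' /etc/default/grub
update-grub
reboot

# verify after reboot:
cat /proc/cmdline
```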
 