Proxmox random crashes without logs

Utility6693 · 2025-10-22T19:52:21+0200

Hi all,

I hope someone here has an answer; I am at my wits' end. My Proxmox instance keeps crashing. It can take minutes or hours, but it will eventually crash.
I have to power-cycle the device manually because it is completely unresponsive. Neither the host nor any VMs or LXCs are reachable. Attaching an HDMI cable just shows a black screen.
There are no errors or other issues to be found in the log files.

What I'm running:
- An LXC with unbound
- An LXC with an ad blocker
- A VM with Home Assistant
- A VM with some docker containers

The hardware:
- An Intel NUC nuc11tnhi5 (11th-gen i5, 4 cores, 1 socket)
- 32 GB of memory
- 2 x 1 TB Samsung SSDs (the 2nd one is mostly for backups)

Until last week I had been running version 8.0.14 without any problems for ~2 years, but only if my VMs had the NUMA flag turned on (CPU properties). I had issues like these back then and eventually found that this setting fixed them for me.

Last week I decided to enable the pve-no-subscription repo and update to 8.4. The random freezes started right away. After a day or two I updated everything to 9.0, with no success there either.

My system info (ran the 6.8 kernel earlier today):

Code:

proxmox-ve: 9.0.0 (running kernel: 6.17.1-1-pve)
pve-manager: 9.0.11 (running version: 9.0.11/3bf5476b8a4699e2)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.1-1-pve-signed: 6.17.1-1
proxmox-kernel-6.17: 6.17.1-1
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.16-1
proxmox-backup-file-restore: 4.0.16-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.0
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-widget-toolkit: 5.0.6
pve-cluster: 9.0.6
pve-container: 6.0.13
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-2
pve-ha-manager: 5.0.5
pve-i18n: 3.6.1
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.23
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

I am pretty sure it's a software issue, since it was running fine before.

A section of log up until a crash in the morning after having been running idle all night:

Code:

-- Boot 63811b0cc80a461da990354de5b7b4b9 --
Oct 22 08:20:49 nuc systemd[1]: Finished man-db.service - Daily man-db regeneration.
Oct 22 08:20:49 nuc systemd[1]: man-db.service: Deactivated successfully.
Oct 22 08:20:48 nuc systemd[1]: Starting man-db.service - Daily man-db regeneration...
Oct 22 08:17:01 nuc CRON[202247]: pam_unix(cron:session): session closed for user root
Oct 22 08:17:01 nuc CRON[202250]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 08:17:01 nuc CRON[202247]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 07:17:01 nuc CRON[184536]: pam_unix(cron:session): session closed for user root
Oct 22 07:17:01 nuc CRON[184538]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 07:17:01 nuc CRON[184536]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 06:48:49 nuc systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Oct 22 06:48:49 nuc systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Oct 22 06:48:48 nuc systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Oct 22 06:25:01 nuc CRON[169080]: pam_unix(cron:session): session closed for user root
Oct 22 06:25:01 nuc CRON[169082]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Oct 22 06:25:01 nuc CRON[169080]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 06:17:01 nuc CRON[166768]: pam_unix(cron:session): session closed for user root
Oct 22 06:17:01 nuc CRON[166770]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 06:17:01 nuc CRON[166768]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 05:17:01 nuc CRON[149175]: pam_unix(cron:session): session closed for user root
Oct 22 05:17:01 nuc CRON[149177]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 05:17:01 nuc CRON[149175]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 04:17:01 nuc CRON[131535]: pam_unix(cron:session): session closed for user root
Oct 22 04:17:01 nuc CRON[131537]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 04:17:01 nuc CRON[131535]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 04:15:51 nuc systemd[1]: pve-daily-update.service: Consumed 2.509s CPU time, 327M memory peak.
Oct 22 04:15:51 nuc systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Oct 22 04:15:51 nuc systemd[1]: pve-daily-update.service: Deactivated successfully.
Oct 22 04:15:51 nuc pveupdate[131172]: <root@pam> end task UPID:nuc:0002006B:0027B272:68F83E55:aptupdate::root@pam: OK
Oct 22 04:15:50 nuc pveupdate[131179]: update new package list: /var/lib/pve-manager/pkgupdates
Oct 22 04:15:49 nuc pveupdate[131172]: <root@pam> starting task UPID:nuc:0002006B:0027B272:68F83E55:aptupdate::root@pam:
Oct 22 04:15:48 nuc systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Oct 22 03:54:49 nuc systemd[1]: Finished apt-daily.service - Daily apt download activities.
Oct 22 03:54:49 nuc systemd[1]: apt-daily.service: Deactivated successfully.
Oct 22 03:54:48 nuc systemd[1]: Starting apt-daily.service - Daily apt download activities...
Oct 22 03:40:35 nuc chronyd[984]: Source 45.138.55.61 replaced with 178.215.228.24 (2.debian.pool.ntp.org)
Oct 22 03:17:01 nuc CRON[113742]: pam_unix(cron:session): session closed for user root
Oct 22 03:17:01 nuc CRON[113744]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 03:17:01 nuc CRON[113742]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 03:10:01 nuc CRON[111660]: pam_unix(cron:session): session closed for user root
Oct 22 03:10:01 nuc CRON[111662]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Oct 22 03:10:01 nuc CRON[111660]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 03:01:40 nuc kernel: vmbr0: port 1(enp88s0) entered forwarding state
Oct 22 03:01:40 nuc kernel: vmbr0: port 1(enp88s0) entered blocking state
Oct 22 03:01:40 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 22 03:00:18 nuc kernel: vmbr0: port 1(enp88s0) entered disabled state
Oct 22 03:00:18 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Down
Oct 22 02:17:01 nuc CRON[96248]: pam_unix(cron:session): session closed for user root
Oct 22 02:17:01 nuc CRON[96250]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)

What I have done to try and fix it:
- Updated the BIOS to the latest version.
- Turned off various power-related settings in the BIOS, as well as onboard devices I don't use (Wi-Fi, BT).
- Changed the CPU type for the VMs from host to kvm64, and then to the default x86-64-v2-AES.
- Switched from the stable to the newest 6.14 kernel.
- Limited the resource usage of my Docker containers to below the VM's RAM, just in case.
- Turned off VM ballooning to see if it makes a difference.
- Turned off each VM in turn; it still crashed in both cases.
- Had an external device monitor it with dmesg --follow and journalctl -f; no extra information.
- Ran a memory test, 2 passes, all good.
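
Since the live dmesg and journalctl sessions show nothing before the hang, a low-tech complement is a heartbeat file that at least pins down when the host stopped responding and what the load was just before. The following is only a minimal sketch: the function name `heartbeat_once`, the path `/var/log/heartbeat.log` and the 10-second interval are assumptions made for illustration, not something from this thread.

```shell
#!/usr/bin/env bash
# Heartbeat sketch: append one timestamped line with the load average
# and flush it to disk. heartbeat_once and the default path are
# hypothetical names chosen for this example.

heartbeat_once() {
    local out="${1:-/var/log/heartbeat.log}"
    local load="n/a"
    # /proc/loadavg exists on Linux; fall back to "n/a" elsewhere.
    [ -r /proc/loadavg ] && load="$(cut -d' ' -f1-3 /proc/loadavg)"
    printf '%s load=%s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$load" >> "$out"
    # Flush buffers so the line survives a hard freeze or power cycle.
    sync
}

# Example loop (run in tmux or as a service), one line every 10 seconds:
#   while true; do heartbeat_once /var/log/heartbeat.log; sleep 10; done
```

After the next power cycle, `tail -n 3 /var/log/heartbeat.log` would show the last moment the host was still alive, which can then be compared with the final journal entries.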

The system can crash at any time, but it seems to happen mostly when it is idle, which it is most of the time.
It's not running out of memory; almost half the RAM is not provisioned to VMs or LXCs. The maximum CPU load is also only ~50%, and that is while booting up.

I'm at a loss, hope someone has an idea.

Guillaume Delanoy · 2025-10-23T09:13:15+0200

Hello,
You wrote:

> I am pretty sure it's a software issue, since it was running fine before.

Maybe. Or maybe not. Hardware gets old, too.
Anyway, finding information about your problem is the way to help you solve it.
Therefore, I would recommend configuring journald, so that you keep infos
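
As a rough illustration of what such a journald configuration could look like: the sketch below writes a drop-in that keeps the journal on disk and syncs it more often than the default, so fewer lines are lost on a hard freeze. `Storage=`, `SyncIntervalSec=` and `SystemMaxUse=` are standard journald.conf options; the helper name `write_journald_dropin`, the file name `10-persistent.conf` and the chosen values are assumptions, not a recommendation taken from this thread.

```shell
#!/usr/bin/env bash
# Sketch: write a journald drop-in that keeps logs across reboots.
# write_journald_dropin and 10-persistent.conf are hypothetical names.

write_journald_dropin() {
    local dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir"
    cat > "$dir/10-persistent.conf" <<'EOF'
[Journal]
Storage=persistent
SyncIntervalSec=10s
SystemMaxUse=1G
EOF
}

# As root, apply it and later inspect the boot that ended in the crash:
#   write_journald_dropin
#   systemctl restart systemd-journald
#   journalctl --list-boots
#   journalctl -b -1 -e              # jump to the end of the previous boot
#   journalctl -k -b -1 -p warning   # kernel messages, warning or worse
```

If the previous boot still ends with ordinary cron and timer entries, that in itself suggests the hang happens below the level the journal can record.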

Utility6693 · 2025-10-23T11:34:53+0200

Guillaume Delanoy said:
Hello,
You wrote:

> I am pretty sure it's a software issue, since it was running fine before.

Maybe. Or maybe not. Hardware gets old, too.
Anyway, finding information about your problem is the way to help you solve it.
Therefore, I would recommend configuring journald, so that you keep infos

Is there anything more I can do to keep logs?

I had an external system watching journalctl and dmesg live in a tmux session; no extra info there either.

gfngfn256 · 2025-10-23T12:59:33+0200

Utility6693 said:
but only if my VM's had the numa flag turned on

This is surprising, to say the least, since that NUC nuc11tnhi5 is single-socket.

Utility6693 said:
My system info (ran the 6.8 kernel earlier today):

So what is this?

Utility6693 said:
proxmox-ve: 9.0.0 (running kernel: 6.17.1-1-pve)

The 6.17 kernel is currently opt-in only, as shown here.

I guess you have been doing major testing/messing with your system to get it working (& more so, historically over the years).

What I would try if I were you:
1. Ensure you have full (restorable) backups of all VMs & LXCs (& their documented configs, storage setup etc.)
2. Remove the original drive/s from the NUC, & install a fresh/clean/new drive (at least for testing).
3. Install fresh PVE 9.
4. Test (with time) before adding/restoring any VMs or LXCs for stability.
5. If above succeeds, try adding/restoring (one by one) the VMs & LXCs & test for stability.

Possible suspects:
1. Power issue/PSU on that mini pc.
2. Thermals on that mini pc.
3. Storage issue. It appears that your host, VMs & LXCs all live on a single disk. Is that an NVMe or an SSD (I believe your NUC has both)? It may be worth trying a different disk for comparison.
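
To check the thermal suspect with data rather than guesswork, the temperatures the kernel exposes under `/sys/class/thermal` can be logged periodically, so the last readings before a freeze are on disk. This is a minimal sketch under stated assumptions: `read_thermal_zones` is a hypothetical helper, and which zones exist depends on the hardware.

```shell
#!/usr/bin/env bash
# Sketch: print one line per thermal zone, converting the kernel's
# millidegrees Celsius to degrees with one decimal place.
# read_thermal_zones is a hypothetical helper; the sysfs root can be
# overridden, which is mainly useful for testing.

read_thermal_zones() {
    local root="${1:-/sys/class/thermal}"
    local zone name milli
    for zone in "$root"/thermal_zone*; do
        # Skip unmatched globs and zones without a readable temperature.
        [ -r "$zone/temp" ] || continue
        milli="$(cat "$zone/temp" 2>/dev/null)" || continue
        name="$(cat "$zone/type" 2>/dev/null || echo unknown)"
        printf '%s %s %d.%d\n' "$(date '+%H:%M:%S')" "$name" \
            "$((milli / 1000))" "$(( (milli % 1000) / 100 ))"
    done
}

# Example loop, appending a reading every 30 seconds and flushing it:
#   while true; do read_thermal_zones >> /var/log/thermal.log; sync; sleep 30; done
```

For the storage suspect, smartmontools is already in the package list above, so something like `smartctl -a /dev/nvme0` (the device name is an assumption) would show the drive's error log and health counters.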
