Hi all,
I hope someone here has an answer; I am at my wits' end. My Proxmox instance keeps crashing. It can take minutes or hours, but it will eventually crash.
I have to power-cycle the device manually because it is completely unresponsive. Neither the host nor any of the VMs or LXCs are reachable. Attaching an HDMI cable just shows a black screen.
There are no errors or other issues to be found in the log files.
What I'm running:
- An LXC with Unbound
- An LXC with an ad blocker
- A VM with Home Assistant
- A VM with some Docker containers
The hardware:
- An Intel NUC NUC11TNHi5 (11th-gen i5, 4 cores, 1 socket)
- 32 GB of memory
- 2 x 1 TB Samsung SSDs (the second one is mostly for backups)
Until last week I was running version 8.0.14 without any problems for ~2 years, but only with the NUMA flag turned on for my VMs (CPU properties). When I ran into these issues back then, I eventually found out that enabling it fixed them for me.
Last week I decided to turn on the pve-no-subscription repo and update to 8.4. The random freezes started right away. After a day or two I updated everything to 9.0, with no success there either.
My system info (ran the 6.8 kernel earlier today):
Code:
proxmox-ve: 9.0.0 (running kernel: 6.17.1-1-pve)
pve-manager: 9.0.11 (running version: 9.0.11/3bf5476b8a4699e2)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.1-1-pve-signed: 6.17.1-1
proxmox-kernel-6.17: 6.17.1-1
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.16-1
proxmox-backup-file-restore: 4.0.16-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.0
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-widget-toolkit: 5.0.6
pve-cluster: 9.0.6
pve-container: 6.0.13
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-2
pve-ha-manager: 5.0.5
pve-i18n: 3.6.1
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.23
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
I am pretty sure it's a software issue, since it was running fine before.
A section of the log (newest entries first) up to a crash in the morning, after the system had been running idle all night:
Code:
-- Boot 63811b0cc80a461da990354de5b7b4b9 --
Oct 22 08:20:49 nuc systemd[1]: Finished man-db.service - Daily man-db regeneration.
Oct 22 08:20:49 nuc systemd[1]: man-db.service: Deactivated successfully.
Oct 22 08:20:48 nuc systemd[1]: Starting man-db.service - Daily man-db regeneration...
Oct 22 08:17:01 nuc CRON[202247]: pam_unix(cron:session): session closed for user root
Oct 22 08:17:01 nuc CRON[202250]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 08:17:01 nuc CRON[202247]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 07:17:01 nuc CRON[184536]: pam_unix(cron:session): session closed for user root
Oct 22 07:17:01 nuc CRON[184538]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 07:17:01 nuc CRON[184536]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 06:48:49 nuc systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Oct 22 06:48:49 nuc systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Oct 22 06:48:48 nuc systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Oct 22 06:25:01 nuc CRON[169080]: pam_unix(cron:session): session closed for user root
Oct 22 06:25:01 nuc CRON[169082]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Oct 22 06:25:01 nuc CRON[169080]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 06:17:01 nuc CRON[166768]: pam_unix(cron:session): session closed for user root
Oct 22 06:17:01 nuc CRON[166770]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 06:17:01 nuc CRON[166768]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 05:17:01 nuc CRON[149175]: pam_unix(cron:session): session closed for user root
Oct 22 05:17:01 nuc CRON[149177]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 05:17:01 nuc CRON[149175]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 04:17:01 nuc CRON[131535]: pam_unix(cron:session): session closed for user root
Oct 22 04:17:01 nuc CRON[131537]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 04:17:01 nuc CRON[131535]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 04:15:51 nuc systemd[1]: pve-daily-update.service: Consumed 2.509s CPU time, 327M memory peak.
Oct 22 04:15:51 nuc systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Oct 22 04:15:51 nuc systemd[1]: pve-daily-update.service: Deactivated successfully.
Oct 22 04:15:51 nuc pveupdate[131172]: <root@pam> end task UPID:nuc:0002006B:0027B272:68F83E55:aptupdate::root@pam: OK
Oct 22 04:15:50 nuc pveupdate[131179]: update new package list: /var/lib/pve-manager/pkgupdates
Oct 22 04:15:49 nuc pveupdate[131172]: <root@pam> starting task UPID:nuc:0002006B:0027B272:68F83E55:aptupdate::root@pam:
Oct 22 04:15:48 nuc systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Oct 22 03:54:49 nuc systemd[1]: Finished apt-daily.service - Daily apt download activities.
Oct 22 03:54:49 nuc systemd[1]: apt-daily.service: Deactivated successfully.
Oct 22 03:54:48 nuc systemd[1]: Starting apt-daily.service - Daily apt download activities...
Oct 22 03:40:35 nuc chronyd[984]: Source 45.138.55.61 replaced with 178.215.228.24 (2.debian.pool.ntp.org)
Oct 22 03:17:01 nuc CRON[113742]: pam_unix(cron:session): session closed for user root
Oct 22 03:17:01 nuc CRON[113744]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 22 03:17:01 nuc CRON[113742]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 03:10:01 nuc CRON[111660]: pam_unix(cron:session): session closed for user root
Oct 22 03:10:01 nuc CRON[111662]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Oct 22 03:10:01 nuc CRON[111660]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Oct 22 03:01:40 nuc kernel: vmbr0: port 1(enp88s0) entered forwarding state
Oct 22 03:01:40 nuc kernel: vmbr0: port 1(enp88s0) entered blocking state
Oct 22 03:01:40 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 22 03:00:18 nuc kernel: vmbr0: port 1(enp88s0) entered disabled state
Oct 22 03:00:18 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Down
Oct 22 02:17:01 nuc CRON[96248]: pam_unix(cron:session): session closed for user root
Oct 22 02:17:01 nuc CRON[96250]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
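The only non-routine entries in this excerpt are the igc link drop and recovery around 03:00. For scanning a longer saved journal dump for such events, a minimal Python sketch follows. The function names `link_events` and `outages` are hypothetical, the regex is only matched against the line format shown above, and the year is an assumption because the short journal format omits it.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
# "Oct 22 03:00:18 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Down"
LINK_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"igc \S+ (?P<iface>\S+): NIC Link is (?P<state>Up|Down)"
)


def link_events(lines, year=2025):
    """Return (timestamp, interface, state) tuples in chronological order.

    `year` is an assumption: the short journal format does not print it.
    """
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["iface"], m["state"]))
    return sorted(events)


def outages(events):
    """Pair each Down event with the next Up event on the same interface."""
    down_since = {}
    result = []
    for ts, iface, state in events:
        if state == "Down":
            down_since[iface] = ts
        elif iface in down_since:
            start = down_since.pop(iface)
            result.append((iface, start, ts - start))
    return result


if __name__ == "__main__":
    # Two lines copied from the excerpt above (newest first, as in the dump).
    sample = [
        "Oct 22 03:01:40 nuc kernel: igc 0000:58:00.0 enp88s0: "
        "NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX",
        "Oct 22 03:00:18 nuc kernel: igc 0000:58:00.0 enp88s0: NIC Link is Down",
    ]
    for iface, start, gap in outages(link_events(sample)):
        print(iface, start, gap.total_seconds())
```

Sorting first means the input order (newest first or oldest first) does not matter. Whether these link drops have anything to do with the freezes is not established by the log.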
What I have done to try and fix it:
- Updated the BIOS to the latest version.
- Turned off various settings in the BIOS related to power management, and onboard devices I don't use (Wi-Fi, BT).
- Changed the CPU settings for the VM, from type host to kvm64, to the default x86-64-v2-AES.
- Switched from the stable to the newest 6.14 kernel.
- Limited the resource usage of my Docker containers to below the VM's RAM, just in case.
- Turned off VM ballooning to see if it makes a difference.
- Turned off each VM in turn; it still crashed in both cases.
- Let an external device monitor it with dmesg --follow and journalctl -f, no extra information.
- Ran a memory test, 2 passes, all good.
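Because dmesg --follow and journalctl -f showed nothing before a freeze, the external device could also record a timestamp for every successful probe of the host (for example a ping) to narrow down when it stopped responding. A minimal Python sketch of the gap detection, assuming a hypothetical `find_gaps` helper, a 10-second probe interval and a tolerance of three missed probes, none of which come from the setup above:

```python
from datetime import datetime, timedelta


def find_gaps(beats, interval=timedelta(seconds=10), tolerance=3):
    """Return (last_seen, next_seen) pairs with a silence longer than
    `tolerance` probe intervals between two successful probes.

    `beats` is an iterable of datetime objects, one per successful probe.
    The interval and tolerance defaults are illustrative assumptions.
    """
    limit = interval * tolerance
    ordered = sorted(beats)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


if __name__ == "__main__":
    start = datetime(2025, 10, 22, 8, 0, 0)  # made-up example timestamps
    # Probes every 10 s, then silence until the host is back after a reboot.
    beats = [start + timedelta(seconds=s) for s in (0, 10, 20, 300, 310)]
    for last_seen, next_seen in find_gaps(beats):
        print("host silent from", last_seen, "until", next_seen)
```

The first element of each pair is the last moment the host was known to be alive, which can then be compared with the final journal entry before the reboot.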
The system can crash at any time, but it seems to happen mostly when it is idle, which it is most of the time.
It's not running out of memory: almost half the RAM is not provisioned to VMs or LXCs. The maximum CPU load is also only ~50%, when booting up.
I'm at a loss and hope someone has an idea.