Hello, I'm new to Linux and Proxmox, and I'm having trouble keeping my cluster online because some nodes keep freezing.
I have a Proxmox cluster of 3 identical mini PC nodes and one big node.
My big node is super stable, but each of the mini PCs freezes and goes offline on its own after roughly 5 days. When I connect to them over KVM, I get no signal on the monitor.
The mini PCs are Lenovo ThinkCentre M710q units with brand-new storage and RAM.
Below is what the system log shows. I really don't know why they keep turning off.
Code:
Nov 06 04:02:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 56 to 55
Nov 06 04:17:01 servus-3 CRON[1213556]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 06 04:17:01 servus-3 CRON[1213558]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 06 04:17:01 servus-3 CRON[1213556]: pam_unix(cron:session): session closed for user root
Nov 06 04:32:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 55 to 56
Nov 06 04:33:04 servus-3 pmxcfs[1092]: [dcdb] notice: data verification successful
Nov 06 04:54:43 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 06 04:54:43 servus-3 corosync[1238]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 06 04:54:43 servus-3 corosync[1238]: [KNET ] host: host: 2 has no active links
Nov 06 04:54:44 servus-3 corosync[1238]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Nov 06 04:54:44 servus-3 corosync[1238]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 06 04:54:44 servus-3 corosync[1238]: [KNET ] pmtud: Global data MTU changed to: 1397
Nov 06 05:00:47 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 06 05:00:47 servus-3 corosync[1238]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 06 05:00:47 servus-3 corosync[1238]: [KNET ] host: host: 2 has no active links
Nov 06 05:00:48 servus-3 corosync[1238]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Nov 06 05:00:48 servus-3 corosync[1238]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 06 05:00:48 servus-3 corosync[1238]: [KNET ] pmtud: Global data MTU changed to: 1397
Nov 06 05:02:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 56 to 55
Nov 06 05:17:01 servus-3 CRON[1223341]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 06 05:17:01 servus-3 CRON[1223343]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 06 05:17:01 servus-3 CRON[1223341]: pam_unix(cron:session): session closed for user root
Nov 06 05:32:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 55 to 56
Nov 06 05:33:04 servus-3 pmxcfs[1092]: [dcdb] notice: data verification successful
Nov 06 06:02:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 56 to 57
Nov 06 06:15:18 servus-3 systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Nov 06 06:15:18 servus-3 systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Nov 06 06:15:18 servus-3 systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Nov 06 06:17:01 servus-3 CRON[1233175]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 06 06:17:01 servus-3 CRON[1233177]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 06 06:17:01 servus-3 CRON[1233175]: pam_unix(cron:session): session closed for user root
Nov 06 06:25:01 servus-3 CRON[1234484]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 06 06:25:01 servus-3 CRON[1234486]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Nov 06 06:25:01 servus-3 CRON[1234484]: pam_unix(cron:session): session closed for user root
Nov 06 06:33:04 servus-3 pmxcfs[1092]: [dcdb] notice: data verification successful
Nov 06 07:17:01 servus-3 CRON[1242970]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 06 07:17:01 servus-3 CRON[1242972]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 06 07:17:01 servus-3 CRON[1242970]: pam_unix(cron:session): session closed for user root
Nov 06 07:32:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 57 to 56
Nov 06 07:33:04 servus-3 pmxcfs[1092]: [dcdb] notice: data verification successful
Nov 06 08:02:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 56 to 57
Nov 06 08:17:01 servus-3 CRON[1252755]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 06 08:17:01 servus-3 CRON[1252757]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 06 08:17:01 servus-3 CRON[1252755]: pam_unix(cron:session): session closed for user root
Nov 06 08:32:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 57 to 58
Nov 06 08:33:04 servus-3 pmxcfs[1092]: [dcdb] notice: data verification successful
Nov 06 09:02:42 servus-3 smartd[889]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 58 to 56
Nov 06 09:17:01 servus-3 CRON[1262543]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Nov 06 09:17:01 servus-3 CRON[1262545]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 06 09:17:01 servus-3 CRON[1262543]: pam_unix(cron:session): session closed for user root
Nov 06 09:33:04 servus-3 pmxcfs[1092]: [dcdb] notice: data verification successful
-- Reboot --
Nov 17 15:52:22 servus-3 kernel: Linux version 6.14.11-4-pve (build@proxmox) (gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC PMX 6.14.11-4 (2025-10-10T08:04Z) ()
Nov 17 15:52:22 servus-3 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.11-4-pve root=/dev/mapper/pve-root ro quiet
Nov 17 15:52:22 servus-3 kernel: KERNEL supported cpus:
Nov 17 15:52:22 servus-3 kernel: Intel GenuineIntel
Nov 17 15:52:22 servus-3 kernel: AMD AuthenticAMD
Nov 17 15:52:22 servus-3 kernel: Hygon HygonGenuine
Nov 17 15:52:22 servus-3 kernel: Centaur CentaurHauls
Nov 17 15:52:22 servus-3 kernel: zhaoxin Shanghai
Nov 17 15:52:22 servus-3 kernel: BIOS-provided physical RAM map:
Nov 17 15:52:22 servus-3 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
Nov 17 15:52:22 servus-3 kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
Nov 17 15:52:22 servus-3 kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000003ffffff] usable
Nov 17 15:52:22 servus-3 kernel: BIOS-e820: [mem 0x0000000004000000-0x0000000004009fff] ACPI NVS
Nov 17 15:52:22 servus-3 kernel: BIOS-e820: [mem 0x000000000400a000-0x0000000009cfffff] usable
Nov 17 15:52:22 servus-3 kernel: BIOS-e820: [mem 0x0000000009d00000-0x0000000009ffffff] reserved
I also noticed that corosync reports a lot of link-down events:
Code:
journalctl -b -1 | grep "KNET.*down\|has no active links" | wc -l
41
journalctl -b -1 | grep "KNET.*link.*down"
Nov 01 21:02:25 servus-3 corosync[1238]: [KNET ] link: host: 1 link: 0 is down
Nov 02 03:32:26 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 02 11:30:16 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 02 12:34:32 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 03 12:23:17 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 04 02:12:32 servus-3 corosync[1238]: [KNET ] link: host: 4 link: 0 is down
Nov 04 20:28:30 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 04 20:44:18 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 05 16:24:39 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 05 18:02:17 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 05 18:26:51 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 05 19:15:15 servus-3 corosync[1238]: [KNET ] link: host: 4 link: 0 is down
Nov 05 23:09:39 servus-3 corosync[1238]: [KNET ] link: host: 4 link: 0 is down
Nov 06 02:50:32 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 06 04:54:43 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
Nov 06 05:00:47 servus-3 corosync[1238]: [KNET ] link: host: 2 link: 0 is down
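To see whether one peer is responsible for most of these drops, the matches can be grouped per host. In the 16 lines listed above, 12 refer to host 2, 3 to host 4, and 1 to host 1. The following is a minimal sketch, assuming the journal lines keep the `link: host: N link: M is down` wording shown above; the function name `knet_down_per_host` is my own and not part of any Proxmox or corosync tooling.

```shell
# Count corosync KNET "link ... is down" events per peer host.
# Reads journal lines on stdin and prints one "host N: count" line per host.
# Possible usage (not run here): journalctl -b -1 -u corosync | knet_down_per_host
knet_down_per_host() {
  awk '
    # Only lines of the form "... link: host: N link: M is down" are counted.
    /KNET.*link: host: [0-9]+ link: [0-9]+ is down/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "host:") {      # the field after the first "host:" is the host id
          count[$(i + 1)]++
          break
        }
      }
    }
    END {
      for (h in count) printf "host %s: %d\n", h, count[h]
    }
  ' | sort
}
```

The "has no active links" and "Resetting MTU" lines are skipped on purpose, so each outage is counted only once.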