Hosts in my Proxmox cluster are randomly rebooting.

berkaybulut

New Member
Feb 8, 2023
Hello,
I have a 4-node Proxmox cluster.

I use Ceph as storage.

Some of the hosts in my cluster reboot at random: sometimes only one node, sometimes all four. I couldn't find much in the syslog. I need help.
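(In case it helps others reproduce this: logs from the boot before a crash can be pulled roughly like the following, assuming the journal is persistent; adjust the units and the time window as needed.)

# journal of the previous boot, limited to the cluster/HA related units
journalctl -b -1 -u corosync -u pve-cluster -u watchdog-mux -u pve-ha-lrm
# or just the minutes leading up to the reboot
journalctl --since "2025-02-02 22:40" --until "2025-02-02 22:52"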

All hosts run the same versions (pveversion -v):

proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-6-pve-signed: 6.8.12-6
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-3-pve-signed: 6.8.12-3
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.8.8-3-pve-signed: 6.8.8-3
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 18.2.4-pve3
ceph-fuse: 18.2.4-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
Here is the syslog from around one of the reboots:
Feb 02 22:12:25 cmt6770 pmxcfs[1933]: [status] notice: received log
Feb 02 22:12:27 cmt6770 pmxcfs[1933]: [status] notice: received log
Feb 02 22:17:01 cmt6770 CRON[2095014]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 02 22:17:01 cmt6770 CRON[2095015]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 02 22:17:01 cmt6770 CRON[2095014]: pam_unix(cron:session): session closed for user root
Feb 02 22:17:47 cmt6770 pvestatd[3616116]: status update time (6.593 seconds)
Feb 02 22:19:26 cmt6770 pveproxy[1955192]: Clearing outdated entries from certificate cache
Feb 02 22:19:30 cmt6770 pmxcfs[1933]: [status] notice: received log
Feb 02 22:19:47 cmt6770 pvestatd[3616116]: status update time (5.283 seconds)
Feb 02 22:23:01 cmt6770 pvedaemon[1023275]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Feb 02 22:30:06 cmt6770 pvestatd[3616116]: status update time (5.456 seconds)
Feb 02 22:31:54 cmt6770 pvedaemon[1007837]: <root@pam> successful auth for user 'root@pam'
Feb 02 22:31:59 cmt6770 pvedaemon[1023275]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Feb 02 22:32:37 cmt6770 pvedaemon[1023275]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Feb 02 22:33:57 cmt6770 pvestatd[3616116]: status update time (5.391 seconds)
Feb 02 22:36:16 cmt6770 pvestatd[3616116]: status update time (5.102 seconds)
Feb 02 22:40:07 cmt6770 pvestatd[3616116]: status update time (5.491 seconds)
Feb 02 22:40:24 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: b2c74
Feb 02 22:41:06 cmt6770 pvestatd[3616116]: status update time (5.006 seconds)
Feb 02 22:42:20 cmt6770 pmxcfs[1933]: [status] notice: received log
Feb 02 22:42:21 cmt6770 pmxcfs[1933]: [status] notice: received log
Feb 02 22:43:06 cmt6770 pvestatd[3616116]: status update time (5.185 seconds)
Feb 02 22:44:19 cmt6770 pvedaemon[1023275]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Feb 02 22:44:19 cmt6770 pvedaemon[956283]: <root@pam> successful auth for user 'root@pam'
Feb 02 22:45:09 cmt6770 pveproxy[2048403]: proxy detected vanished client connection
Feb 02 22:46:26 cmt6770 pvestatd[3616116]: status update time (5.007 seconds)
Feb 02 22:46:33 cmt6770 pvedaemon[1023275]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Feb 02 22:47:56 cmt6770 pmxcfs[1933]: [dcdb] notice: data verification successful
Feb 02 22:48:25 cmt6770 pvestatd[3616116]: got timeout
Feb 02 22:48:27 cmt6770 pvestatd[3616116]: status update time (6.278 seconds)
Feb 02 22:48:35 cmt6770 pvestatd[3616116]: got timeout
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] link: host: 4 link: 0 is down
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] link: host: 4 link: 1 is down
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 has no active links
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 has no active links
Feb 02 22:48:40 cmt6770 corosync[1997]: [TOTEM ] Token has not been received in 7725 ms
Feb 02 22:48:40 cmt6770 pvestatd[3616116]: got timeout
Feb 02 22:48:42 cmt6770 corosync[1997]: [TOTEM ] A processor failed, forming new configuration: token timed out (10300ms), waiting 12360ms for consensus.
Feb 02 22:48:42 cmt6770 pvestatd[3616116]: status update time (11.307 seconds)
Feb 02 22:48:46 cmt6770 kernel: libceph: mon2 (1)10.0.10.9:6789 session established
Feb 02 22:48:48 cmt6770 corosync[1997]: [KNET ] link: host: 3 link: 0 is down
Feb 02 22:48:48 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 1 (pri: 1)
Feb 02 22:48:54 cmt6770 corosync[1997]: [KNET ] rx: host: 3 link: 0 is up
Feb 02 22:48:54 cmt6770 corosync[1997]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 02 22:48:54 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 02 22:48:54 cmt6770 corosync[1997]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 02 22:48:58 cmt6770 corosync[1997]: [QUORUM] Sync members[3]: 1 2 3
Feb 02 22:48:58 cmt6770 corosync[1997]: [QUORUM] Sync left[1]: 4
Feb 02 22:48:58 cmt6770 corosync[1997]: [TOTEM ] A new membership (1.7fbe) was formed. Members left: 4
Feb 02 22:48:58 cmt6770 corosync[1997]: [TOTEM ] Failed to receive the leave message. failed: 4
Feb 02 22:49:02 cmt6770 corosync[1997]: [KNET ] link: host: 3 link: 0 is down
Feb 02 22:49:02 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 1 (pri: 1)
Feb 02 22:49:04 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 10
Feb 02 22:49:05 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 20
Feb 02 22:49:06 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 3
Feb 02 22:49:06 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 4 5
Feb 02 22:49:06 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 30
Feb 02 22:49:06 cmt6770 pmxcfs[1933]: [dcdb] notice: members: 1/1933, 2/2572, 3/3196
Feb 02 22:49:06 cmt6770 pmxcfs[1933]: [dcdb] notice: starting data syncronisation
Feb 02 22:49:07 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 40
Feb 02 22:49:07 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 10
Feb 02 22:49:08 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 50
Feb 02 22:49:08 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 20
Feb 02 22:49:09 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 60
Feb 02 22:49:09 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 30
Feb 02 22:49:10 cmt6770 corosync[1997]: [KNET ] rx: host: 3 link: 0 is up
Feb 02 22:49:10 cmt6770 corosync[1997]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 02 22:49:10 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 02 22:49:10 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 70
Feb 02 22:49:10 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 40
Feb 02 22:49:10 cmt6770 corosync[1997]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 02 22:49:11 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 80
Feb 02 22:49:11 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retry 50
Feb 02 22:49:11 cmt6770 corosync[1997]: [QUORUM] Members[3]: 1 2 3
Feb 02 22:49:11 cmt6770 corosync[1997]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 02 22:49:11 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retried 83 times
Feb 02 22:49:11 cmt6770 pmxcfs[1933]: [dcdb] notice: cpg_send_message retried 53 times
Feb 02 22:49:11 cmt6770 pmxcfs[1933]: [status] notice: members: 1/1933, 2/2572, 3/3196
Feb 02 22:49:11 cmt6770 pmxcfs[1933]: [status] notice: starting data syncronisation
Feb 02 22:49:23 cmt6770 corosync[1997]: [KNET ] link: host: 3 link: 0 is down
Feb 02 22:49:23 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 1 (pri: 1)
Feb 02 22:49:24 cmt6770 ceph-osd[2792]: 2025-02-02T22:49:24.091+0300 70eb8b6006c0 -1 osd.2 41888 get_health_metrics reporting 3 slow ops, oldest is osd_op(client.18334602.0:649374 7.77 7:ee414baa:::rbd_data.83d8ca5619e018.0000000000001300:head [write 360448~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41882)
Feb 02 22:49:24 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: e 10 11 12 13 14 15 17
Feb 02 22:49:24 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 10 11 12 13 14 15 17 19 1a
Feb 02 22:49:24 cmt6770 watchdog-mux[1630]: client watchdog expired - disable watchdog updates
ondisk+write+known_if_redirected+supports_pool_eio e41884)
Feb 02 22:49:28 cmt6770 corosync[1997]: [KNET ] link: host: 3 link: 1 is down
Feb 02 22:49:28 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 1 (pri: 1)
Feb 02 22:49:28 cmt6770 corosync[1997]: [KNET ] host: host: 3 has no active links
Feb 02 22:49:29 cmt6770 corosync[1997]: [KNET ] link: Resetting MTU for link 1 because host 3 joined
Feb 02 22:49:29 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 1 (pri: 1)
Feb 02 22:49:29 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 11 12 13 17 19 1a 1b 1c 1d 20 21 22 23 24 25 26 27
Feb 02 22:49:29 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 11 12 13 17 19 1b 1d 20 22 23 24 25 26 27
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [dcdb] notice: received all states
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [dcdb] notice: leader is 1/1933
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [dcdb] notice: synced members: 1/1933, 2/2572, 3/3196
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [dcdb] notice: start sending inode updates
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [dcdb] notice: sent all (0) updates
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [dcdb] notice: all data is up to date
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [dcdb] notice: dfsm_deliver_queue: queue length 11
Feb 02 22:49:29 cmt6770 watchdog-mux[1630]: exit watchdog-mux with active connections
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [status] notice: received all states
Feb 02 22:49:29 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 19 1b 1d 20 22 23 26
Feb 02 22:49:29 cmt6770 pvescheduler[2122414]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [status] notice: all data is up to date
Feb 02 22:49:29 cmt6770 pmxcfs[1933]: [status] notice: dfsm_deliver_queue: queue length 1889
Feb 02 22:49:29 cmt6770 systemd-journald[1033]: Received client request to sync journal.
Feb 02 22:49:29 cmt6770 kernel: watchdog: watchdog0: watchdog did not stop!
Feb 02 22:49:29 cmt6770 systemd[1]: watchdog-mux.service: Deactivated successfully.
Feb 02 22:49:29 cmt6770 systemd[1]: watchdog-mux.service: Consumed 41.545s CPU time.
Feb 02 22:49:29 cmt6770 pvescheduler[2122415]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Feb 02 22:49:30 cmt6770 ceph-osd[2792]: 2025-02-02T22:49:30.117+0300 70eb8b6006c0 -1 osd.2 41888 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.18268111.0:214731 7.e 7:705055a7:::rbd_data.e3ee491242a87e.0000000000000266:head [write 1777664~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41882)
Feb 02 22:49:31 cmt6770 ceph-osd[2792]: 2025-02-02T22:49:31.086+0300 70eb8b6006c0 -1 osd.2 41888 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.18268111.0:214731 7.e 7:705055a7:::rbd_data.e3ee491242a87e.0000000000000266:head [write 1777664~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41882)
Feb 02 22:49:31 cmt6770 ceph-osd[2791]: 2025-02-02T22:49:31.669+0300 7a3eb02006c0 -1 osd.18 41888 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.17449834.0:1181107 7.0 7:00003b05:::rbd_data.0a16eb7300114b.00000000000003e0:head [write 2023424~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41887)
Feb 02 22:49:32 cmt6770 ceph-osd[2792]: 2025-02-02T22:49:32.080+0300 70eb8b6006c0 -1 osd.2 41888 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.18268111.0:214731 7.e 7:705055a7:::rbd_data.e3ee491242a87e.0000000000000266:head [write 1777664~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41882)
Feb 02 22:49:32 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 1d 20 22 23 26
Feb 02 22:49:32 cmt6770 pve-ha-lrm[3789583]: loop take too long (61 seconds)
Feb 02 22:49:32 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 1d 20 22 23 26 32 33
Feb 02 22:49:32 cmt6770 pve-ha-lrm[3789583]: watchdog update failed - Broken pipe
Feb 02 22:49:32 cmt6770 ceph-osd[2791]: 2025-02-02T22:49:32.692+0300 7a3eb02006c0 -1 osd.18 41888 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.17449834.0:1181107 7.0 7:00003b05:::rbd_data.0a16eb7300114b.00000000000003e0:head [write 2023424~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41887)
Feb 02 22:49:33 cmt6770 ceph-osd[2791]: 2025-02-02T22:49:33.666+0300 7a3eb02006c0 -1 osd.18 41888 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.17449834.0:1181107 7.0 7:00003b05:::rbd_data.0a16eb7300114b.00000000000003e0:head [write 2023424~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41887)
Feb 02 22:49:33 cmt6770 corosync[1997]: [KNET ] rx: host: 3 link: 0 is up
Feb 02 22:49:33 cmt6770 corosync[1997]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 02 22:49:33 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 02 22:49:34 cmt6770 ceph-osd[2792]: 2025-02-02T22:49:34.120+0300 70eb8b6006c0 -1 osd.2 41888 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.18039030.0:938471 7.16f 7:f6cb3700:::rbd_data.ed023169b04d88.00000000000014f3:head [write 3670016~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41882)
Feb 02 22:49:34 cmt6770 ceph-osd[2791]: 2025-02-02T22:49:34.684+0300 7a3eb02006c0 -1 osd.18 41888 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.17449834.0:1181107 7.0 7:00003b05:::rbd_data.0a16eb7300114b.00000000000003e0:head [write 2023424~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41887)
Feb 02 22:49:35 cmt6770 ceph-osd[2792]: 2025-02-02T22:49:35.087+0300 70eb8b6006c0 -1 osd.2 41888 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.18039030.0:938471 7.16f 7:f6cb3700:::rbd_data.ed023169b04d88.00000000000014f3:head [write 3670016~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41882)
Feb 02 22:49:35 cmt6770 corosync[1997]: [TOTEM ] Retransmit List: 22 23 26 32 33
Feb 02 22:49:35 cmt6770 ceph-osd[2791]: 2025-02-02T22:49:35.729+0300 7a3eb02006c0 -1 osd.18 41888 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.17449834.0:1181107 7.0 7:00003b05:::rbd_data.0a16eb7300114b.00000000000003e0:head [write 2023424~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41887)
Feb 02 22:49:36 cmt6770 ceph-osd[2792]: 2025-02-02T22:49:36.055+0300 70eb8b6006c0 -1 osd.2 41888 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.18039030.0:938471 7.16f 7:f6cb3700:::rbd_data.ed023169b04d88.00000000000014f3:head [write 3670016~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e41882)
-- Reboot --
Feb 02 22:51:24 cmt6770 kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-8 (2025-01-24T12:32Z) ()
Feb 02 22:51:24 cmt6770 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-8-pve root=/dev/mapper/pve-root ro quiet
 
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] link: host: 4 link: 0 is down
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] link: host: 4 link: 1 is down
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 has no active links
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Feb 02 22:48:35 cmt6770 corosync[1997]: [KNET ] host: host: 4 has no active links
Feb 02 22:49:28 cmt6770 corosync[1997]: [KNET ] link: host: 3 link: 1 is down
Feb 02 22:49:28 cmt6770 corosync[1997]: [KNET ] host: host: 3 (passive) best link: 1 (pri: 1)
Feb 02 22:49:28 cmt6770 corosync[1997]: [KNET ] host: host: 3 has no active links
Feb 02 22:49:32 cmt6770 pve-ha-lrm[3789583]: loop take too long (61 seconds)
Maybe a network, cable, or switch problem: the node loses its corosync links, the HA watchdog is no longer updated (see "client watchdog expired" above), and the node fences (= reboots) itself in the hope that re-initialising everything resolves the problem. That is expected behavior while HA is active.
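A quick way to check whether the corosync links keep flapping and whether fencing is actually armed is the standard tooling, roughly like this (link numbers and node names will of course differ per cluster):

# per-link status of the knet links corosync is using
corosync-cfgtool -s
# quorum and membership as Proxmox sees it
pvecm status
# fencing via the watchdog only happens while HA resources are active
ha-manager status
# Ceph health, since the log also shows slow OSD ops
ceph -s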