Good day, as per the screenshot below.
I'm running Proxmox VE 8.3.5.
This has happened twice now.
This morning I needed to restart host p8.
When I did that, the server did not come back online in the cluster.
At first it had a red X on its slot.
Then all nodes went into this state.
I can't manage the machines.
I only rebooted one of the 9 nodes.
Last time it happened when I rebooted node p3.
The only way I can get everything online is to reboot all the nodes at the same time.
But if I later reboot a single node, this happens again.
This started after I upgraded one node to 8.3.5,
so I then upgraded all nodes to 8.3.5.
EDIT: When this happens, the VMs themselves stay online.
I run the VM network on its own bonded interfaces.
The Ceph and management networks run on their own fibre network interfaces,
with a dedicated 10 Gbps fibre link for each.
I use Ceph and ZFS on the hosts.
Does anyone know:
a) why this is happening?
b) how to fix it without rebooting the entire cluster?

EDIT: I ran systemctl status and this is what I got:
Code:
root@atsho2p8:/etc/pve/qemu-server# systemctl status pve-cluster corosync
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-04-22 09:28:06 SAST; 2h 15min ago
Process: 1623 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 1626 (pmxcfs)
Tasks: 10 (limit: 232010)
Memory: 68.7M
CPU: 8.556s
CGroup: /system.slice/pve-cluster.service
└─1626 /usr/bin/pmxcfs
Apr 22 11:43:33 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 80
Apr 22 11:43:34 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 90
Apr 22 11:43:35 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 100
Apr 22 11:43:35 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retried 100 times
Apr 22 11:43:35 atsho2p8 pmxcfs[1626]: [status] crit: cpg_send_message failed: 6
Apr 22 11:43:36 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 10
Apr 22 11:43:37 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 20
Apr 22 11:43:38 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 30
Apr 22 11:43:39 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 40
Apr 22 11:43:40 atsho2p8 pmxcfs[1626]: [status] notice: cpg_send_message retry 50
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-04-22 09:28:07 SAST; 2h 15min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 1691 (corosync)
Tasks: 9 (limit: 232010)
Memory: 3.9G
CPU: 1h 53min 5.992s
CGroup: /system.slice/corosync.service
└─1691 /usr/sbin/corosync -f
Apr 22 11:41:57 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: e f 11 20 2e 2f 30 31 32 1f 43 49 4a
Apr 22 11:42:01 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: 32 58 5e 6d 72 73 74 75
Apr 22 11:42:06 atsho2p8 corosync[1691]: [TOTEM ] Token has not been received in 5662 ms
Apr 22 11:42:39 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: 6 7 8 9 b c d e f 10 11 1a 1b 1c 1d 1e 1f 20 2>
Apr 22 11:42:40 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: b d e f 11 20 2f 30 31 1f 43 49 4a
Apr 22 11:42:45 atsho2p8 corosync[1691]: [TOTEM ] Token has not been received in 5663 ms
Apr 22 11:42:47 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: f 11 58 5e 6d 72 73 74 75
Apr 22 11:42:52 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: 82 83 89
Apr 22 11:42:58 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: b5 b4
Apr 22 11:43:19 atsho2p8 corosync[1691]: [TOTEM ] Token has not been received in 5662 ms
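To get a rough picture of how often the cluster link stalls, a minimal Python sketch can tally the corosync messages from the status output. It assumes the journal lines follow the format shown in that output; the function name `summarize` and the embedded sample lines (copied from the log) are illustrative only, and the counts describe symptoms, not the root cause.

```python
import re

# Sample corosync lines copied from the systemctl status output (illustrative subset).
SAMPLE_LOG = """\
Apr 22 11:41:57 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: e f 11 20 2e 2f 30 31 32 1f 43 49 4a
Apr 22 11:42:01 atsho2p8 corosync[1691]: [TOTEM ] Retransmit List: 32 58 5e 6d 72 73 74 75
Apr 22 11:42:06 atsho2p8 corosync[1691]: [TOTEM ] Token has not been received in 5662 ms
Apr 22 11:42:45 atsho2p8 corosync[1691]: [TOTEM ] Token has not been received in 5663 ms
Apr 22 11:43:19 atsho2p8 corosync[1691]: [TOTEM ] Token has not been received in 5662 ms
"""

# Patterns assume the exact TOTEM message wording seen in this log.
TOKEN_RE = re.compile(r"Token has not been received in (\d+) ms")
RETRANSMIT_RE = re.compile(r"Retransmit List: (.*)$")


def summarize(log_text: str) -> dict:
    """Count TOTEM token timeouts and retransmit events in corosync log lines."""
    token_delays = []
    retransmit_events = 0
    retransmitted_msgs = 0
    for line in log_text.splitlines():
        if m := TOKEN_RE.search(line):
            # Record how long the node waited for the token.
            token_delays.append(int(m.group(1)))
        elif m := RETRANSMIT_RE.search(line):
            # Each hex ID in the list is one message that had to be resent.
            retransmit_events += 1
            retransmitted_msgs += len(m.group(1).split())
    return {
        "token_timeouts": len(token_delays),
        "max_token_delay_ms": max(token_delays, default=0),
        "retransmit_events": retransmit_events,
        "retransmitted_msgs": retransmitted_msgs,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Fed the full output of `journalctl -u corosync` instead of the sample, the same function would show whether token timeouts cluster around the moment a single node is rebooted.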