Hello, we have a client with a 3-node Ceph cluster whose servers all rebooted at the same time. Other Proxmox clusters in the same rack were unaffected, so we don't think it's a switch issue. We're running the latest Proxmox 6.2 with corosync v3; libknet is v1.16. We believe the nodes were all fenced at the same time due to some networking blip. Any thoughts?
Code:
Aug 5 20:46:44 proxmox1 corosync[1972]: [KNET ] rx: host: 3 link: 0 is up
Aug 5 20:46:44 proxmox1 corosync[1972]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 5 20:46:44 proxmox1 corosync[1972]: [TOTEM ] A new membership (1.d66) was formed. Members
Aug 5 20:46:45 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 80
Aug 5 20:46:46 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 90
Aug 5 20:46:46 proxmox1 corosync[1972]: [TOTEM ] A new membership (1.d6a) was formed. Members
Aug 5 20:46:47 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 100
Aug 5 20:46:47 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retried 100 times
Aug 5 20:46:47 proxmox1 pmxcfs[1856]: [status] crit: cpg_send_message failed: 6
Aug 5 20:46:48 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 10
Aug 5 20:46:48 proxmox1 corosync[1972]: [TOTEM ] A new membership (1.d6e) was formed. Members
Aug 5 20:46:49 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 20
Aug 5 20:46:50 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 30
Aug 5 20:46:50 proxmox1 corosync[1972]: [TOTEM ] A new membership (1.d72) was formed. Members
Aug 5 20:46:51 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 40
Aug 5 20:46:52 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 50
Aug 5 20:46:52 proxmox1 corosync[1972]: [TOTEM ] A new membership (1.d76) was formed. Members
Aug 5 20:46:53 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 60
Aug 5 20:46:54 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 70
Aug 5 20:46:54 proxmox1 corosync[1972]: [TOTEM ] A new membership (1.d7a) was formed. Members
Aug 5 20:46:55 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 80
Aug 5 20:46:56 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 90
Aug 5 20:46:56 proxmox1 corosync[1972]: [TOTEM ] A new membership (1.d7e) was formed. Members
Aug 5 20:46:57 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 100
Aug 5 20:46:57 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retried 100 times
Aug 5 20:46:57 proxmox1 pmxcfs[1856]: [status] crit: cpg_send_message failed: 6
Aug 5 20:46:57 proxmox1 pve-firewall[2078]: firewall update time (15.972 seconds)
Aug 5 20:46:58 proxmox1 pmxcfs[1856]: [status] notice: cpg_send_message retry 10
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
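In case it helps anyone compare nodes, here is a rough Python sketch for tallying the events in an excerpt like the one above: TOTEM membership re-formations, cpg_send_message retries, and cpg_send_message failures. The function name `summarize` and the counter keys are made up for illustration, and the regex only assumes the classic syslog line layout shown in the log; it is not an official Proxmox or corosync tool.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt above):
#   "Aug 5 20:46:44 proxmox1 corosync[1972]: [TOTEM ] A new membership ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w-]+)\[\d+\]:\s+(?P<msg>.*)$"
)

def summarize(lines):
    """Count cluster-related events in an iterable of syslog lines.

    The counter keys are hypothetical names chosen for this sketch.
    """
    counts = Counter()
    for line in lines:
        # Drop NUL padding and whitespace; lines that do not look like
        # syslog entries (e.g. a run of NUL bytes) are skipped.
        m = LINE_RE.match(line.strip("\x00 \n"))
        if not m:
            continue
        msg = m.group("msg")
        if "A new membership" in msg:
            counts["membership_changes"] += 1
        elif "cpg_send_message failed" in msg:
            counts["cpg_send_failures"] += 1
        elif "cpg_send_message retry " in msg:
            counts["cpg_send_retries"] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python summarize.py < /var/log/syslog
    for key, value in sorted(summarize(sys.stdin).items()):
        print(f"{key}: {value}")
```

Running this against the syslog of each node for the same time window should show whether all three saw the same pattern of repeated membership changes before the reboot.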