3-node HA cluster: nodes fenced without losing quorum

Jarek

Dec 16, 2016
Hi.
All nodes use bonded interfaces: the first NIC in each bond is connected to the master switch, the second to the slave switch.
When power came back on the slave switch, this was logged on node p1:

Dec 16 08:36:16 p1 kernel: [2926029.077939] bnx2 0000:01:00.1 eth1: NIC Copper Link is Up, 1000 Mbps full duplex
Dec 16 08:36:16 p1 kernel: [2926029.077949]
Dec 16 08:36:16 p1 kernel: [2926029.190619] bnx2 0000:02:00.0 eth2: NIC Copper Link is Up, 1000 Mbps full duplex
Dec 16 08:36:16 p1 kernel: [2926029.190626]
Dec 16 08:36:19 p1 kernel: [2926032.227493] e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 16 08:36:22 p1 corosync[2164]: [TOTEM ] A new membership (10.0.0.11:592) was formed. Members
Dec 16 08:36:22 p1 corosync[2164]: [QUORUM] Members[3]: 3 1 2
Dec 16 08:36:22 p1 corosync[2164]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 16 08:36:24 p1 corosync[2164]: [TOTEM ] A processor failed, forming new configuration.
Dec 16 08:36:26 p1 corosync[2164]: [TOTEM ] A new membership (10.0.0.21:596) was formed. Members left: 3
Dec 16 08:36:26 p1 corosync[2164]: [TOTEM ] Failed to receive the leave message. failed: 3
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: members: 1/2110, 2/2262
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: starting data syncronisation
Dec 16 08:36:26 p1 corosync[2164]: [QUORUM] Members[2]: 1 2
Dec 16 08:36:26 p1 corosync[2164]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: cpg_send_message retried 1 times
Dec 16 08:36:26 p1 pmxcfs[2110]: [status] notice: members: 1/2110, 2/2262
Dec 16 08:36:26 p1 pmxcfs[2110]: [status] notice: starting data syncronisation
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: received sync request (epoch 1/2110/00000008)
Dec 16 08:36:26 p1 pmxcfs[2110]: [status] notice: received sync request (epoch 1/2110/00000006)
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: received all states
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: leader is 1/2110
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: synced members: 1/2110, 2/2262
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: start sending inode updates
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: sent all (0) updates
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: all data is up to date
Dec 16 08:36:26 p1 pmxcfs[2110]: [dcdb] notice: dfsm_deliver_queue: queue length 3
Dec 16 08:36:26 p1 pmxcfs[2110]: [status] notice: received all states
Dec 16 08:36:26 p1 pmxcfs[2110]: [status] notice: all data is up to date
Dec 16 08:36:26 p1 pmxcfs[2110]: [status] notice: dfsm_deliver_queue: queue length 20
Dec 16 08:36:30 p1 corosync[2164]: [TOTEM ] A new membership (10.0.0.21:600) was formed. Members
Dec 16 08:36:30 p1 corosync[2164]: [QUORUM] Members[2]: 1 2
Dec 16 08:36:30 p1 corosync[2164]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 16 08:36:32 p1 corosync[2164]: [TOTEM ] A new membership (10.0.0.11:612) was formed. Members joined: 3
Dec 16 08:36:32 p1 pmxcfs[2110]: [dcdb] notice: members: 1/2110, 2/2262, 3/1331
Dec 16 08:36:32 p1 pmxcfs[2110]: [dcdb] notice: starting data syncronisation
Dec 16 08:36:32 p1 corosync[2164]: [QUORUM] Members[3]: 3 1 2
Dec 16 08:36:32 p1 corosync[2164]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 16 08:36:32 p1 pmxcfs[2110]: [dcdb] notice: cpg_send_message retried 1 times
Dec 16 08:36:32 p1 pmxcfs[2110]: [status] notice: members: 1/2110, 2/2262, 3/1331
Dec 16 08:36:32 p1 pmxcfs[2110]: [status] notice: starting data syncronisation
Dec 16 08:36:32 p1 pmxcfs[2110]: [dcdb] notice: received sync request (epoch 1/2110/00000009)
Dec 16 08:36:32 p1 pmxcfs[2110]: [status] notice: received sync request (epoch 1/2110/00000007)
Dec 16 08:37:31 p1 watchdog-mux[1961]: client watchdog expired - disable watchdog updates

... and then the node rebooted.
Why were the nodes fenced if quorum was never lost?
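
To double-check the premise of the question, here is a minimal sketch that reads the [QUORUM] lines from the log above and compares the member counts with the majority threshold. It assumes the default corosync votequorum behaviour (a simple majority of expected votes), one vote per node and no QDevice. The helper names `majority`, `member_counts` and `seconds_between` are only illustrative. It does not explain the fencing; it only confirms what p1 saw and measures the gap before the watchdog expired.

```python
import re
from datetime import datetime

# Lines copied from the syslog excerpt above (p1's point of view).
LOG = """\
Dec 16 08:36:22 p1 corosync[2164]: [QUORUM] Members[3]: 3 1 2
Dec 16 08:36:26 p1 corosync[2164]: [QUORUM] Members[2]: 1 2
Dec 16 08:36:30 p1 corosync[2164]: [QUORUM] Members[2]: 1 2
Dec 16 08:36:32 p1 corosync[2164]: [QUORUM] Members[3]: 3 1 2
Dec 16 08:36:32 p1 pmxcfs[2110]: [status] notice: received sync request (epoch 1/2110/00000007)
Dec 16 08:37:31 p1 watchdog-mux[1961]: client watchdog expired - disable watchdog updates
"""

# Assumption: three nodes with one vote each, no QDevice.
EXPECTED_VOTES = 3


def majority(expected_votes: int) -> int:
    """Votes needed for quorum under a simple-majority rule (assumed)."""
    return expected_votes // 2 + 1


def member_counts(log: str) -> list[int]:
    """Member count from every '[QUORUM] Members[N]' line, in log order."""
    return [int(n) for n in re.findall(r"\[QUORUM\] Members\[(\d+)\]", log)]


def seconds_between(first: str, second: str) -> int:
    """Seconds between two syslog lines from the same day (HH:MM:SS field)."""
    fmt = "%H:%M:%S"
    t1 = datetime.strptime(first[7:15], fmt)
    t2 = datetime.strptime(second[7:15], fmt)
    return int((t2 - t1).total_seconds())


lines = LOG.splitlines()
counts = member_counts(LOG)
needed = majority(EXPECTED_VOTES)
last_pmxcfs = [line for line in lines if "pmxcfs" in line][-1]
expiry = next(line for line in lines if "watchdog-mux" in line)
gap = seconds_between(last_pmxcfs, expiry)

print("member counts seen by p1:", counts)
print("votes needed for quorum:", needed)
print("quorate throughout:", all(c >= needed for c in counts))
print("seconds from last pmxcfs message to watchdog expiry:", gap)
```

On this excerpt the member count never drops below 2, which matches the threshold for 3 expected votes, and the watchdog expires 59 seconds after the last pmxcfs message at 08:36:32.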
 
The full syslog is attached.
About an hour before the crash:
Dec 16 07:48:28 p1 pmxcfs[2110]: [dcdb] notice: data verification successful
 

Attachments

  • syslog.txt
    136.1 KB
