pmxcfs died in 6 machine cluster

thesubmitter

Jun 5, 2012
I removed 1 machine from the cluster, and now all the other machines are logging "cpg_send_message failed".

I also tried pvecm e 4, but it had no effect.

I am running different versions of Proxmox on the machines:
3 are 2.3
2 are 3.0
I removed one of the 3.0 ones.
 
Then I ran:
/etc/init.d/pve-cluster restart
pvecm e 1
pmxcfs --local

I can log in to the web UI now, BUT if I try to remove that old node I get "cluster not ready no quorum"

Even though all the machines show:

Version: 6.2.0
Config Version: 20
Cluster Name: luster
Cluster Id: 34356
Cluster Member: Yes
Cluster Generation: 29860
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Node votes: 1
Quorum: 3
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: aname
Node ID: 3
Multicast addresses: 239.192.134.186
Node addresses: 192.168.xxx.xx
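
The "Expected votes: 5" and "Quorum: 3" lines above fit the usual majority rule: more than half of the expected votes are needed. A minimal shell sketch of that arithmetic follows; the function names quorum_needed and is_quorate are made up for illustration and are not part of pvecm or cman.

```shell
#!/bin/sh
# Hypothetical helpers (not part of pvecm/cman): majority quorum arithmetic.

# Print the number of votes needed for a majority of EXPECTED_VOTES.
# Usage: quorum_needed EXPECTED_VOTES
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed (exit 0) when TOTAL_VOTES reaches the majority of EXPECTED_VOTES.
# Usage: is_quorate TOTAL_VOTES EXPECTED_VOTES
is_quorate() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}

quorum_needed 5    # prints 3, matching "Quorum: 3" above
```

By these numbers (5 total votes against a quorum of 3) the membership layer looks quorate, which suggests the "no quorum" error is coming from somewhere else.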
 
I am also getting messages like:

Sep 26 18:28:17 xxxx pmxcfs[954879]: [dcdb] notice: cpg_join retry 6120
Sep 26 18:28:18 xxx pmxcfs[954879]: [dcdb] notice: cpg_join retry 6130
Sep 26 18:28:19 xxx pmxcfs[954879]: [dcdb] notice: cpg_join retry 6140
Sep 26 18:28:20 xxx pmxcfs[954879]: [status] crit: cpg_send_message failed: 9
Sep 26 18:28:20 xxx pmxcfs[954879]: [status] crit: cpg_send_message failed: 9
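
To see how often each error code shows up, log lines like these can be tallied. This is only a sketch, assuming syslog-style lines are fed on standard input; the function name tally_cpg_errors is hypothetical. Code 9 appears to correspond to CS_ERR_BAD_HANDLE in corosync's cs_error_t enum, but check that against the headers of the installed corosync version.

```shell
#!/bin/sh
# Hypothetical helper: count "cpg_send_message failed" lines per error code.
# Reads syslog-style lines on stdin and prints "<count> <code>" per code.
tally_cpg_errors() {
    sed -n 's/.*cpg_send_message failed: \([0-9][0-9]*\).*/\1/p' |
        sort -n | uniq -c | awk '{ print $1, $2 }'
}

# Example using lines quoted above:
printf '%s\n' \
  'Sep 26 18:28:19 xxx pmxcfs[954879]: [dcdb] notice: cpg_join retry 6140' \
  'Sep 26 18:28:20 xxx pmxcfs[954879]: [status] crit: cpg_send_message failed: 9' \
  'Sep 26 18:28:20 xxx pmxcfs[954879]: [status] crit: cpg_send_message failed: 9' |
  tally_cpg_errors    # prints "2 9"
```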
 
I actually fixed this!
/etc/init.d/pve-cluster stop
On one of the machines I KILLED the cman processes (dlm_controld and fenced); the others shut down with /etc/init.d/cman stop.
At that point all the other machines got really happy, came back online and synced!

That machine needed a cman start and pve-cluster start and it was back too!

that was close!!!!
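
For anyone landing here later, the sequence described above can be written down as a dry-run sketch that only prints the commands for the stuck node and executes nothing. The ordering is one reading of the post, not a verified procedure. The post does not say how the processes were killed, so the killall -9 line is an assumption, and the function name print_recovery_plan is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does NOT run them.
# Order is one interpretation of the post above, for the stuck node only.
print_recovery_plan() {
    cat <<'EOF'
/etc/init.d/pve-cluster stop
killall -9 dlm_controld fenced
/etc/init.d/cman stop
/etc/init.d/cman start
/etc/init.d/pve-cluster start
EOF
}

print_recovery_plan
```

Printing the plan first makes it possible to review each step before typing anything on a production node.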