Cluster Crashed - no quorum

ejc317

Member
Oct 18, 2012
So after getting almost 90% of the system working ... one of the nodes shut down and it didn't take the new cluster.conf from the master. Now after reboots, the nodes are not seeing each other and are all broken ... this seems a bit fragile for a potential production system

All we did was add fencing - is there any way to recreate the cluster or do we have to manually reinstall everything?

cman on the nodes won't start b/c it says there's an error in cluster.conf (which there is, but I can't edit it)
 
It literally was fine 5 minutes ago; now none of the nodes will start

We have our own switches and this is on a private VLAN - IP multicast is set to on for the VLAN ...
 
So ccman won't start because the config says it needs cman ... although the nodes just crashed and ccman cannot restart because it was in the midst of updating a cluster config, and we can't salvage it, so we're re-installing

This seems very dangerous for software for a full hosting production environment ... anyone have any ideas? None of the ccman will start so it won't poll for other nodes
 
Yes, it is dangerous to write wrong things into the cluster config. The question is who has done that?
 
BTW, I simply can't answer all your questions here (more than 20 posts within 24 hours - that is a full-time job). You should go for a commercial support subscription instead.
 
It wasn't "wrong" per se - I think it was in the middle of writing when one node crashed, so the right config was still in cluster.conf.new

but ccman won't start up and the config was calling on ccman so it was looping
 
If you only modify cluster.conf.new nothing happens. Things can only go wrong if you modify cluster.conf directly?
 
That's what I thought too.

For some reason, here's what happened.

1) Node 4 goes down while trying to migrate KVM VM
2) I reboot nodes 1-3 as they were getting slow
3) There was a cluster.conf that hadn't propagated.
4) Upon reboot, the cluster.conf was not being read properly because CMAN wasn't started and CMAN was still in the cluster.conf, so it was looping. (The revised cluster.conf was still in cluster.conf.new)

Only way I saw to fix it was to take out the bit re: cman and corosync key but that breaks the cluster
 
3) There was a cluster.conf that hadn't propagated.

Because it contains errors. And you rebooted all nodes at this time, so you lost quorum also.
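To make the quorum point concrete, here is a minimal sketch of the usual majority rule. It assumes one vote per node and no special quorum settings (no quorum disk, no two-node mode); the function names are illustrative and not part of any cluster tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def has_quorum(votes_online: int, expected_votes: int) -> bool:
    """True if the nodes currently online hold a majority."""
    return votes_online >= quorum_threshold(expected_votes)


# Assumption: four nodes with one vote each, as in this thread.
for online in range(5):
    print(online, "of 4 online -> quorate:", has_quorum(online, 4))
```

Under these assumptions a four-node cluster needs three nodes up at the same time, so rebooting nodes 1-3 while node 4 was already down leaves nothing quorate until at least three of them are back and can see each other.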

You should never directly edit /etc/pve/cluster.conf. Instead, edit /etc/pve/cluster.conf.new, verify that the changes are OK with ccs_config_validate, increase config version, and then use the GUI to commit changes.
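The "increase config version" step can be sketched as below. This assumes the usual cluster.conf layout, where the root `<cluster>` element carries a `config_version` attribute; the helper name and the sample XML are made up for illustration. It only bumps the number and checks that the XML is well-formed, so running ccs_config_validate on the result is still required.

```python
import xml.etree.ElementTree as ET


def bump_config_version(xml_text: str) -> str:
    """Return the cluster.conf XML with config_version incremented by one.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    Note: the returned string does not include an XML declaration line.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError("root element is not <cluster>")
    current = int(root.attrib["config_version"])
    root.set("config_version", str(current + 1))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened config for demonstration only.
SAMPLE = '<cluster name="demo" config_version="7"><cman /></cluster>'
print(bump_config_version(SAMPLE))
```

In practice the input would be the text of /etc/pve/cluster.conf.new, and the output would be written back to that same .new file, never to cluster.conf itself.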

To correct this mess, copy a corrected cluster.conf file to /etc/cluster/cluster.conf (to all nodes). Then you should be able to restart cman and correct the mistake in /etc/pve/cluster.conf.
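A rough sketch of the "copy to all nodes" step: it only builds and prints the scp commands so they can be reviewed before anything is run. The node names, the root login, and the local file name are assumptions for illustration; restarting cman afterwards is left to the operator.

```python
import shlex


def build_copy_commands(nodes, source, dest="/etc/cluster/cluster.conf"):
    """Return one scp argument list per node. Nothing is executed here."""
    return [["scp", source, f"root@{node}:{dest}"] for node in nodes]


# Hypothetical node names and a hypothetical corrected file.
NODES = ["node1", "node2", "node3", "node4"]
for argv in build_copy_commands(NODES, "cluster.conf.fixed"):
    print(shlex.join(argv))
```

Printing first instead of executing is deliberate: with no quorum, a wrong file pushed to every node is hard to undo.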
 
Yep - cluster.conf was locked - permission denied. We put it all in local mode. Issue was it was calling cman corosync key at top and cman wasn't started - should we take this out?

Either way, we already reformatted and the cluster is up and working OK - but good to know for the future.

On a side note, just to clarify: for our four nodes, we need to buy 4 of the 2-socket packages, right? (Since we're only testing, I can do with only 2 nodes really.)
 
