Urgent Help Needed

ejc317

Member
Oct 18, 2012
So we tried to remove a node from the cluster. This is node 1 out of 12.

After it was removed, we restarted the cluster. Now there is no quorum. Node 1 sees itself, and cman and rgmanager start there. Nodes 2-12 cannot start cman and there's no data (all the qemu-server folders are empty!!!)

I checked our SAN and the storage data is still there, meaning only the confs have been erased (not as bad)

I tried to start cman on nodes 2-12 and I get an error that the corosync daemon cannot be started (node 1 starts fine)

I can manually edit cluster.conf on node 1 but that's it

any help is appreciated ...
 
All the files are still read only ...

When I try to put it into local mode (pmxcfs -l) it says it cannot get the lock. I also tried unmounting and remounting, and that doesn't work either

HELP!!!!!!!
 
I faced the same issue last week and I know how utterly frustrating it is. All I did was safely remove a node from a cluster of 4 nodes using pvecm delnode. That worked, but then I noticed that all my nodes had lost quorum, and I couldn't figure it out for the life of me. What I did notice was that when I accessed each node individually, that node showed up as green and the remaining nodes showed as red. I should have thoroughly documented this case and reported it to Tom, but haven't had the time yet. Anyway, here are some of the things I tried while bringing each node back:


pvecm status (take note of the expected votes)
pvecm e 1 (Set expected votes to 1)
pvecm status (check to see if that worked)
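For anyone wondering why lowering the expected votes helps: as far as I understand it, cman needs a simple majority of the expected votes before it considers the cluster quorate. Here is a rough sketch of that arithmetic. The function names are made up for illustration, and the numbers you feed in are the ones you read off pvecm status yourself.

```shell
#!/bin/sh
# Illustrative helpers only; these are not part of pvecm or cman.

# Votes needed for quorum under a simple majority: floor(expected / 2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) when the total votes reach the majority threshold.
# $1 = total votes, $2 = expected votes.
is_quorate() {
    total=$1
    expected=$2
    [ "$total" -ge "$(quorum_needed "$expected")" ]
}

quorum_needed 12   # a cluster that still expects 12 votes
quorum_needed 1    # after forcing expected votes down to 1
```

With 12 expected votes a lone node can never reach the threshold, which would explain why it stays read-only until the expected votes are lowered.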



/etc/init.d/pve-cluster stop

/usr/bin/pmxcfs -l

mv /etc/pve/cluster.conf ~/

/etc/init.d/pve-cluster stop

/etc/init.d/cman stop

/etc/init.d/cman start

/etc/init.d/pve-cluster start
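If you want to review the sequence above before touching a node, a small wrapper like this can help. It is only a sketch: run and recover_node are names I made up, the commands are exactly the ones listed above in the same order, and by default it only prints them. Set DRY_RUN=0 at your own risk to actually execute them.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps above; not an official Proxmox tool.

# Print a command, and execute it only when DRY_RUN is explicitly set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@" || { echo "failed: $*" >&2; return 1; }
    fi
}

# Same commands, same order as in the post; stops at the first failure.
recover_node() {
    run /etc/init.d/pve-cluster stop &&
    run /usr/bin/pmxcfs -l &&
    run mv /etc/pve/cluster.conf "$HOME/" &&
    run /etc/init.d/pve-cluster stop &&
    run /etc/init.d/cman stop &&
    run /etc/init.d/cman start &&
    run /etc/init.d/pve-cluster start
}
```

Calling recover_node with DRY_RUN unset prints the seven steps prefixed with "+", so you can compare them against your own situation first.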

Check your logs:

tail -f /var/log/syslog
tail -f /var/log/messages

Check Cluster Status:
pvecm nodes
pvecm status

On the node we just ran all this on, check to see if your members file has been populated:

cat /etc/pve/.members
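If you would rather not eyeball the file, something like the following can summarize it. This is a sketch under assumptions: it assumes the file is JSON-like text with one node per line and fields named "online" and "quorate" set to 1 or 0, which is what I remember seeing. Check the layout on your own version before trusting it. The function names are made up.

```shell
#!/bin/sh
# Hypothetical helpers; both read .members-style text on stdin.
# Assumed layout: one node entry per line, with "online": 1 for live nodes
# and a "quorate": 1 field in the cluster section.

# Print the number of lines that mark a node as online (prints 0 if none).
count_online() {
    grep -c '"online"[[:space:]]*:[[:space:]]*1' || true
}

# Succeed (exit status 0) if the cluster section reports quorate as 1.
members_quorate() {
    grep -q '"quorate"[[:space:]]*:[[:space:]]*1'
}
```

Usage would be along the lines of count_online < /etc/pve/.members and members_quorate < /etc/pve/.members && echo quorate.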



Finally, restart everything for good measure:

/etc/init.d/pvestatd restart
/etc/init.d/pvedaemon restart
/etc/init.d/apache2 restart


Hopefully that helps, as I'm just stepping through my bash_history
 
Please use a subject that describes your issue. Otherwise others cannot really benefit from your experiences, issues and possible solutions.