How to reset cluster on 2.0 RC.

francoisd

My cluster configuration is not working :( I would like to reset it. How can I do that?

P.S.: Reinstalling is not an option, as I have a lengthy setup on DRBD and have already reinstalled twice.
For information, my problem is:
"cman_tool: corosync daemon didn't start Check cluster logs for details" when I restart cman, but corosync.log is 0 bytes.

Thanks,
 
It is quite similar on both nodes:
root@proxmox1:~# pvecm status
cman_tool: Cannot open connection to cman, is it running ?

And a tail of syslog shows:

Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
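
The excerpt above repeats the same few messages many times, so it may be easier to read as a count per distinct message. The following is a minimal sketch; `summarize_pmxcfs_crit` is a hypothetical helper name, not a Proxmox tool, and it assumes syslog lines in the format shown above.

```shell
# Count how often each distinct pmxcfs "crit" message occurs in a log file.
# summarize_pmxcfs_crit is a hypothetical helper, not part of Proxmox.
summarize_pmxcfs_crit() {
    # $1: path to a syslog-format file
    grep 'pmxcfs\[[0-9]*]:.*crit:' "$1" \
        | sed 's/^.*pmxcfs\[[0-9]*]: //' \
        | sort | uniq -c | sort -rn
}
```

It could be run as `summarize_pmxcfs_crit /var/log/syslog`, which prints each distinct message preceded by the number of times it occurred, most frequent first.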
 
Thanks leancode.
I'll give it a try.

I didn't reply to dietmar immediately because I wanted to recheck first. There is in fact no trace in the logs:
Code:
root@proxmox2:~# /etc/init.d/cman restart
Stopping cluster: 
   Stopping dlm_controld... [  OK  ]
   Stopping fenced... [  OK  ]
   Stopping cman... [  OK  ]
   Unloading kernel modules... [  OK  ]
   Unmounting configfs... [  OK  ]
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... Cannot find node name in cluster.conf
Unable to get the configuration
Cannot find node name in cluster.conf
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]
root@proxmox2:~# tail /var/log/syslog
Mar 27 08:04:54 proxmox2 pmxcfs[1330]: [confdb] crit: confdb_initialize failed: 6
Mar 27 08:04:54 proxmox2 pmxcfs[1330]: [dcdb] crit: cpg_initialize failed: 6
Mar 27 08:04:54 proxmox2 pmxcfs[1330]: [dcdb] crit: cpg_initialize failed: 6
Mar 27 08:04:55 proxmox2 kernel: DLM (built Mar 20 2012 09:17:14) installed
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
root@proxmox2:~#
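
The "Cannot find node name in cluster.conf" message suggests that cman could not match this node against an entry in cluster.conf, so one thing worth checking is whether the local node name appears there at all. The sketch below is only an assumption about how to check that: `node_in_cluster_conf` is a hypothetical helper, and the grep pattern assumes the usual `<clusternode name="..."/>` form rather than doing a real XML parse.

```shell
# Report whether a node name has a <clusternode> entry in a cluster.conf file.
# node_in_cluster_conf is a hypothetical helper; the pattern assumes the
# name="..." attribute form and is not a full XML parse.
node_in_cluster_conf() {
    conf=$1   # e.g. /etc/cluster/cluster.conf
    node=$2   # e.g. "$(uname -n)"
    grep -q "<clusternode[^>]*name=\"$node\"" "$conf"
}

# Example (assumed path):
# if node_in_cluster_conf /etc/cluster/cluster.conf "$(uname -n)"; then
#     echo "node entry found"
# else
#     echo "node entry missing"
# fi
```

The function returns success only when an entry with exactly that name attribute is present, so it can be used directly in an `if`.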
 
So, here are the results:

I cleared everything as described, then rebooted both nodes. I recreated the cluster on both nodes, then tried to add a node by running pvecm add 192.168.232.41 on the node with IP 192.168.232.42.
I got the message "authentication key already exists"

I tried to add node2 to the cluster from node1 and got the same message. Now, if I run pvecm status on node1 (node2 is quite similar), I get:

Code:
root@proxmox1:~# pvecm status
Version: 6.2.0
Config Version: 1
Cluster Name: ruche
Cluster Id: 3465
Cluster Member: Yes
Cluster Generation: 24
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: proxmox1
Node ID: 1
Multicast addresses: 239.192.13.150 
Node addresses: 192.168.232.41 
root@proxmox1:~#
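
In the output above, "Nodes: 1" indicates that this node only sees itself as a cluster member. To compare that value between the two nodes without reading the whole output, a single field can be pulled out. This is a minimal sketch; `status_field` is a hypothetical helper that assumes the `Label: value` layout shown above.

```shell
# Extract one field from "pvecm status"-style output read on stdin.
# status_field is a hypothetical helper, not part of pvecm.
status_field() {
    # $1: field label exactly as printed, e.g. "Nodes"
    awk -F': *' -v key="$1" '
        $1 == key {
            sub(/[ \t]+$/, "", $2)   # drop trailing whitespace
            print $2
            exit
        }'
}
```

For example, `pvecm status | status_field "Nodes"` would print `1` for the output shown above.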
 
Dear francoisd,
I am just trying this myself, and I found that in addition to the above I also had to delete /etc/pve/nodes/*, as it holds the keys for the other nodes. I also added -rf to the remove (rm) commands, because the contents of nodes and pve-cluster are directories that need to be deleted.
So the whole sequence to reset should be:
<code>
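service cman stop                # stop the cluster manager
service pve-cluster stop         # stop pmxcfs, which serves /etc/pve
rm /etc/cluster/cluster.conf     # remove the old cluster configuration
rm -rf /var/lib/pve-cluster/*    # remove the pmxcfs database
rm -rf /etc/pve/nodes/*          # remove the per-node keys
service pve-cluster start
service cman start               # this should quietly do nothing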
</code>
Will post the results of my testing.
Also, as an aside: the reason I had to do this was that I have two Ethernet cards in my machines, one (eth0) with an external IP and one (eth1) with an internal IP. When I first created the cluster, it set itself up using the external IP. To change this I had to set VE_ROUTE_SRC_DEV="eth1" in /etc/vz/vz.conf. As I said, an aside...

UPDATE:
Yes, this worked perfectly for me. I created a new cluster with that and everything seems to work. As always, of course, your mileage may vary.
 
Thanks leancode,

This also did it for me.
Only the rm -rf /etc/pve/nodes/* did not work:
root@proxmox1:~# rm -rf /etc/pve/nodes/*
rm: cannot remove `/etc/pve/nodes/*': Transport endpoint is not connected

I first had to start pve-cluster, then delete the contents of nodes, then restart the pve-cluster service.
In my case, I did not start cman. (I just noticed that I forgot, but it worked anyway.)
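
Putting the two posts together, the order that worked here can be sketched as follows. /etc/pve is served by the pve-cluster service, which would explain the "Transport endpoint is not connected" error while that service is stopped. This is only a dry-run sketch: `reset_cluster_plan` is a hypothetical helper that prints the commands instead of running them, so the order can be reviewed first.

```shell
# Dry-run sketch of the reset order described in this thread.
# reset_cluster_plan is a hypothetical helper: it only PRINTS the commands,
# it does not execute them.
reset_cluster_plan() {
    cat <<'EOF'
service cman stop
service pve-cluster stop
rm /etc/cluster/cluster.conf
rm -rf /var/lib/pve-cluster/*
service pve-cluster start
rm -rf /etc/pve/nodes/*
service pve-cluster restart
EOF
}
```

Running `reset_cluster_plan` lists the steps. The only change from the earlier sequence is that /etc/pve/nodes/* is removed after pve-cluster has been started again, followed by a restart of the service.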

Thanks,
 
After I did this, I get:
root@s3 / # pvecm nodes
cman_tool: Cannot open connection to cman, is it running ?

Was cman running?
What is the output of "service cman status"?
Unfortunately I don't have a proxmox machine in front of me to test this but I hope that gets you further.