How to reset cluster on 2.0 RC.

francoisd

Renowned Member
Sep 10, 2009
My cluster configuration is not working. I would like to reset it. How can I do that?

P.S.: Reinstalling is not an option, as I have a long DRBD setup and I have already reinstalled twice.
Just for info, my problem is:
"cman_tool: corosync daemon didn't start Check cluster logs for details" when I restart cman, but corosync.log is 0 bytes.

Thanks,
 
Quite similar on both nodes:
root@proxmox1:~# pvecm status
cman_tool: Cannot open connection to cman, is it running ?

And a tail of syslog shows:

Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:13 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:18 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:23 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:24 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:30 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:33 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:36 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [quorum] crit: quorum_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [confdb] crit: confdb_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:42 proxmox1 pmxcfs[1360]: [dcdb] crit: cpg_initialize failed: 6
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
Mar 24 22:27:43 proxmox1 pmxcfs[1360]: [status] crit: cpg_send_message failed: 9
 
Thanks leancode.
I'll give it a try.

I didn't reply to dietmar immediately because I first wanted to recheck. There is in fact no trace in the logs:
Code:
root@proxmox2:~# /etc/init.d/cman restart
Stopping cluster: 
   Stopping dlm_controld... [  OK  ]
   Stopping fenced... [  OK  ]
   Stopping cman... [  OK  ]
   Unloading kernel modules... [  OK  ]
   Unmounting configfs... [  OK  ]
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... Cannot find node name in cluster.conf
Unable to get the configuration
Cannot find node name in cluster.conf
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED]
root@proxmox2:~# tail /var/log/syslog
Mar 27 08:04:54 proxmox2 pmxcfs[1330]: [confdb] crit: confdb_initialize failed: 6
Mar 27 08:04:54 proxmox2 pmxcfs[1330]: [dcdb] crit: cpg_initialize failed: 6
Mar 27 08:04:54 proxmox2 pmxcfs[1330]: [dcdb] crit: cpg_initialize failed: 6
Mar 27 08:04:55 proxmox2 kernel: DLM (built Mar 20 2012 09:17:14) installed
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
Mar 27 08:04:57 proxmox2 pmxcfs[1330]: [status] crit: cpg_send_message failed: 9
root@proxmox2:~#
 
So, here are the results:

I cleared everything as described, then rebooted both nodes. I recreated the cluster on both nodes, then I tried to add a node with pvecm add 192.168.232.41 on the node with IP 192.168.232.42.
I got the message "authentication key already exists".
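For reference, the commands were roughly the following (reconstructed from memory, so the exact invocation may differ):

Code:
# cluster created on both nodes (cluster name "ruche")
pvecm create ruche
# then, on the node with IP 192.168.232.42, joining node1:
pvecm add 192.168.232.41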

I tried to add node2 to the cluster on node1, and got the same message. Now, if I issue a pvecm status on node1 (node2 is quite similar), I have:

Code:
root@proxmox1:~# pvecm status
Version: 6.2.0
Config Version: 1
Cluster Name: ruche
Cluster Id: 3465
Cluster Member: Yes
Cluster Generation: 24
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: proxmox1
Node ID: 1
Multicast addresses: 239.192.13.150 
Node addresses: 192.168.232.41 
root@proxmox1:~#
 
Dear francoisd,
Listen, I am just trying this myself, and I found that in addition to the above I also had to delete /etc/pve/nodes/*, as it holds the keys for the other nodes. I have also added -rf to the rm command, as the contents of nodes and pve-cluster are directories that need to be deleted.
So the whole sequence to reset should be:
<code>
service cman stop
service pve-cluster stop
rm /etc/cluster/cluster.conf
rm -rf /var/lib/pve-cluster/*
rm -rf /etc/pve/nodes/*
service pve-cluster start
service cman start   # this should quietly do nothing (cluster.conf is gone)
</code>
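As a quick sanity check afterwards (just a suggestion, adapt as needed), /etc/pve should be mounted again by pmxcfs and no old cluster.conf should be left over:
<code>
mount | grep /etc/pve      # should show the pve-cluster (fuse) mount
ls /etc/pve                # should contain a fresh configuration again
ls /etc/cluster/           # should no longer contain cluster.conf
</code>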
Will post the results of my testing.
Also, as an aside: the reason I had to do this was that I have two Ethernet cards in my machines, one (eth0) with an external IP and one (eth1) with an internal IP. When I first created the cluster, it set itself up using the external IP. In order to change this I had to set VE_ROUTE_SRC_DEV="eth1" in /etc/vz/vz.conf. As I said, an aside...
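For anyone hitting the same thing, the line in /etc/vz/vz.conf is simply:
<code>
# /etc/vz/vz.conf
VE_ROUTE_SRC_DEV="eth1"   # eth1 = the internal interface in my setup
</code>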

UPDATE:
Yes, this worked for me perfectly. I created a new cluster with that and all seems to work just fine. As always of course, your mileage may vary...
 
Thanks leancode,

This also did it for me.
Only the rm -rf /etc/pve/nodes/* did not work:
root@proxmox1:~# rm -rf /etc/pve/nodes/*
rm: cannot remove `/etc/pve/nodes/*': Transport endpoint is not connected

I first had to start pve-cluster, then delete the contents of nodes, then restart the pve-cluster service, as shown below.
In my case, I did not start cman. (I just noticed I forgot, but it worked anyway.)
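So, for the record, the order that worked for me was roughly this (same commands as leancode's, just reordered, and with cman left out as mentioned):

Code:
service pve-cluster stop
rm /etc/cluster/cluster.conf
rm -rf /var/lib/pve-cluster/*
service pve-cluster start        # mounts /etc/pve again
rm -rf /etc/pve/nodes/*          # now this succeeds
service pve-cluster restart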

Thanks,
 
After I did it:
root@s3 / # pvecm nodes
cman_tool: Cannot open connection to cman, is it running ?

Was cman running?
What is the output of "service cman status"?
Unfortunately I don't have a Proxmox machine in front of me to test this, but I hope that gets you further.
 
