A really good question is: what is the real purpose of /etc/pve/cluster.conf and of /etc/cluster/cluster.conf?
This question very much affects whether Proxmox sees the nodes as online, how it syncs the configs (e.g. config version not matching) and how to get a degraded cluster up and working again.
Code:
Apr 4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9
Apr 4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9
Apr 4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9
These errors are related to this issue, as is this thread: http://forum.proxmox.com/threads/8665-cman-keeps-crashing
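If you want to see how often pmxcfs logs this failure, a small script can count the matching lines. This is only a rough sketch: the function name count_cpg_failures is made up for illustration, the pattern is modelled on the log lines quoted in this post, and where your syslog lives (for example /var/log/syslog) is an assumption you have to check yourself.

```python
import re
from collections import Counter

# Pattern modelled on the pmxcfs lines quoted in this post (an assumption,
# other versions may word the message differently).
CPG_FAILURE = re.compile(
    r"pmxcfs\[\d+\]: \[status\] crit: cpg_send_message failed: (\d+)"
)


def count_cpg_failures(lines):
    """Count cpg_send_message failures per error code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = CPG_FAILURE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9",
        "Apr 4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9",
        "Apr 4 19:30:07 pluto some-other-daemon[1]: unrelated line",
    ]
    for code, hits in count_cpg_failures(sample).items():
        print("error code %d: %d occurrence(s)" % (code, hits))
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.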
We actually fixed it, but the cause is not clear yet.
It really comes down to whether you edited /etc/cluster/cluster.conf by hand and changed the config_version.
When /etc/pve/cluster.conf and /etc/cluster/cluster.conf get out of sync, fixing the version number is not enough: you also have to restart pve-cluster, not only cman, as this seems to "sync back" pve/cluster.conf to cluster.conf.
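To check whether the two files have drifted apart before restarting anything, you could compare their version numbers. The following is a minimal sketch, assuming both files are XML with a cluster root element carrying a config_version attribute; the function names and the sample contents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_config_version(xml_text):
    """Return config_version from the <cluster> root element as an int."""
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError("expected <cluster> root element, got <%s>" % root.tag)
    return int(root.attrib["config_version"])


def versions_in_sync(pve_xml, cman_xml):
    """True if both cluster.conf texts carry the same config_version."""
    return read_config_version(pve_xml) == read_config_version(cman_xml)


if __name__ == "__main__":
    # Hypothetical contents standing in for /etc/pve/cluster.conf and
    # /etc/cluster/cluster.conf; real files contain more elements.
    pve_conf = '<cluster name="demo" config_version="4"><clusternodes/></cluster>'
    cman_conf = '<cluster name="demo" config_version="5"><clusternodes/></cluster>'
    if versions_in_sync(pve_conf, cman_conf):
        print("config_version matches")
    else:
        print(
            "config_version mismatch: pve=%d cman=%d"
            % (read_config_version(pve_conf), read_config_version(cman_conf))
        )
```

On a real node you would read the text of the two files and pass it in. A mismatch only tells you that the files differ; the fix is still the version correction plus the restarts of pve-cluster and cman.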
After you have done that, you can run your cluster again. Interestingly, during all this, cman / pvecm list the cluster as OK, showing the nodes and the status just fine. Also, in the GUI, the cluster shows as "online" in the summary, but the node has a red LED (and you cannot create any VMs on the red-flagged server). You can still see the CPU state, RAM usage and storages.
Something seems pretty fishy here and needs some light from the devs. People seem to fight with clusters a lot; some of those problems are related to this issue, some simply come from a lack of documentation.
So let's get started.