Meaning of /etc/pve/cluster.conf and /etc/cluster/cluster.conf for cluster-setups

A really good question is: what is the actual purpose of /etc/pve/cluster.conf and of /etc/cluster/cluster.conf?
This question very much affects whether Proxmox sees the nodes as online, how it syncs the configs (e.g. config version not matching), and how to get a degraded cluster up and working again.


Code:
Apr  4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9
Apr  4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9
Apr  4 19:30:06 pluto pmxcfs[2223]: [status] crit: cpg_send_message failed: 9


Those errors are related to this issue, as is this thread: http://forum.proxmox.com/threads/8665-cman-keeps-crashing
We actually fixed it, but the root cause is not clear yet.
It really comes down to whether you edited /etc/cluster/cluster.conf by hand and changed the config_version.
When /etc/pve/cluster.conf and /etc/cluster/cluster.conf get out of sync, you have to not only fix the version number but also restart pve-cluster, instead of only cman,
as this seems to "sync back" /etc/pve/cluster.conf to /etc/cluster/cluster.conf.

After you have done that, you can run your cluster again. Interestingly, during all this, cman / pvecm report the cluster as OK, listing the nodes and the status just fine. Also, in the GUI, the cluster shows as "online" in the summary, but the affected server has a red LED (and you cannot create any VMs on the red-flagged server). You can still see the CPU states, RAM usage and storages.
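
To check whether the two files have drifted apart, something like the following could help. It is only a minimal sketch: it assumes both files are XML documents with a `<cluster>` root element that carries a `config_version` attribute, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def config_version(xml_text):
    """Return the config_version attribute of the <cluster> root element as an int."""
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError("root element is not <cluster>")
    return int(root.attrib["config_version"])


def compare_versions(pve_xml, cman_xml):
    """Say which of the two cluster.conf texts carries the higher config_version.

    pve_xml  : contents of /etc/pve/cluster.conf
    cman_xml : contents of /etc/cluster/cluster.conf
    """
    pve = config_version(pve_xml)
    cman = config_version(cman_xml)
    if pve == cman:
        return "in sync"
    if pve > cman:
        return "pve newer"
    return "cman newer"
```

On a node you would read the two files and pass their contents to `compare_versions`. A result of "cman newer" would match the hand-edited case described above.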

Something seems to be pretty fishy here and needs some light shed on it by the devs. People seem to fight with clusters a lot; some of those fights are related to this issue, some are simply due to a lack of documentation.

So let's get started :)
 

Sorry, I don't really understand your issue. Almost all problems reported so far are related to missing hostname entries in /etc/hosts.
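
A quick way to rule out the /etc/hosts case is to check that the node name has an entry at all. The sketch below is illustrative only: the function names are hypothetical, and treating a loopback-only entry as a problem is an assumption, not something confirmed in this thread.

```python
def hosts_entries(hosts_text):
    """Map every hostname or alias in /etc/hosts-style text to its list of addresses."""
    table = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, []).append(address)
    return table


def check_node(hosts_text, hostname):
    """Return 'missing', 'loopback only' or 'ok' for the given hostname."""
    addresses = hosts_entries(hosts_text).get(hostname, [])
    if not addresses:
        return "missing"
    # Assumption: a name that only maps to loopback is not usable for clustering.
    if all(a.startswith("127.") or a == "::1" for a in addresses):
        return "loopback only"
    return "ok"
```

For example, with the contents of /etc/hosts in `text`, `check_node(text, "pluto")` would report whether the node name from the log above has a usable entry.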
 

Thank you :)

You mean right after:
- no multicast
- changed SSH ports (only 22 allowed)
- no password auth on sshd during pvecm add
- using SSH aliases instead of password auth for adding a node
- lost connection with keys after creating a cluster (due to a recreated authorized_keys)
- stuck cluster after trying to add a node with the above (no quorum)
- missing pvelocalhost
- moved port 8006
- periodic issues mounting /mnt/pve on restart of pve-cluster
- edited /etc/cluster/cluster.conf


Seriously, there are a number of issues.

What I want here is to shed some light on what /etc/cluster/cluster.conf is and how it relates to /etc/pve/cluster.conf. This could make it easier to actually find out what is happening here. Currently it rather looks like a bug to me, but that's only guessing.

Thanks
 

Thank you, tom. I already peeked at that topic, but I can't find any information regarding /etc/cluster; there is only an explanation of /etc/pve. The main function of /etc/pve as a cluster filesystem is clear to me.

But I am really not sure what /etc/cluster/cluster.conf is for, and especially what happens with config_version when those files get out of sync. E.g. it seems that if the config_version of /etc/cluster/cluster.conf is greater than that of /etc/pve/cluster.conf, restarting /etc/init.d/pve-cluster won't "sync back" /etc/cluster/cluster.conf.
 
It's not that trivial, so if you want all the details, check the source code in our git.

if you have issues in using the system, open a new thread for each one, describing in detail what you have done and what is not working as expected.
 
Well, for me, the point here was to clear up exactly that, instead of having one person (me) check the source code, maybe understand it partially, solve his own problems and then move on. I am pretty sure we need to update the docs / FAQ on this, as it is not trivial, as you said. And because it's not trivial, it needs guidance from the devs, not someone like me reading code I have never seen before; that would be too error-prone.

It would be great if someone could find some time to outline the architecture here. In the end, we should move this into the wiki / cluster setup page. The latter is, for me, currently completely useless as documentation.
Of course, clusters are complex and everybody first needs to learn all the basics, but PVE introduces some special internals which are not documented at all and would trip up even someone who has used cman before.
 
yes, documentation can always be better and everybody here is encouraged to help and improve it.

in a perfect world we would have all features, all docs and no bugs. and most people also want this free and also immediately.

but we are still working on this :)
 
But I am really not sure what /etc/cluster/cluster.conf is for, and especially what happens with config_version when those files get out of sync.

If you increase the version in /etc/pve/cluster.conf, all nodes copy that file to /etc/cluster/cluster.conf (after a syntax check). Then we tell cman to reload the config.
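
The behaviour described above can be modelled roughly as follows. This is a simplified sketch, not the actual Proxmox code: the function names are invented, the "syntax check" is reduced to parsing the XML and reading `config_version`, and the copy and reload are represented by return values instead of real file or cman operations.

```python
import xml.etree.ElementTree as ET


def parse_version(xml_text):
    """Return config_version, or None if the text is not a usable <cluster> document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    if root.tag != "cluster":
        return None
    try:
        return int(root.attrib["config_version"])
    except (KeyError, ValueError):
        return None


def sync_step(pve_xml, cman_xml):
    """Model one sync decision.

    pve_xml  : contents of /etc/pve/cluster.conf
    cman_xml : contents of /etc/cluster/cluster.conf
    Returns (new contents of /etc/cluster/cluster.conf, whether cman should reload).
    """
    new_version = parse_version(pve_xml)
    if new_version is None:
        # Stand-in for the syntax check failing: keep the old file.
        return cman_xml, False
    old_version = parse_version(cman_xml)
    if old_version is not None and new_version <= old_version:
        # Version was not increased, so nothing is copied.
        return cman_xml, False
    return pve_xml, True
```

In this model, a hand-edited /etc/cluster/cluster.conf with a higher version is never overwritten, which would be consistent with the "won't sync back" observation earlier in the thread.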
 
