Cluster services won't start

kljose

Jul 13, 2022

Hi there, I have a Proxmox cluster where all nodes run 6.4.x. I tried adding a new node, but for whatever reason it never joined completely; there seems to have been an issue with that particular node. The cluster services would hang without fully starting, and the /etc/pve filesystem was not mounted. So, after starting pmxcfs locally on each node, I removed all entries for the new node from corosync.conf, along with any other references to it.
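For readers following along, the manual cleanup described above amounts to deleting one `node { ... }` entry from the `nodelist` section of corosync.conf. The sketch below is illustrative only: the function name `remove_node_block` and the node names are hypothetical, it works on text you pass in rather than touching any file, and it does not bump `config_version`. On a healthy cluster, `pvecm delnode` is the usual tool for this.

```python
import re


def remove_node_block(conf_text: str, node_name: str) -> str:
    """Return conf_text without the nodelist `node { ... }` entry whose name is node_name."""
    out: list[str] = []
    block: list[str] = []
    depth = 0  # brace depth inside the current node block (0 = not inside one)
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        if depth == 0 and re.fullmatch(r"node\s*\{", stripped):
            # Start buffering a node block; "nodelist {" does not match this pattern.
            block = [line]
            depth = 1
            continue
        if depth > 0:
            block.append(line)
            depth += stripped.count("{") - stripped.count("}")
            if depth == 0:
                # Block closed: keep it unless its `name:` matches node_name.
                names = [re.match(r"name:\s*(\S+)", b.strip()) for b in block]
                if not any(m and m.group(1) == node_name for m in names):
                    out.extend(block)
            continue
        out.append(line)
    if depth > 0:
        out.extend(block)  # unterminated block: leave it untouched
    return "".join(out)
```

Because the function only returns new text, you can diff its output against the original file before writing anything back.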

Now, when I try to start the pve-cluster service and the cluster filesystem in networked mode on any of the nodes, it hangs at a "please wait" stage and the cluster filesystem never starts, logging this error:
"pmxcfs[27121]: [main] notice: unable to acquire pmxcfs lock - trying again"

I've tried removing the pmxcfs lock file on each of the nodes (as outlined here https://commitandquit.wordpress.com/2016/10/29/proxmox-etcpve-blocked/) and then restarting the cluster services, but the same thing happens: the cluster filesystem cannot start in networked mode.
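The lock message presumably means that some process already holds the pmxcfs lock, so one thing worth ruling out is a pmxcfs instance that is still running, for example the one started locally earlier. The sketch below is a hypothetical helper (`find_processes` is not part of Proxmox) that scans `/proc` on Linux for processes whose name is exactly `pmxcfs`; it only reports PIDs and does not stop anything.

```python
import os
from pathlib import Path


def find_processes(name: str, proc_root: str = "/proc") -> list[int]:
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids: list[int] = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited or its comm file is unreadable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    found = find_processes("pmxcfs")
    if found:
        print(f"pmxcfs still running as PID(s) {found} (pid of this check: {os.getpid()})")
    else:
        print("no pmxcfs process found")
```

Matching on `comm` rather than the full command line keeps the check simple; note that the kernel truncates `comm` to 15 characters, which is not an issue for `pmxcfs`.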

Everything was working fine with cluster services and the cluster filesystem up until I tried to add that one node to the cluster.

Could it be that /etc/pve is out of sync across the nodes, and that is why it hangs? Is there anything else I should look at?

Thanks
 
I notice that "config_version" in corosync.conf is set to 5 on two of my nodes and to 6 on the other two.
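To compare versions across all nodes at once, you could collect each node's local corosync.conf (on Proxmox VE typically /etc/corosync/corosync.conf) and parse the `config_version` line. This is a minimal sketch assuming you have already copied the files' contents into a dict keyed by node name; the helper names and node names are hypothetical.

```python
import re
from typing import Optional

# Matches a line such as "  config_version: 6" anywhere in the file.
_VERSION_RE = re.compile(r"^[ \t]*config_version:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def config_version(conf_text: str) -> Optional[int]:
    """Return the config_version from corosync.conf text, or None if absent."""
    m = _VERSION_RE.search(conf_text)
    return int(m.group(1)) if m else None


def version_mismatch(confs: dict[str, str]) -> dict[str, Optional[int]]:
    """Map node -> config_version if the nodes disagree; return {} if they all agree."""
    versions = {node: config_version(text) for node, text in confs.items()}
    return versions if len(set(versions.values())) > 1 else {}
```

An empty result means every copy reports the same version; anything else lists which nodes differ.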

Can I manually update the config_version to ensure all are at 6? Would that cause issues, or maybe allow the cluster filesystem to start properly?
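The Proxmox VE documentation advises incrementing `config_version` whenever corosync.conf is edited by hand. Assuming the goal is one agreed-upon file, the edited copy would presumably need a version higher than any copy already in the cluster (here, higher than 6). The hypothetical helper below rewrites the version in a string and refuses to lower or repeat it; whether this resolves the pmxcfs lock error is not something this sketch claims.

```python
import re

# Captures the prefix, the number, and any trailing spaces of the config_version line.
_VERSION_RE = re.compile(r"^([ \t]*config_version:[ \t]*)(\d+)([ \t]*)$", re.MULTILINE)


def bump_config_version(conf_text: str, new_version: int) -> str:
    """Return conf_text with config_version set to new_version (must be strictly higher)."""
    m = _VERSION_RE.search(conf_text)
    if m is None:
        raise ValueError("no config_version line found")
    current = int(m.group(2))
    if new_version <= current:
        raise ValueError(f"new_version {new_version} must exceed current {current}")
    # Replace only the first config_version line, preserving its indentation.
    return _VERSION_RE.sub(
        lambda mm: f"{mm.group(1)}{new_version}{mm.group(3)}", conf_text, count=1
    )
```

Raising on a non-increasing value is a deliberate choice: silently writing the same number would leave nodes unable to tell the edited file from an older copy.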