[SOLVED] One node seems to have lost participation in the pmxcfs

thisisbenwoo

New Member
May 28, 2021
I have a 5-node cluster. One of the nodes had some issues, so its hardware needed to be changed. Now when I boot the node up, it doesn't seem to be participating in the pmxcfs. When I look at the "Datacenter" view, that node is shown as down. And when I log into the :8006 port of that node, it shows all the other nodes in the cluster as down! So I think something has gone out of sync. One thing I noticed is that the config_version in /etc/pve/corosync.conf on the node that had issues is 6, while on the other nodes it is 9.

So, is there anything I can do??
Thank you.
 
Please provide the output of pveversion -v.

This happens when the cluster configuration is changed while a node is offline.

What you can do is to stop pve-cluster and corosync on the problematic node:
Code:
systemctl stop corosync.service pve-cluster.service
Then start pmxcfs in local mode on that node.
Code:
pmxcfs -l
Copy the corosync config (/etc/pve/corosync.conf) from one of the other nodes to the same path on this node, and also to /etc/corosync/corosync.conf.
Check both files and make sure they contain the right config for your problematic node.
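One quick way to do that check is to compare the config_version in the two files. This is only a rough sketch: the helper names config_version and same_config_version are made up for illustration, and it assumes the usual "config_version: N" line inside the totem section.

```shell
# Print the config_version value from a corosync.conf-style file.
# (Hypothetical helper; assumes a line of the form "config_version: N".)
config_version() {
    awk -F: '/^[ \t]*config_version[ \t]*:/ {
        gsub(/[ \t\r]/, "", $2)   # strip whitespace around the number
        print $2
        exit
    }' "$1"
}

# Return 0 only if both files carry the same, non-empty config_version.
same_config_version() {
    a=$(config_version "$1")
    b=$(config_version "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, same_config_version /etc/pve/corosync.conf /etc/corosync/corosync.conf && echo "versions match". A matching version number does not prove the rest of the file is correct, so it is still worth reading the nodelist entry for the problematic node.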

Once both files have been copied, you can kill the local pmxcfs and start corosync and pve-cluster again:
Code:
killall pmxcfs
systemctl start pve-cluster.service corosync.service
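If it helps to see the whole sequence in one place, here is a sketch that strings the steps above together. The run and resync_steps names and the DRY_RUN switch are my own additions, not part of Proxmox, and the script assumes you have already fetched a good corosync.conf from a healthy node to a local path. By default it only prints the commands.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
# (DRY_RUN is a hypothetical safety switch for this sketch.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: local path to a corosync.conf copied from a healthy node (assumption).
resync_steps() {
    src=$1
    run systemctl stop corosync.service pve-cluster.service &&
    run pmxcfs -l &&                               # local mode, /etc/pve becomes writable
    run cp "$src" /etc/pve/corosync.conf &&
    run cp "$src" /etc/corosync/corosync.conf &&
    run killall pmxcfs &&
    run systemctl start pve-cluster.service corosync.service
}
```

Running resync_steps /root/corosync.conf.good (example path) prints the six commands so you can review them; DRY_RUN=0 resync_steps /root/corosync.conf.good would actually execute them as root and stops at the first step that fails.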
 
Thank you!! You're a lifesaver.
I actually did all but one step (the killall step). Everything is happy again! :-)
 