[SOLVED] One node seems to have lost participation in the pmxcfs

thisisbenwoo

New Member
May 28, 2021
I have a 5-node cluster. One of the nodes had some issues, so its hardware needed to be changed. Now when I boot that node, it doesn't seem to be participating in the pmxcfs. When I look at the "Datacenter" view, that node is shown as down. And when I log into port 8006 on that node, it shows all the other nodes in the cluster as down! So I think something has gone out of sync. One thing I noticed is that the config_version in /etc/pve/corosync.conf on the node that had issues is 6, while the other nodes are at 9.

So, is there anything I can do??
Thank you.
 
Please provide the output of pveversion -v.

This happens when the cluster configuration is changed while a node is offline.

What you can do is to stop pve-cluster and corosync on the problematic node:
Code:
systemctl stop corosync.service pve-cluster.service
Then start pmxcfs in local mode on that node.
Code:
pmxcfs -l
Copy the corosync config (/etc/pve/corosync.conf) from one of the other nodes to the same path on this node, and also to /etc/corosync/corosync.conf on this node.
Check both files and make sure they contain the right entry for your problematic node.
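To double-check the copy before restarting anything, it may help to confirm that both files now carry the same config_version. The following is only a minimal sketch: the helper names get_config_version and same_config_version are made up for illustration, and it assumes the usual "config_version: N" line in the totem section of corosync.conf.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Proxmox or corosync.

# Print the config_version value found in a corosync.conf-style file.
get_config_version() {
    # Matches a line such as "  config_version: 9" and prints the number.
    sed -n 's/^[[:space:]]*config_version:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Return 0 only when both files carry the same, non-empty config_version.
same_config_version() {
    a=$(get_config_version "$1")
    b=$(get_config_version "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example usage on the repaired node, while "pmxcfs -l" is still running:
# same_config_version /etc/pve/corosync.conf /etc/corosync/corosync.conf \
#     && echo "versions match"
```

This only compares the version number. It does not validate the nodelist, so the entry for the repaired node still needs to be checked by eye.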

Once both files have been copied, you can kill the local pmxcfs and start corosync and pve-cluster again:
Code:
killall pmxcfs
systemctl start pve-cluster.service corosync.service
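After the services are back up, one way to confirm that the node has rejoined is to look for the "Quorate: Yes" line in the output of pvecm status. This is a rough sketch: is_quorate is a hypothetical helper name, and it assumes the status output contains a line starting with "Quorate:", which may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: reads status text on stdin and returns 0
# if it contains a line of the form "Quorate:   Yes".
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Example usage on the repaired node:
# pvecm status | is_quorate && echo "node has quorum"
```

If the check fails, the corosync and pve-cluster journals (journalctl -u corosync -u pve-cluster) are the next place to look.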
 
Thank you!! you're a lifesaver..
I actually did all but one step (the killall step). Everything is happy again! :)
 
