Replace node in a 2-node cluster

Elleni

Well-Known Member
Jul 6, 2020
I migrated all guests to the remaining node and removed the empty node from the cluster. I then tried to add a new PVE server to the cluster. Unfortunately, I had not deleted the removed node's entry from the known_hosts file on the remaining node, and as the new node has the same hostname as the removed node, I was not successful.
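
(What I should have done first, I assume, is drop the stale host key on the remaining node, e.g.:)

Code:
# drop the stale host key for the reused name/IP (PVE001 / 10.3.8.6 are examples)
ssh-keygen -f /root/.ssh/known_hosts -R PVE001
ssh-keygen -f /root/.ssh/known_hosts -R 10.3.8.6
# on PVE the cluster-wide file is /etc/pve/priv/known_hosts; a stale line
# may need to be removed there by hand as well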

The situation now is that:
- the datacenter view on the remaining node's web interface shows the new node name, but it is red
- the new node is no longer accessible via the web interface, but I can still log in via SSH

Is this a problem of an unsuccessful exchange of certificates? Or did it not work because I should have issued pvecm expected 1 first? Although I am prepared to reinstall everything and restore a bunch of VMs, I would love to fix this instead, as a reinstall would take a lot more time.
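
(Meaning this command, run on the single remaining node so that it stays quorate on its own:)

Code:
pvecm expected 1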

Thanks for any help on how to restore this cluster. I will provide any information needed to help fix this. By the way, the two-node cluster has a third server with a QDevice installed as the third vote for quorum.

Logging in via SSH and looking at pvecm status, I saw that the new node had somehow already created a cluster config of its own, so I went through a guide on removing the cluster config altogether and rebooted. Now web interface access is restored.
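
(The guide steps were roughly the "separate a node without reinstalling" sequence from the Proxmox admin guide, as far as I recall:)

Code:
systemctl stop pve-cluster corosync
pmxcfs -l                      # start the cluster filesystem in local mode
rm /etc/pve/corosync.conf      # remove the cluster config
rm -rf /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster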

I now want to remove the appearance of the non-functional node2 from the web interface of the remaining node (the one with all the VMs on it), to then be able to add the new node to that cluster - while the new node will have the same IP and name as the removed node.

I realized that the removed node1 was still showing in the web interface because it was still listed in /etc/corosync/corosync.conf. Removing it from there and restarting corosync and pvestatd made it disappear from the web interface.
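
(For completeness: the supported way would presumably have been pvecm delnode; my manual edit did the same thing:)

Code:
# supported way - removes the stale node from the cluster config (name is an example)
pvecm delnode node1
# what I did instead: edit /etc/corosync/corosync.conf by hand, then
systemctl restart corosync pvestatd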
 
I made some progress. By stopping the corosync, pvestatd, and pve-cluster services and issuing pmxcfs -l, I was also able to edit /etc/pve/corosync.conf. That came in handy because the server needs some physical rebuilding for the cluster network (direct link), so I thought that in the meantime I could move the cluster network to the already existing link, and change it back later once I have installed an additional NIC in the new PVE for the direct link.
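
(The edit itself boils down to bumping the version and pointing the ring address at the existing link; a sketch with example values:)

Code:
# excerpt of /etc/pve/corosync.conf - addresses and version are examples
totem {
  version: 2
  config_version: 7          # must be incremented on every change
}

nodelist {
  node {
    name: PVE001
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.3.8.6     # temporarily on the existing link
  }
}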

But then, when starting corosync, pvestatd, and pve-cluster, only the first two started. I got:

Code:
Job for pve-cluster.service failed because the control process exited with error code.
See "systemctl status pve-cluster.service" and "journalctl -xeu pve-cluster.service" for details.

notice: unable to acquire pmxcfs lock - trying again
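
(My guess is that the pmxcfs -l instance from before was still running and holding the lock; killing it first would presumably have avoided the reboots:)

Code:
killall pmxcfs
systemctl start pve-cluster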

Rebooting both servers and joining the cluster worked this time. I am now checking whether everything works as intended.
 
It looks like everything is OK, though I don't know about the votes, as I had set expected 1 while troubleshooting.

Code:
root@PVE002:~# pvecm status
Cluster information
-------------------
Name:             PVExCL001
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Oct 11 19:57:21 2025
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.240
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1         NR 10.3.8.6
0x00000002          1    A,V,NMW 10.3.8.7 (local)
0x00000000          1            Qdevice

Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1         NR PVE001
         2          1    A,V,NMW PVE002 (local)
         0          1            Qdevice

Is it a correct assumption that once the direct link is established by adding a dedicated network card for cluster networking, I can switch over by adapting the network configuration on the new node - modifying /etc/network/interfaces and /etc/hosts, then finally /etc/pve/corosync.conf - and restarting services? Or is there more to do, or a more elegant way, for example via the web interface or CLI? A sketch of what I have in mind is below.
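
(Roughly this, with the direct-link subnet 10.10.10.0/30 and the NIC name made up:)

Code:
# 1. /etc/network/interfaces - dedicated NIC for the cluster link (on each node)
auto eth2
iface eth2 inet static
    address 10.10.10.1/30    # 10.10.10.2 on the other node

# 2. /etc/pve/corosync.conf - bump config_version and change each node's
#    ring0_addr to its 10.10.10.x address

# 3. apply
systemctl restart corosync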
 