How to re-add failed node to cluster, without removing node

Magneto

Renowned Member
Jul 30, 2017
I upgraded a 5-node cluster (with Ceph) from Proxmox VE 8.4 to 9.1.4. The 5th node failed.

So I reinstalled Proxmox VE 9.1 on the 5th node and tried to add it back to the cluster, but it isn't working.

What are the correct steps in this case? Since it's a 5-node cluster with Ceph, I don't want to remove PVE05 and re-add it manually.

I copied /var/lib/pve-cluster/config.db from one of the other nodes to /var/lib/pve-cluster/ on PVE05, copied /etc/hosts as well, and rebooted.

Yet PVE05 still shows as offline in the GUI.

When I try to update the certificates on PVE05, I get this:
Last login: Mon Jun 1 18:40:37 2026 from 192.168.100.104
root@PVE05:~# pvecm updatecerts -F
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...
waiting for pmxcfs mount to appear and get quorate...

The nodes can SSH to each other via their hostnames.
 
May I ask why you don't want to remove the node and re-add it? It is probably the easiest way, and if the rest of the cluster and Ceph are configured correctly, you should not experience any downtime in your production environment.
 
With 4 nodes there's no quorum, and the live cluster will fail.
 
Since your (working) cluster has 5 nodes, 2 can fail (from the Corosync perspective), so you can remove one without a problem.
From the Ceph perspective, it depends on how many mons you have and how your pools are configured.
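The two Ceph checks mentioned above can be sketched in a few lines of Python. This is a minimal sketch, assuming one monitor per node, replicated pools with the failure domain set to host, and size/min_size values you would read from your own pools (3/2 below is only an example). Monitors need a strict majority of the monitor map to stay in quorum, and a placement group keeps serving I/O only while at least min_size replicas survive. The function names are hypothetical and not part of any Ceph tool.

```python
def mon_quorate(total_mons: int, mons_up: int) -> bool:
    """Ceph monitors need a strict majority of the monmap to form quorum."""
    return mons_up > total_mons // 2


def pool_stays_active(size: int, min_size: int, failed_hosts: int) -> bool:
    """Worst case for a replicated pool with failure domain = host:
    assume every failed host held one replica of the same placement group."""
    surviving = size - min(failed_hosts, size)
    return surviving >= min_size


if __name__ == "__main__":
    # Hypothetical layouts with PVE05 down:
    print(mon_quorate(total_mons=5, mons_up=4))   # 5 mons, one per node
    print(mon_quorate(total_mons=3, mons_up=2))   # 3 mons, one of them on PVE05
    print(pool_stays_active(size=3, min_size=2, failed_hosts=1))  # 3/2 pool
    print(pool_stays_active(size=2, min_size=2, failed_hosts=1))  # 2/2 pool
```

With 3/2 pools and one host down, every placement group still has two replicas, so I/O continues; a 2/2 pool would block I/O on the affected placement groups until the host returns.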

The 5th node is already offline; removing it will not change whether the cluster is quorate.
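The Corosync side is simple arithmetic. Here is a minimal Python sketch, assuming every node has one vote (the default) and that quorum is a strict majority of the expected votes. With 5 expected votes the cluster needs 3, so 4 online nodes are quorate; after the dead node is removed from the configuration, it needs 3 of 4. The helper names are hypothetical.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority (one vote per node assumed)."""
    return expected_votes // 2 + 1


def describe(expected_votes: int, online: int) -> str:
    """Summarize whether the given number of online nodes is quorate."""
    needed = quorum_threshold(expected_votes)
    tolerated = expected_votes - needed  # nodes that may fail from a full cluster
    state = "quorate" if online >= needed else "NOT quorate"
    return (f"expected={expected_votes} online={online} "
            f"needed={needed} tolerates={tolerated} failure(s): {state}")


if __name__ == "__main__":
    # Now: PVE05 is still in the cluster configuration but offline.
    print(describe(expected_votes=5, online=4))
    # After removing PVE05 from the configuration (in Proxmox VE: `pvecm delnode PVE05`).
    print(describe(expected_votes=4, online=4))
```

Both cases report "quorate": removing PVE05 does not cost quorum, although a 4-node cluster tolerates only one further failure until the reinstalled node is joined again (for example with `pvecm add` run on the new node).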