Long-time offline of a node

cliffpercent

Mar 1, 2021
I have a node with a hardware failure, and its resurrection has been deferred repeatedly (for almost a year now). HA votes are set to 0, and quorum and capacity are OK with the remaining odd number of nodes. The rest of the nodes have been receiving updates as usual.

Are there any gotchas to keeping a node offline for years at a time in a PVE cluster? Something along the lines of some configuration version only advancing to match the lowest common denominator, which the regularly updated nodes later stop supporting.
 
Are there any gotchas for keeping a node offline for years at a time in a PVE cluster?
I would remove it from the cluster and handle it as "separated" - https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node

When it is repaired you would need to join it to the cluster again. Note that this might not be trivial: a new name and a new IP address are recommended, otherwise you need to fiddle around with some internal settings.

One detail I learned when I added another node to my cluster while one member was offline: don't do that ;-)
In other words: only modify the structure of the cluster while all nodes are online. If you ignore this, you have to repair corosync manually. While this is documented, it can be a PITA when you have to do it for the first time.

The other thing is: if you let the dead node stay in the cluster, it keeps "Expected votes" at "actually present plus one". I actually do this successfully in my homelab (for power consumption reasons), but it is a detail to keep in mind. Continuously relying on "pvecm expected x" is a workaround, not a solution.
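To make the "present plus one" point concrete, here is a minimal Python sketch of the quorum arithmetic. It assumes corosync votequorum's default majority rule (quorum = expected_votes // 2 + 1, with no two_node or last_man_standing options) and that every node, including the dead one, still carries the default single vote. The 4-node cluster and the helper names are hypothetical, chosen only for illustration.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under votequorum's default rule: a strict majority."""
    return expected_votes // 2 + 1


def spare_nodes(expected_votes: int, present_votes: int) -> int:
    """How many more one-vote nodes can fail before quorum is lost.
    A negative result means the cluster is already inquorate."""
    return present_votes - quorum_threshold(expected_votes)


def describe(label: str, expected: int, present: int) -> str:
    q = quorum_threshold(expected)
    return (f"{label}: expected={expected} present={present} "
            f"quorum={q} spare={spare_nodes(expected, present)}")


if __name__ == "__main__":
    # Hypothetical 4-node cluster with one dead node; every node has 1 vote.
    print(describe("dead node kept", expected=4, present=3))
    print(describe("dead node removed", expected=3, present=3))
```

With the dead node kept, the three survivors sit exactly at the quorum threshold (spare=0), so one more failure makes the cluster inquorate. After the dead node is removed, the same three nodes can tolerate one failure (spare=1).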