replacing a node in a non-ceph cluster

Faris Raouf · Sep 12, 2018

Please can someone give me an indication of the best action to take in the event that a node in a cluster is lost (say a total failure, with all hard disks lost) and is then replaced?

In my case I would have no ceph and so no replication. Just two or three nodes in a cluster, each with only a local filesystem (LVM).

VM backups created via Proxmox are rsynced off-server, so they would be safe one way or another.

So, in the event of a main HD failure on a node, I expect I would re-install Proxmox from scratch on a new HD. It would be allocated the same IP as the failed node.

But what then? What about the hostname? How do I re-join this node to the cluster?

Is it as simple as the following?

1. Remove the failed node from the cluster (pvecm delnode NodeName on one of the still running nodes).
2. Delete /etc/pve/nodes/NodeName on all the running nodes.
3. Give the replacement node the same hostname as the failed one, and join it to the cluster.

Or is there more to it? Are there any other pitfalls I need to watch out for?
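To make the question concrete, this is roughly the sequence I have in mind, written as a dry-run sketch that only prints the commands unless DRY_RUN=0 is set. The IP 192.0.2.10 is a placeholder for one of the surviving nodes, the function names are my own, and I have not verified that this order is correct or complete; that is exactly what I am asking.

```shell
#!/bin/sh
# Dry-run sketch: prints each command instead of running it unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# To be run on one of the still running nodes.
remove_failed_node() {
    node="$1"
    run pvecm delnode "$node"            # drop the dead node from the cluster
    run rm -r "/etc/pve/nodes/$node"     # clear its leftover config directory
    run pvecm nodes                      # check the remaining membership
}

# To be run on the freshly installed replacement (same hostname and IP as before).
join_cluster() {
    peer_ip="$1"                         # placeholder: IP of a surviving node
    run pvecm add "$peer_ip"
}

remove_failed_node "NodeName"
join_cluster "192.0.2.10"
```

With the default DRY_RUN=1 this just lists the four commands, so it is safe to paste and read through before anything is changed.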

And following this, once the backups are copied back to local storage on the replacement node, I hope I can just restore the VM backups via the GUI or command line, or is there more to that too?
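For the command-line route, I assume the restore would look something like the sketch below. The archive file name, the VM ID 100 and the target storage name local-lvm are all made-up examples, not values from my setup, and the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch: prints the restore command instead of running it unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Restore one VM backup (vzdump archive) to a given VM ID and storage.
restore_vm() {
    archive="$1"   # hypothetical path to the copied-back backup file
    vmid="$2"      # hypothetical VM ID to restore as
    storage="$3"   # assumed name of the local LVM storage
    run qmrestore "$archive" "$vmid" --storage "$storage"
}

restore_vm "/var/lib/vz/dump/vzdump-qemu-100-2018_09_01-00_00_01.vma.lzo" 100 local-lvm
```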

One reason for asking all this is that I see there is discussion about backing up the node configuration, e.g. https://forum.proxmox.com/threads/best-practice-for-proxmox-self-backup.38382/ and I don't quite understand why this would be necessary - at least in a simple non-ceph setup like mine. Yet it is obviously important for other members of the forum, so my worry is that I've overlooked something fundamental, hence this post.

The last thing I want is to find I can't restore a VM backup after a node failure, simply because I did not think to back up some configuration file or other.
