Cluster node shut off after hardware error - VM configs on that node?

Nov 24, 2017
Hi,

We have a 5-node cluster, and one node shut down because of a hardware failure. The node is offline.

I moved all VM configs from /etc/pve/<offline node>/qemu-server/* to /etc/pve/<online node>/qemu-server/* on one of the remaining nodes and started all the VMs.

What happens when the offline node comes back online? Does it refresh its config from the cluster, or will we get VM conflicts?

When I start the offline node on its own, without network connectivity to the cluster, everything in /etc/pve/ is read-only, so I can't clean up /etc/pve/qemu-server/ (i.e. /etc/pve/<offline node>/qemu-server/) to match the cluster's config. (Maybe a bad idea.)

What are my options other than removing the offline node from the cluster, reinstalling it from scratch, and adding it back to the cluster?

Any advice is greatly appreciated.
 
I moved all VM configs from /etc/pve/<offline node>/qemu-server/* to /etc/pve/<online node>/qemu-server/* on one of the remaining nodes and started all the VMs.
If you don't have HA configured for your VMs/containers, that is the way to bring them up on another node.
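For a manual (non-HA) failover like this, the move can be scripted. The sketch below is a hypothetical Python helper, not a Proxmox tool. It assumes the standard PVE layout, where each node's VM configs live in /etc/pve/nodes/<node>/qemu-server/<vmid>.conf, and it assumes the failed node is definitely powered off, since otherwise the same VM could end up running twice.

```python
from pathlib import Path
import shutil


def move_vm_configs(pve_root: Path, failed_node: str, target_node: str) -> list[str]:
    """Move every QEMU VM config from the failed node's directory to the target node's.

    Assumes the layout <pve_root>/nodes/<node>/qemu-server/<vmid>.conf and that
    both directories already exist. Only run this while the failed node is
    really powered off.
    """
    src = pve_root / "nodes" / failed_node / "qemu-server"
    dst = pve_root / "nodes" / target_node / "qemu-server"
    moved = []
    for conf in sorted(src.glob("*.conf")):
        # shutil.move uses os.rename when src and dst are on the same filesystem,
        # as they are under /etc/pve.
        shutil.move(str(conf), str(dst / conf.name))
        moved.append(conf.stem)  # the VMID, e.g. "100"
    return moved
```

Hypothetical usage, as root on a surviving node: `move_vm_configs(Path("/etc/pve"), "pve3", "pve1")`, where both node names are placeholders. Container configs live in the sibling lxc/ directory and are not handled by this sketch.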
What happens when the offline node comes back online? Does it refresh its config from the cluster, or will we get VM conflicts?
It will automatically resync its config in /etc/pve once it is able to connect to the cluster again.
When I start the offline node on its own, without network connectivity to the cluster, everything in /etc/pve/ is read-only, so I can't clean up /etc/pve/qemu-server/ (i.e. /etc/pve/<offline node>/qemu-server/) to match the cluster's config. (Maybe a bad idea.)
Read-only is normal: on its own, the node is one of five and cannot reach the other four, so it holds only 1 of 5 votes (20%). Quorum requires a strict majority, i.e. more than 50% of the votes (3 of 5 here), so /etc/pve stays read-only.
So yes, it's a bad idea to try to fix read-only files, and it's pointless anyway, since the node resyncs /etc/pve when it rejoins.
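The quorum arithmetic can be made concrete with a small sketch. It models only the default corosync votequorum case assumed here: one vote per node, no QDevice, and no special options such as two_node or wait_for_all. Under those assumptions a partition needs a strict majority of the expected votes.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority: more than half of all expected votes."""
    return total_votes // 2 + 1


def is_quorate(votes_present: int, total_votes: int) -> bool:
    """True if the reachable votes form a strict majority."""
    return votes_present >= votes_needed(total_votes)


# 5-node cluster, one vote per node (the default).
for present in range(1, 6):
    state = "quorate, read-write" if is_quorate(present, 5) else "no quorum, /etc/pve read-only"
    print(f"{present}/5 votes: {state}")
```

For 5 nodes, `votes_needed` returns 3, so a lone node (1/5) or a pair (2/5) stays read-only. The same formula shows why exactly 50% is not enough: in a 4-node cluster split 2/2, neither side has quorum.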
What are my options other than removing the offline node from the cluster, reinstalling it from scratch, and adding it back to the cluster?
If your PVE OS disk is not broken, just repair the hardware and bring the host back up with its network connected, as if it had simply been rebooted.
If the OS disk (mirrored or not) is broken, remove the node from the cluster as described in the PVE documentation, do a fresh install with a new hostname and a different IP, join the new node to the remaining nodes (they will all be updated), and migrate VMs/containers back as needed.
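The two cases can be summarised as a small decision helper. This is a hypothetical Python sketch, not part of Proxmox VE. The `pvecm delnode` and `pvecm add` commands it lists are the standard cluster-manager commands, but check the PVE documentation for their exact preconditions before running them. The node name and IP in the example call are placeholders.

```python
def recovery_plan(os_disk_ok: bool, failed_node: str, existing_node_ip: str) -> list[str]:
    """Return the recovery steps for a failed cluster node, in order."""
    if os_disk_ok:
        # Node is intact: once it reaches the quorate cluster, /etc/pve resyncs by itself.
        return [
            "repair the hardware",
            "boot the node with its cluster network connected",
            "wait for /etc/pve to resync from the cluster",
        ]
    # OS disk lost: remove the dead node first, then join a fresh install as a new member.
    return [
        f"pvecm delnode {failed_node}",  # on a remaining node; the failed node must stay off for good
        "fresh install with a new hostname and a different IP",
        f"pvecm add {existing_node_ip}",  # on the new node, pointing at a current member
        "migrate VMs/containers back as needed",
    ]


for step in recovery_plan(False, "pve3", "192.0.2.11"):
    print(step)
```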