Cluster node shut down after hardware error - VM configs on that node?

macitnow · Feb 13, 2025

Hi,

We have a 5-node cluster, and one node shut down because of a hardware failure. That node is now offline.

Working on one of the remaining cluster nodes, I moved all VM configs from /etc/pve/nodes/<offline node>/qemu-server/* to /etc/pve/nodes/<online node>/qemu-server/* and started all the VMs.

What happens when the offline node comes back online? Does it refresh its config from the cluster config, or will we have VM conflicts?

When I start the offline node on its own, without network connectivity to the cluster, everything in /etc/pve/ is read-only, so I can't clean up /etc/pve/qemu-server/ (that is, /etc/pve/nodes/<offline node>/qemu-server/) to reflect the cluster's current config. (Maybe that's a bad idea anyway.)

What are my options other than deleting the offline node from the cluster, doing a fresh install on it, and adding it back to the cluster?

Any advice is greatly appreciated

waltar · Feb 13, 2025

macitnow said:
I moved all VM configs from /etc/pve/nodes/<offline node>/qemu-server/* to /etc/pve/nodes/<online node>/qemu-server/* and started all the VMs.

If you don't have HA defined for your VMs/LXCs, then that is the way to bring them up on another node.
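For illustration, here is a minimal sketch of that move in Python. It assumes the usual /etc/pve/nodes/<node>/qemu-server/ layout; the function name `move_vm_configs` and the `pve_root` parameter are made up for this example, so it can be tried on a scratch directory first. It also assumes the failed node is really powered off, otherwise a VM could end up running twice.

```python
from pathlib import Path


def move_vm_configs(pve_root, failed_node, target_node):
    """Move every qemu-server/*.conf from failed_node to target_node.

    pve_root is /etc/pve on a real cluster (hypothetical parameter so the
    sketch can be tested on a scratch directory).
    Returns the sorted list of config file names that were moved.
    """
    src = Path(pve_root) / "nodes" / failed_node / "qemu-server"
    dst = Path(pve_root) / "nodes" / target_node / "qemu-server"
    moved = []
    for conf in sorted(src.glob("*.conf")):
        target = dst / conf.name
        if target.exists():
            # A VMID must be unique cluster-wide, so never overwrite a config.
            raise FileExistsError(f"{target} already exists")
        conf.rename(target)
        moved.append(conf.name)
    return moved
```

On a real cluster this only works from a node that still has quorum, because /etc/pve is read-only otherwise.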

macitnow said:
What happens when the offline node comes back online? Does it refresh its config from the cluster config, or will we have VM conflicts?

It will automatically refresh its config in /etc/pve once it is able to connect to the cluster.

macitnow said:
When I start the offline node on its own, without network connectivity to the cluster, everything in /etc/pve/ is read-only, so I can't clean up /etc/pve/qemu-server/ (that is, /etc/pve/nodes/<offline node>/qemu-server/) to reflect the cluster's current config. (Maybe that's a bad idea anyway.)

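Read-only is normal: the node, alone as one of five, can't reach the others, and with just 20% of the quorum votes /etc/pve stays read-only, since more than 50% is needed.
So yes, it's a bad idea to try to fix the read-only files, and it would be useless anyway, because the node takes over the cluster's config when it reconnects.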
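The quorum arithmetic can be sketched in a few lines of Python. This assumes one vote per node, which is the default; the function names are just for this example.

```python
def votes_needed(total_votes):
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes, total_votes):
    """True if the reachable votes form a strict majority."""
    return reachable_votes >= votes_needed(total_votes)


print(votes_needed(5))   # 3
print(has_quorum(1, 5))  # False: the isolated node, /etc/pve is read-only
print(has_quorum(4, 5))  # True: the four remaining nodes stay writable
```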

macitnow said:
What are my options other than deleting the offline node from the cluster, doing a fresh install on it, and adding it back to the cluster?

If your PVE OS disk is not broken, just repair the hardware and bring the host back with network, as if it had simply been rebooted.
If the OS disk (with or without mirror) is broken, remove the node from the cluster as described in the PVE documentation, install a new one with a new name and a different IP, join the new node to the remaining nodes (all of them will be updated), and migrate VMs/LXCs back as needed.

macitnow · Feb 14, 2025

Thank you very much for your explanation!
