[SOLVED] Redundancy fails when node fails?

lifeboy

Renowned Member
We use Ceph as the storage backend on a 7-node cluster that is used for testing, development and more. Today one of the nodes died. Since all the LXC and KVM guests are stored on Ceph, their disks are completely intact, but the guest configuration is not available because it is stored on the node that is dead. So apart from piecing together how each machine was configured, there seems to be no obvious way to regain access to these guests.

Surely someone has come across this before? Is it possible to:
a) find a copy of the lxc and qemu config files somewhere?
b) store these config files in the pve-cluster filesystem so that a guest can be started from another node even if the last node it was running on dies?
 
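This is already done. All configs are saved under /etc/pve/nodes/<node>, which is part of the cluster filesystem, so every node holds a copy (VMs under qemu-server/, containers under lxc/).
When using HA, the config is automatically moved to the directory of the node the guest is started on at failover. Without HA you have to move it manually.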
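
For the manual case, a minimal sketch of the move might look like the following. It is not an official Proxmox tool: the function name `move_guest_configs` and the node names in the usage note are made up for illustration. It assumes the remaining nodes still have quorum (otherwise /etc/pve is read-only) and that the dead node is really powered off and stays off, so no guest can end up running twice against the same disk.

```python
from pathlib import Path

# Per-node subdirectories holding guest configs:
# qemu-server/<vmid>.conf for VMs, lxc/<vmid>.conf for containers.
GUEST_DIRS = ("qemu-server", "lxc")


def move_guest_configs(dead_node, target_node, root="/etc/pve/nodes", dry_run=True):
    """Move all guest configs from dead_node's directory to target_node's.

    Hypothetical helper. Returns a list of (source, destination) paths.
    With dry_run=True nothing is changed; the list only shows the plan.
    """
    root = Path(root)
    planned = []
    for kind in GUEST_DIRS:
        src_dir = root / dead_node / kind
        dst_dir = root / target_node / kind
        if not src_dir.is_dir():
            continue  # the dead node had no guests of this kind
        if not dst_dir.is_dir():
            raise FileNotFoundError(f"target directory missing: {dst_dir}")
        for conf in sorted(src_dir.glob("*.conf")):
            dst = dst_dir / conf.name
            if dst.exists():
                # Should not happen with cluster-wide unique IDs; never overwrite.
                raise FileExistsError(f"refusing to overwrite {dst}")
            planned.append((conf, dst))
    if not dry_run:
        for conf, dst in planned:
            conf.rename(dst)  # same effect as `mv` inside the cluster filesystem
    return planned
```

Calling `move_guest_configs("node7", "node1")` only lists what would be moved; passing `dry_run=False` performs the move, after which the guests appear under the target node and can be started there. For a single guest, a plain `mv` of its one .conf file between the two node directories does the same thing.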
 