Hi All,
How long should I expect a PVE cluster to take to bring up VM and LXC instances on other hosts after the host they are running on suffers a dirty shutdown (power cord pulled, etc.)?
I've set up a 3-node cluster with Ceph, using identical hardware on each node, for testing. The PVE cluster network is only 1 Gbps, while the Ceph network is 10 Gbps. I have an HA group set up, with the HA shutdown policy set to migrate. All running VM and LXC instances are set as HA members of this group with a running state.
When I gracefully shut down or reboot a host, all the VM and LXC instances hosted on it are migrated seamlessly, without dropping a single packet from a continuous ping run from outside the cluster. When I simulate a host outage by pulling the power cord, it takes a long time (minutes, though I've yet to time it exactly) for the VM and LXC instances to be brought back online.
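To put an actual number on that downtime, something like the rough sketch below could be run from a machine outside the cluster while the cord is pulled. It is only an illustration, not anything PVE-specific: the function names (`probe`, `longest_outage`, `watch`) are made up for this post, the address `192.0.2.10` is a placeholder, and it assumes the guest listens on TCP port 22. It probes with a TCP connect rather than ICMP so it doesn't need root.

```python
import socket
import time


def probe(host, port=22, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def longest_outage(samples):
    """Given (timestamp, reachable) pairs in time order, return the longest
    gap in seconds between the first failed probe and the next success.
    An outage still in progress at the end of the samples is not counted."""
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts  # outage starts here
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None  # outage is over
    return longest


def watch(host, port=22, interval=1.0, duration=600.0):
    """Probe the guest every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Placeholder address (assumption): replace with the IP of a guest VM/LXC.
    samples = watch("192.0.2.10")
    print(f"longest outage: {longest_outage(samples):.1f} s")
```

The measured figure is only accurate to about the probe interval plus the connect timeout, so it is good enough to tell seconds from minutes but not much finer than that.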
Is this to be expected from this sort of host outage, or are there configuration steps I should take to minimise the downtime?