Proxmox Cluster

etienne_p

Oct 7, 2025
Hello,

I am running some tests on a cluster of two physical nodes plus a QDevice VM, whose role is to provide the extra vote needed to keep the cluster quorate.
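To make the quorum reasoning explicit, here is a minimal sketch of the majority arithmetic, assuming one vote per node and one vote for the QDevice (the function names are only illustrative and are not part of any Proxmox tool):

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes still reachable form a majority."""
    return votes_present >= quorum_threshold(expected_votes)


# Two nodes plus the QDevice: 3 expected votes, majority is 2.
# If one node fails, the surviving node and the QDevice still hold 2 votes.
print(is_quorate(votes_present=2, expected_votes=3))

# Two nodes without a QDevice: 2 expected votes, majority is 2.
# If one node fails, the survivor holds only 1 vote and loses quorum.
print(is_quorate(votes_present=1, expected_votes=2))
```

With the QDevice, losing a single node leaves a majority, which is why HA could still act on the surviving node in this test.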

The storage used for the VMs is an iSCSI share hosted on Windows Server 2022. It is accessible from both nodes.


The first test I conducted was to simulate a power failure of the first node via the iLO interface, to see how the whole cluster behaves.


The test was successful: all the VMs configured in the HA panel were migrated to the second node and started successfully.

The average downtime across all of the VMs is about 4 to 5 minutes, depending on the network link (1 Gbps link).


I think there is room for improvement, to get a shorter downtime and perhaps make better use of the HA functionality:


  • Upgrading the network link to 10 Gbps (this also depends on the switch)
  • Adding a setting to HA so that the VMs are migrated immediately when the CRM detects the failure of the node they were running on?

I found that the delay between the moment the CRM detects the failure and the moment the VMs are migrated to the other node is quite long (at least 3 minutes).

I think this might come from the cluster trying to "stop" the running VMs on the failed node before migrating them to the other node.


Is it possible for the cluster to live migrate the VMs from a failing node to another node? I mean, without those 4 to 5 minutes of downtime?

Did I miss a configuration setting?


Also, after restoring the failed node, the HA panel of the Datacenter shows "Detected time drift!" for the restored node. I believe this is due to the failure, so restarting the chronyd service should solve the problem.

Thank you in advance for your answer,

Etienne
 
possible for the cluster to live migrate the VM from a node that fails to another node?
No, because the failed node is off, so there is nothing left running to live migrate. HA waits a bit to be sure the node is really offline and not just suffering a temporary communication failure, so that a VM is never running twice. I don't know if that timeout can be adjusted.

The HA shutdown policy can be set to "migrate" so that VMs are moved away when a node is shut down or rebooted, but that only works because the node is still on at that point.
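
For reference, a sketch of how that policy is usually set, assuming the cluster-wide option in /etc/pve/datacenter.cfg (I believe the same setting is also available in the GUI under Datacenter > Options > HA Settings; check the documentation for your version):

```
ha: shutdown_policy=migrate
```

This only covers clean shutdowns and reboots. It does not change what happens on a sudden power loss like the iLO test above.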