VM restart on complete host failure taking several minutes

dsmteam

New Member
Oct 18, 2024
Hello everyone,
We are testing Proxmox for production (we currently use VMware/Broadcom ESX).
We have three hosts attached to a Pure Storage iSCSI SAN, with LVM and multipath configured for a reliable cluster, and we see that the functionality is close to what ESX provides.
However, one feature that is important to us in production is a near-instant restart of VMs on a different host after a complete host failure (PSOD, power failure and the like).
Right now, in our simulation, the VMs are moved to another host after roughly three minutes, which is an eternity for some services when the same recovery takes a few seconds on ESX.
I have browsed the forum and found several topics on the subject, but none gives a straight answer as to why the system needs three minutes to start a VM on a different host, or, in particular, how to speed up the process.
The only setting that seems to reduce the wait is lowering fencedelay in nodestatus.pm on all hosts, but this does not appear to change the restart time much.
I also see that something can be done with the watchdog, but as I understand it this is a local check, so if the host has crashed, no local monitoring will help.
Any help or clarification on this topic would be appreciated.
 
Hi @dsmteam,

Before a Proxmox VE cluster starts a VM on another node, it waits to make sure the outage is not just a short one. Part of that time is for the remaining nodes to detect that the node is unreachable, and for the unreachable node itself to notice that it has lost the rest of the cluster. After that there is additional time for fencing, plus some time for the VM to start on the new node. This mechanism ensures that the same VM never runs on more than one node at once. All of these steps together take a bit more than two minutes.

Older but still valid posts explaining the timings: https://forum.proxmox.com/threads/faster-failover-possible.36894/

And another discussion about changing the hardcoded timings:

https://forum.proxmox.com/threads/modify-the-ha-triggering-time.109665/

Lukas
 
I totally get the frustration with Proxmox and those VM restart times. When I was setting up my Proxmox cluster, I had similar issues. I found that adjusting the fencedelay settings helped a bit, but it wasn’t a huge difference. It also helped to optimize the cluster communication by tweaking some pvecm settings, especially the heartbeat intervals. I’d recommend looking into the High Availability feature, too; it can really minimize downtime when a host fails. It might not eliminate all delays, but it can definitely speed things up.
 
Thanks a lot, I had read the first thread you mentioned but not the second one.
The solution seems to match our needs, but the fact that it interacts with many other components that would not be taken into account is a bit scary. I'll give it a go as a test, but we might have to give up on fast recovery altogether and include this in the calculation of our SLA.
We can't get a monthly 99.999% with Proxmox.
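
To put the 99.999% figure in numbers, here is a minimal sketch of the downtime budget an availability target leaves per month. The 30-day month and the function name `downtime_budget_seconds` are assumptions for illustration; the thread does not define the SLA window.

```python
# Downtime budget implied by an availability target.
# Assumption: the SLA window is a 30-day month (not stated in the thread).

SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(availability_percent: float, days: int = 30) -> float:
    """Return the maximum downtime, in seconds, allowed within the period."""
    period_seconds = days * SECONDS_PER_DAY
    return period_seconds * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_seconds(target)
        print(f"{target}% over 30 days allows {budget:.1f} s of downtime")
```

Under that assumption, five nines leaves roughly 26 seconds per month, so a single recovery measured in minutes uses up the whole monthly budget several times over.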
 
You are correct, a hardware failure is always unfortunate. For planned outages you can avoid downtime by using live migration between nodes; with Ceph it is almost unnoticeable. I believe you are already aware of this, so this is just a note for new users.

Lukas
 
We use an SSD SAN over iSCSI for storage, so live migration is not an issue with Proxmox (most of the time we did not even see a lost ping, and TCP sessions stayed active).
It is not as fast as ESX but does the job adequately.
Reducing fencedelay to 10 s is already a big gain, taking recovery from 3:40 down to 2:30, but I'm pretty sure the hardcoded settings are far too conservative: a host failure can be confirmed in 10 seconds, and all locks should be releasable within a few seconds too. There is really no reason to wait more than 30 seconds to start a VM on a different host.
It would be nice if we had more control over this.
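
As a rough illustration of what those recovery times mean for a monthly figure, the sketch below converts a single outage per month into an availability percentage. The one-failure-per-month scenario, the 30-day month and the helper name `monthly_availability_percent` are assumptions; only the 3:40, 2:30 and 30-second durations come from the post above.

```python
# Availability achieved in a month that contains exactly one host failure.
# Assumptions: 30-day month, one failure, downtime equals the recovery time.


def monthly_availability_percent(outage_seconds: float, days: int = 30) -> float:
    """Return availability in percent for one outage within the period."""
    period_seconds = days * 24 * 60 * 60
    return 100 * (1 - outage_seconds / period_seconds)


# Recovery times quoted in the post, converted to seconds.
recovery_times = {
    "default settings (3:40)": 3 * 60 + 40,
    "fencedelay at 10 s (2:30)": 2 * 60 + 30,
    "suggested upper bound (0:30)": 30,
}

if __name__ == "__main__":
    for label, seconds in recovery_times.items():
        pct = monthly_availability_percent(seconds)
        print(f"{label}: {pct:.4f}% availability")
```

Under these assumptions, even a 30-second recovery falls just short of 99.999% for the month, while both measured times land in the four-nines range.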
 
