VM restart on complete host failure taking several minutes

dsmteam · Oct 18, 2024

Hello everyone,
We are testing Proxmox for production (currently using ~~vmware~~ broadcom esx).
We have three hosts with an iSCSI Pure Storage SAN, with LVM and multipath configured for a reliable cluster, and we see that the functionality is close to what ESX provides.
However, one feature that is important in production is the near-instant restart of VMs on a different host after a complete host failure (PSOD, power failure and such).
Right now, in our simulation, the VMs are restarted on another host after roughly three minutes, which is an eternity for some services when it takes a few seconds on ESX.
I have browsed the forum and found several topics on the subject, but none gives a straight answer as to why the system needs three minutes to start a VM on a different host, and in particular how to speed up the process.
The only setting that seems to reduce the wait time is fencedelay in nodestatus.pm on all hosts, but reducing it doesn't seem to change the restart time much.
I also see that we can do something with the watchdog, but to my understanding this is a local check, so if the host has crashed, no local monitoring will help in that regard.
Any help or clarification on this topic will be appreciated.
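
For anyone who wants to reproduce the measurement, here is a minimal sketch of one way to time the outage from outside the cluster. It is not what we actually ran: the function names, the TCP port and the probe interval are all assumptions for illustration. It simply probes a service on the VM once per second and reports the longest gap between the first failed probe and the next successful one.

```python
import socket
import time


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host, port, duration_s, interval_s=1.0):
    """Probe host:port repeatedly and return (timestamp, reachable) pairs."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), is_reachable(host, port)))
        time.sleep(interval_s)
    return samples


def longest_outage(samples):
    """Longest span in seconds from a first failed probe to the next success.

    An outage that is still ongoing at the last sample is not counted.
    """
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts  # outage starts here
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None  # service is back
    return longest
```

Typical use would be `longest_outage(probe("192.0.2.10", 22, duration_s=600))`, started just before pulling the plug on the host (the address and port are placeholders). The resolution is limited by the probe interval plus the connection timeout.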

Lukas Moravek · Oct 18, 2024

Hi @dsmteam,

Before starting a VM on another node, a Proxmox VE cluster waits to make sure the outage is not just a short one. Some time passes while the remaining nodes detect that the node is unreachable, and while the unreachable node detects that it has lost the rest of the cluster. Then there is additional time for fencing, plus some time for the VM to start on the new node. This ensures that the same VM never runs on two nodes at once. All these steps together take about two minutes or more.

An older but still valid thread explaining the timings: https://forum.proxmox.com/threads/faster-failover-possible.36894/

And another discussion about changing the hardcoded timing:

https://forum.proxmox.com/threads/modify-the-ha-triggering-time.109665/

Lukas

Peritann · Oct 18, 2024

I totally get the frustration with Proxmox and those VM restart times. When I was setting up my Proxmox cluster, I had similar issues. I found that adjusting the fencedelay settings helped a bit, but it wasn’t a huge difference. It also helped to optimize the cluster communication by tweaking some pvecm settings, especially the heartbeat intervals. I’d recommend looking into the High Availability feature, too; it can really minimize downtime when a host fails. It might not eliminate all delays, but it can definitely speed things up.

dsmteam · Oct 18, 2024

Thanks a lot, I had read the first thread you mentioned but not the second one.
The solution seems to match our need, but the fact that it interacts with many other components that the change would not account for is a bit scary. I'll give it a go as a test, but we might have to give up on fast recovery altogether and factor this into the calculation of our SLA.
Can't get monthly 99.999% with Proxmox.
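
To put a number on that, here is a rough back-of-the-envelope sketch. It assumes a 30-day month and a single outage per month; the function names are only for illustration. A monthly 99.999% target leaves a downtime budget of about 26 seconds, so one failover of roughly three minutes already exceeds it several times over.

```python
def downtime_budget_seconds(availability_pct, days=30):
    """Allowed downtime in seconds for a given availability over `days` days."""
    period_s = days * 24 * 3600
    return period_s * (100 - availability_pct) / 100


def outage_fits(outage_s, availability_pct, days=30):
    """True if a single outage of outage_s seconds stays within the budget."""
    return outage_s <= downtime_budget_seconds(availability_pct, days)


if __name__ == "__main__":
    budget = downtime_budget_seconds(99.999)
    print(f"Monthly budget at 99.999%: {budget:.2f} s")
    print("3-minute failover fits:", outage_fits(180, 99.999))
```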

Lukas Moravek · Oct 18, 2024

You are correct, a hardware failure is always unfortunate. For planned outages you can avoid downtime by using live migration between nodes; with Ceph it is almost unnoticeable. I believe you are aware of this already; this is just a note for new users.

Lukas

dsmteam · Oct 21, 2024

We use an SSD SAN for storage over iSCSI, so live migration is not an issue with Proxmox (most of the time we don't even see a ping loss, and TCP sessions stay active).
It is not as fast as ESX but does the job adequately.
Reducing fencedelay to 10s is already a big gain, going from 3:40 to 2:30, but I'm pretty sure the hardcoded settings are far too conservative: a host failure can be confirmed in 10 seconds, and all locks should be releasable within a few seconds too. There is really no reason to wait more than 30 seconds to start a VM on a different host.
It would be nice if we had more control over this.
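
For reference, the arithmetic behind those figures, as a small sketch. The only inputs are the two measured times and the 30-second target mentioned above; the helper name is made up for illustration.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '3:40' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


before_s = to_seconds("3:40")  # restart time with the default fencedelay
after_s = to_seconds("2:30")   # restart time with fencedelay set to 10s
target_s = 30                  # what we think should be achievable

gain_s = before_s - after_s            # time saved by the change
gain_pct = 100 * gain_s / before_s     # relative improvement
still_missing_s = after_s - target_s   # gap left to the 30 s target

print(f"Saved {gain_s} s ({gain_pct:.0f}%), still {still_missing_s} s above target")
```

So the change saves 70 seconds, but the remaining two minutes above the target come from timings that this one setting does not control.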
