Hello everyone,
We are testing Proxmox for production (currently using VMware/Broadcom ESX).
We have three hosts attached to a Pure Storage iSCSI SAN, with LVM and multipath configured for a reliable cluster, and we see that the functionality is close to what ESX provides.
However, one feature that is important to us in production is a near-instant restart of VMs on a different host after a complete host failure (PSOD, power failure, and such).
Right now, in our simulation, we see that the VMs are migrated after roughly three minutes, which is an eternity for some services when it takes only a few seconds on ESX.
I have browsed the forum and found several topics on the subject, but none gives a straight answer as to why the system needs three minutes to start a VM on a different host and, in particular, how to speed up the process.
The only suggestion I found for reducing the wait time is lowering fencedelay in nodestatus.pm on all hosts, but this doesn't seem to change the restart time much.
I also see that we can do something with the watchdog, but as I understand it this is a local check, so if the host has crashed, no local monitoring will help in that regard.
Any help or clarification on this topic would be appreciated.