Any reason why a VM is not automatically added to HA?

kellogs

Member
May 14, 2024
Coming from VMware: if a VM has been created in vCenter, it automatically fails over to another node if the host goes down.

Is there any reason why this behaviour is not default with Proxmox?
 
Hello,

There are possible scenarios where HA can harm your production environment. Therefore it is not enabled by default.

If, for example, all the networks used by Corosync stop working, then all nodes will be fenced so that their VMs can be recovered on nodes with quorum. But since no node has quorum, the guests can't be migrated anywhere and the entire cluster will be rebooted.

Additionally, a node might lose Corosync quorum without its guests having any kind of issue; in that case automatic failover only adds downtime.
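To make the scenario above concrete, here is a minimal Python sketch, assuming each node carries one vote and that a partition is quorate only with a strict majority of all votes. The function name `quorate_partitions` and the five-node cluster are hypothetical, chosen purely for illustration; this is not how Corosync is implemented.

```python
def quorate_partitions(partitions, total_votes):
    """Return the partitions that hold a strict majority of all votes.

    Assumption: one vote per node, so a partition's vote count is its size.
    """
    threshold = total_votes // 2 + 1
    return [p for p in partitions if len(p) >= threshold]


# Hypothetical five-node cluster.
nodes = ["node1", "node2", "node3", "node4", "node5"]

# All Corosync networks down: every node is isolated in its own partition.
isolated = [[n] for n in nodes]
print(quorate_partitions(isolated, len(nodes)))  # []

# A partial split (3 nodes vs 2 nodes): only the larger side keeps quorum.
split = [nodes[:3], nodes[3:]]
print(quorate_partitions(split, len(nodes)))  # [['node1', 'node2', 'node3']]
```

In the first case no partition is quorate, so there is nowhere left to recover the guests, which is the situation described above.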

See our docs [1].

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#ha_manager_fencing
 
Thank you guys for the information. I had set up a few VMs for which we forgot to turn on HA, and the host they were on was dead in the water. We experienced downtime due to this, but luckily there was a command line to migrate the VMID to another working node in the cluster, hence this question. So for HA to be stable, a stable Corosync network is a must (we have a pair of stacked switches), as is a quorum (we have 15 nodes in total).
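Since HA is opt-in per guest, one way to avoid forgetting a VM is to compare the list of VMIDs against the resources already managed by HA. The Python sketch below only builds the `ha-manager add vm:<vmid> --state started` command lines for the missing ones; it does not run anything. The helper name `ha_add_commands` and the sample IDs are hypothetical, and how you collect the two input lists is left open.

```python
def ha_add_commands(all_vmids, ha_sids, state="started"):
    """Build 'ha-manager add' command lines for VMs not yet under HA.

    all_vmids: iterable of integer VMIDs present in the cluster.
    ha_sids:   iterable of HA service IDs already configured, e.g. "vm:100".
    """
    managed = set(ha_sids)
    return [
        f"ha-manager add vm:{vmid} --state {state}"
        for vmid in sorted(all_vmids)
        if f"vm:{vmid}" not in managed
    ]


# Hypothetical example: three VMs, only VM 100 is already an HA resource.
for cmd in ha_add_commands([102, 100, 101], ["vm:100"]):
    print(cmd)
```

Review the generated lines before running them on a node; whether every guest should be under HA is exactly the judgment call discussed in this thread.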
 
Coming from VMware: if a VM has been created in vCenter, it automatically fails over to another node if the host goes down.

Is there any reason why this behaviour is not default with Proxmox?
VMware does things its way and Proxmox its way. You might note that a Proxmox HA cluster allows you to set VMs as start on boot, which VMware does not.

Also note that you don't need a vCenter (yay), i.e. an orchestration appliance, and if you recall, you probably had several orchestration appliances, each guzzling shed loads of RAM, vCPUs and disc space.

I've been using VMware since 2.x and things have changed somewhat. There used to be GSX and ESX as well as ESXi. They did do rather well with VMFS which turns out to have been a killer feature. MS Hyper-V clustering is ... a bit wank and a massive bodge and Proxmox and co can't do snapshots on iSCSI shared storage. However if you wave a decent Ceph cluster at Proxmox then you are golden.

You are almost certainly used to writing and following procedures, so note the differences and document what should be done. You also have way more control over how the nodes themselves work - it's a normal Linux box. Do be careful with that! The killer feature for me is that you get the full equivalent of Enterprise Plus out of the box. Open vSwitch is very tasty but, again, you must take care to understand how it works. DVS on VMware is lovely but seriously expensive. Whack on Tanzu (containers and that) and you will need to sell a kidney.

Take your time and get to grips with a new way of doing stuff. Try to rethink what you got used to with VMware and be open minded - it's all rather liberating \o/
 
VMware does things its way and Proxmox its way. You might note that a Proxmox HA cluster allows you to set VMs as start on boot, which VMware does not.
This is not correct; vSphere can also start VMs automatically. Since you are required to configure this on every ESXi host, almost no one does it.
Thank you guys for the information. I had set up a few VMs for which we forgot to turn on HA, and the host they were on was dead in the water. We experienced downtime due to this, but luckily there was a command line to migrate the VMID to another working node in the cluster, hence this question. So for HA to be stable, a stable Corosync network is a must (we have a pair of stacked switches), as is a quorum (we have 15 nodes in total).
A dedicated network is generally recommended for Corosync.
I also have many setups without a dedicated network, but you should always have several redundant networks for a stable cluster.
With 15 nodes, you still have a quorum majority as long as no more than 7 nodes fail. I don't see any need for an extra quorum.
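The arithmetic behind that figure can be checked with a short Python sketch, assuming one vote per node and a strict-majority quorum rule (more than half of all votes). The function names are hypothetical.

```python
def quorum_threshold(total_votes):
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def max_tolerable_failures(total_votes):
    """How many voters can fail while the rest still form a majority."""
    return total_votes - quorum_threshold(total_votes)


print(quorum_threshold(15))        # 8
print(max_tolerable_failures(15))  # 7
```

So a 15-node cluster needs 8 votes to stay quorate and tolerates the loss of 7 nodes, provided the surviving 8 can still reach each other over Corosync.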
With large clusters, you should always make sure that Corosync runs on a low-latency network.
 
So for HA to be stable, a stable Corosync network is a must (we have a pair of stacked switches)

The issue with stacking and/or LACP/MLAG (without BFD) is that the switchover can take more than 4 seconds, which is enough to start experiencing quorum loss on that link, and that (with HA guests) will almost invariably result in watchdog reboots of the affected nodes. It is thus much better to have redundant Corosync links:

https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_redundancy
 
