3rd host for HA question

MrJake

Mar 12, 2025
I currently have a cluster of 2 servers with shared storage over iSCSI on a SAN. That is working great, but I now want to enable HA, so that if I lose a host the other one can keep running the VMs or containers, like VMware with vMotion. I read you need 3 votes, hence the 3rd node. I currently only have 2 nodes with similar hardware configurations, and getting a 3rd would be a challenge.
My question is: for this HA cluster to work, can the 3rd host be
1. a simple VM running on a different cluster?
2. a simple server that will not host any VMs but simply participates in the vote?
3. the Proxmox Backup Server I will set up shortly, which will run with a different SAN in a different environment for safety reasons?

Any recommendations on which of the 3 is the best solution, or do I have to go with number 4, which is to build a 3rd host with roughly the same hardware and add it to the cluster?

Thanks in advance. I'm a VMware vet who wants to ditch the Broadcom thieves and replace them with Proxmox, which is much more affordable, but I need to end up with a similar setup to be able to do the same job. My current VMware cluster has a vCenter with 2 nodes for the VMs, with vMotion and shared iSCSI on a SAN, just like the 2 Proxmox hosts I currently have in my lab.
 
1. a simple VM running on a different cluster
Yes, you can do this, subject to the other cluster's stability, of course.
2. a simple server that will not host any VMs but simply participates in the vote
This is your best option. The voting node does not need to host VMs, nor does it need access to storage.
3. can I have my 3rd server be the Proxmox Backup Server
You can do this as well. It's a variant of (2).

https://pve.proxmox.com/wiki/Cluste...External Vote Support,-This section describes
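To see why the third vote matters, here is a minimal sketch of the majority rule in Python. It assumes corosync's default votequorum behaviour (one vote per node, one vote for the external voter, quorum is a strict majority of the expected votes, no `two_node` special case); `quorum` and `is_quorate` are hypothetical helpers for illustration, not a Proxmox API.

```python
def quorum(expected_votes: int) -> int:
    """Votes needed for a strict majority of the expected votes (assumed rule)."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the surviving partition still holds a majority."""
    return votes_present >= quorum(expected_votes)


# Two PVE nodes, one vote each: after losing one node, 1 of 2 votes remain.
print("2 nodes, 1 down:", is_quorate(1, 2))  # → 2 nodes, 1 down: False
# Two PVE nodes plus one external vote: after losing one node, 2 of 3 remain.
print("2 nodes + external vote, 1 down:", is_quorate(2, 3))  # → 2 nodes + external vote, 1 down: True
```

With only two votes, the surviving node drops below quorum and HA cannot recover anything; the third vote, whether it comes from a VM, a small server or the backup server, is what lets the survivor stay quorate.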

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I did a quick lab in VirtualBox to test a 3-server cluster with the 3rd node not hosting any VMs. I had a VM running on proxmox2, and when I shut down proxmox2 the VM went down too and came back up almost 1 minute later on proxmox1. Is this because it's a nested virtual environment, or is it really this slow to converge?
I was expecting a few seconds of downtime max, not almost a minute, and worse, it restarted the VM instead of keeping it running, which is bad. I'm thinking I'm not configured properly or I didn't understand HA in Proxmox, but what I'm trying to achieve is that the VMs or containers stay online when a server fails, maybe losing a few packets, but get migrated live to another host, a bit like VMware does.

Because when I use the migrate feature I rarely lose more than 1 packet and it's pretty seamless. But if I kill a host, it takes a long time to react to the outage, and worse, when it comes back the VM has rebooted. How do I avoid all that?
 
I did a quick lab in VirtualBox to test a 3-server cluster with the 3rd node not hosting any VMs. I had a VM running on proxmox2, and when I shut down proxmox2 the VM went down too and came back up almost 1 minute later on proxmox1. Is this because it's a nested virtual environment, or is it really this slow to converge?
The manual states that Proxmox "has typical error detection and failover times of about 2 minutes" : https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_ha_manager
I was expecting a few seconds of downtime max, not almost a minute, and worse, it restarted the VM instead of keeping it running, which is bad.
Your expectations don't match how Proxmox works, sorry. Keeping the VM running is also impossible if the Proxmox node dies unexpectedly.

EDIT: If you (explicitly) migrated the VM (with shared storage) I would expect it to keep running.
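Why an explicit migration can keep the VM running while a crashed host cannot: with shared storage, live migration only has to move the VM's RAM, and it does so while the source is still up. Below is a toy model of pre-copy migration (the approach QEMU uses by default) with made-up numbers; `precopy_rounds` and its parameters are hypothetical, and in reality the dirtying rate depends on the workload and link speed, not a fixed ratio.

```python
def precopy_rounds(total_pages: int, redirty_ratio: float,
                   max_final_pages: int, max_rounds: int = 30) -> tuple[int, int]:
    """Toy model of pre-copy live migration.

    Round 1 copies all RAM pages while the VM keeps running on the source.
    Each later round re-sends only the pages dirtied during the previous
    round, modelled here as a fixed fraction (redirty_ratio) of that round.
    Once the remainder is small enough (or max_rounds is hit), the VM is
    paused briefly and the rest is copied: that pause is the short blip
    seen during a normal migration.

    Returns (rounds_while_running, pages_copied_while_paused).
    """
    remaining = total_pages
    rounds = 0
    while remaining > max_final_pages and rounds < max_rounds:
        rounds += 1  # copy `remaining` pages while the VM runs
        remaining = int(remaining * redirty_ratio)  # pages dirtied meanwhile
    return rounds, remaining


rounds, paused_pages = precopy_rounds(1024, 0.5, 64)
print(rounds, paused_pages)  # → 4 64
```

If the source host dies, none of this can happen: the RAM only existed on that host, so HA can only boot the VM again from its disk on shared storage, which is the restart seen in the lab.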
 
PVE uses QEMU as its virtualization engine, and QEMU currently does not support a feature like ESX Fault Tolerance, which keeps a VM running through a host failure by maintaining a synchronized secondary copy on another host.

There is ongoing work in this area within the upstream QEMU project - specifically, COLO: https://wiki.qemu.org/Features/COLO. However, this feature is not yet integrated into PVE.

The 2-minute timeout can be unexpected for new users, but there is reasoning behind it. The system needs to determine whether the host is temporarily unavailable and might recover soon, in which case restarting VMs could be unnecessary and disruptive. Additionally, rebooting a failed host might be faster than reprovisioning all VMs.

There are trade-offs to consider in these scenarios.
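A simplified model of the trade-off described above, in Python. It is NOT ha-manager's actual algorithm (the real stack relies on watchdog-based self-fencing of the failed node before recovering its services); `Node`, `plan_recovery` and `grace_s` are hypothetical names, and the 120-second default is only chosen to match the "about 2 minutes" quoted from the manual.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    online: bool
    seconds_unreachable: float = 0.0


def plan_recovery(vm_node: Node, others: list[Node], grace_s: float = 120.0) -> str:
    """Decide what happens to a VM whose host may have failed (toy model)."""
    if vm_node.online:
        return "keep running"
    if vm_node.seconds_unreachable < grace_s:
        # The host might just be rebooting or briefly cut off; restarting
        # the VM elsewhere now could be unnecessary, and unsafe if the VM
        # is in fact still running there.
        return "wait: host may recover"
    target = next((n for n in others if n.online), None)
    if target is None:
        return "no healthy node: VM stays down"
    # Memory on the failed host is gone, so this is a cold boot from shared storage.
    return f"restart on {target.name} from shared storage"


pve1 = Node("pve1", online=True)
print(plan_recovery(Node("pve2", online=False, seconds_unreachable=30), [pve1]))
# → wait: host may recover
print(plan_recovery(Node("pve2", online=False, seconds_unreachable=130), [pve1]))
# → restart on pve1 from shared storage
```

Shortening the grace period speeds up recovery but raises the chance of restarting VMs for a host that would have come back on its own.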


 
Yeah, I think my expectations were too high. But still, I can live-migrate a VM without missing a beat, which is impressive for a free product.
I'm gonna continue my lab, but I'm definitely considering going into production with Proxmox and getting it subscribed and all. I'm impressed with it.