Looking at ESXi alternatives....architecture questions

bishoptf

Jun 7, 2025
I know this is a pretty common theme these days, with Broadcom just, well, being who they are and wanting to plunder their user base. I support some very small companies, mostly nonprofits, and VMware had very generous pricing for them: basically education pricing that made Essentials really cheap (Essentials, not Essentials Plus). That is no more, and the standard pricing from Broadcom is a lot more expensive and not perpetual, so I'm looking at alternatives. Funny, but all the alternatives, including the Proxmox paid subscription, cost way more than the old VMware pricing.

Some background (I will try to keep this short): I've been a Linux guy for some time, so I understand what is being offered with Proxmox, but I'm also looking at XCP-ng and some other options. I originally went with VMware because of cost and because, if I get hit by a bus, it would be easy for someone to come in and support it. Currently we have 2 hosts running various OSes, Windows Server and Linux for the most part. Since we have Essentials we do not have vMotion, but we use Veeam to replicate between hosts: basically each host replicates to the other host and then does nightly backups. Each host is in a different closet, and each server has dual power supplies and a UPS. For critical services like domain controllers, DHCP, DNS, etc., each host has a running server, so most services are redundant if one closet goes down. We have one file share that is not duplicated, and that is something I would want to be able to bring up on the other host if needed. Both hosts run RAID 1 for the OS (ESXi) and RAID 1 for the storage, currently hardware RAID, but that can be changed to JBOD if needed. Basically, each host has local storage.

I have Proxmox running in my lab and it seems straightforward for the most part, except for the cluster side of things. I understand that you need at least 3 nodes for a quorum, or 2 hosts plus a qdevice. I already have a VM hosted on my NAS configured as a qdevice, but here is my main question. I do not need HA capabilities (again, most of my services are already redundant), but I would prefer to have a single pane of glass to manage both hosts. From reading, I understand that you do not want 2 devices to go down, but I'm not sure whether that is an issue if you are not running HA for the cluster. My concern, and this is what I would like to know about, is that there are failure situations where I would lose both the qdevice and one of the hosts. Both devices are in the same rack and are served by the same power and switch; if I lose that one switch, both the host and the qdevice would be down. I know it's unlikely to happen, but we have had lightning strikes take out network switches before, so I do know that while rare, it can happen.

What I would like to know is: if this happens, would I be able to start a VM on the remaining host if the other host and qdevice are not reachable? I have read mixed postings regarding this, with some referring to HA and some to non-HA, but for me this would be a non-HA cluster.

Here is what I am thinking of having: a 2-host cluster + 1 qdevice, with both hosts running ZFS RAID 1 for the OS and another ZFS RAID 1 for storage, named the same on both hosts. Both hosts would be added to a cluster along with a standalone qdevice, without HA. I would enable ZFS replication and have the VMs replicate to the opposite host. I think I understand how this would work, including the need for the same storage names, but what I am not sure about is this: in the rare outage where both a node and the qdevice are down, would I be able to start up the replicated VM to make it available?

I know I can lab it up, but I wanted to ask and make sure I understand how it should work in this configuration. The other option would be to run 2 separate nodes with replication, and then I just have to remember which VMs are on which host, etc. That seems like a step back from VMware, but I'm just trying to see what my options are.

Thanks :)
 
Funny, but all the alternatives, including the Proxmox paid subscription, cost way more than the old VMware pricing.
We had that with one customer, too.

I do not need HA capabilities (again, most of my services are already redundant), but I would prefer to have a single pane of glass to manage both hosts.
You may want to look into the Proxmox Datacenter Manager, which bridges that gap of administering (at least two) independent PVE systems.

What I would like to know is: if this happens, would I be able to start a VM on the remaining host if the other host and qdevice are not reachable? I have read mixed postings regarding this, with some referring to HA and some to non-HA, but for me this would be a non-HA cluster.
No, unless you explicitly set the expected votes to 1, which is more of a hack. I would not recommend using a cluster in your case.
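To make the vote counting concrete, here is a minimal sketch of the majority arithmetic. It assumes one vote each for the two nodes and the qdevice and a strict-majority quorum; the function names are made up for illustration and are not part of any Proxmox tool. The last line models what overriding the expected votes to 1 (for example with `pvecm expected 1`) does to the threshold.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Strict majority of the expected votes (illustrative helper)."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True when the reachable votes meet the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Assumption: 2 nodes + 1 qdevice, one vote each.
expected = 3

print(quorum_threshold(expected))  # 2
print(is_quorate(2, expected))     # one voter lost -> True
print(is_quorate(1, expected))     # node and qdevice lost -> False
print(is_quorate(1, 1))            # expected votes overridden to 1 -> True
```

So the lone survivor holds 1 of 3 votes and stays below the threshold of 2 until the expected votes are lowered by hand.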

Here is what I am thinking of having: a 2-host cluster + 1 qdevice, with both hosts running ZFS RAID 1 for the OS and another ZFS RAID 1 for storage, named the same on both hosts. Both hosts would be added to a cluster along with a standalone qdevice, without HA. I would enable ZFS replication and have the VMs replicate to the opposite host. I think I understand how this would work, including the need for the same storage names, but what I am not sure about is this: in the rare outage where both a node and the qdevice are down, would I be able to start up the replicated VM to make it available?
A good setup, but without HA. The three voters (the two nodes plus the qdevice) should not have intersecting failure domains, so the qdevice has to be independent of the nodes if you want a working solution.

The other option would be to run 2 separate nodes with replication, and then I just have to remember which VMs are on which host, etc. That seems like a step back from VMware, but I'm just trying to see what my options are.
Proxmox Datacenter Manager will help in this situation, but AFAIK it will not solve the HA problem.
 
From reading, I understand that you do not want 2 devices to go down, but I'm not sure whether that is an issue if you are not running HA for the cluster.
High availability is a function of failure tolerance within a given failure domain. If you want or need a host failure domain that can sustain two node failures, just size for it: you'd need 5 nodes and shared storage. If your tolerance is a single node (which is pretty typical), your suggested layout works just fine.
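The sizing rule can be checked with a bit of arithmetic. This is only a sketch, assuming one vote per voter and a strict-majority quorum; the helper names are invented for the example.

```python
def tolerable_failures(voters: int) -> int:
    """How many voters can fail while a strict majority still remains."""
    majority = voters // 2 + 1
    return voters - majority


def min_voters_for(failures: int) -> int:
    """Smallest voter count that survives the given number of failures."""
    return 2 * failures + 1


for n in (2, 3, 4, 5):
    print(n, "voters tolerate", tolerable_failures(n), "failure(s)")
# 2 voters tolerate 0 failure(s)
# 3 voters tolerate 1 failure(s)
# 4 voters tolerate 1 failure(s)
# 5 voters tolerate 2 failure(s)

print(min_voters_for(2))  # 5
```

That is where the 5 comes from: two nodes plus a qdevice count as 3 voters, which covers a single failure but not two at once.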
My concern, and this is what I would like to know about, is that there are failure situations where I would lose both the qdevice and one of the hosts.
A qdevice is, for the purposes of this discussion, a node. While it would be possible to bring up a survivor by using votes=1 as @LnxBil suggested, this isn't a normally applied solution for the design you are suggesting.
Both devices are in the same rack and are served by the same power and switch; if I lose that one switch, both the host and the qdevice would be down.
Same as above. Power is a failure domain, and you should account for it within your design. These concepts are universal and apply regardless of the software solution you employ.
would I be able to start a VM on the remaining host if the other host and qdevice are not reachable?
If you have a cluster, any node not being able to reach the other members of the cluster will automatically fence (shut down) itself. If you don't, there is no awareness between nodes at that level.
Both hosts would be added to a cluster along with a standalone qdevice
A good setup, but without HA
Why not? As long as the workload fits on one node, this setup is just fine for HA.
The other option would be to run 2 separate nodes with replication, and then I just have to remember which VMs are on which host, etc.
All you need is to add a qdevice and you don't need to remember anything ;)
 
Why not? As long as the workload fits on one node, this setup is just fine for HA.
As long as the qdevice is in another failure domain, you're absolutely right, but according to the OP it is not:
Both devices are in the same rack and are served by the same power and switch; if I lose that one switch, both the host and the qdevice would be down.
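A quick way to see the problem is to map each voter to the switch or power feed it depends on and check which single failure removes the majority. The sketch below uses made-up voter and domain names and assumes one vote per voter; it does not read any real cluster configuration.

```python
def surviving_votes(voters: dict[str, str], failed_domain: str) -> int:
    """Count voters that do not depend on the failed domain."""
    return sum(1 for domain in voters.values() if domain != failed_domain)


def domains_that_break_quorum(voters: dict[str, str]) -> list[str]:
    """Failure domains whose loss leaves fewer votes than a strict majority."""
    threshold = len(voters) // 2 + 1
    return sorted(
        domain
        for domain in set(voters.values())
        if surviving_votes(voters, domain) < threshold
    )


# Hypothetical layout as described by the OP: qdevice shares a switch with one host.
shared = {
    "host-a": "closet-1-switch",
    "qdevice": "closet-1-switch",
    "host-b": "closet-2-switch",
}

# Hypothetical alternative: qdevice lives in its own failure domain.
independent = {
    "host-a": "closet-1-switch",
    "host-b": "closet-2-switch",
    "qdevice": "office-switch",
}

print(domains_that_break_quorum(shared))       # ['closet-1-switch']
print(domains_that_break_quorum(independent))  # []
```

With the shared layout, losing the one switch takes 2 of 3 votes with it; with an independent qdevice, no single switch failure does.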
 
Thanks for the replies. Again, I do not need or want HA, but I would like a single pane of glass and to keep something close to what we have currently. SMBs do not have data centers; I have two data closets with a server in each. Currently we are running ESXi with Veeam providing replication between the hosts: no vMotion, just hourly replication, so if we were to lose one switch, which is the biggest single point of failure, I could bring up the VMs on the other host, since the replicas are sitting there ready to go. Would they lose some data? Possibly, since it's only replicating every hour, but it's a small business and the rate of churn is pretty low, so an hour's loss is not a big deal.

The 3-node cluster really doesn't seem to be a decent fit for many small businesses, since they are going to have many single points of failure. There is just no way to design around losing a single network switch that could take down 2 nodes; again, most small businesses do not have data centers and do not really need HA. However, having one pane of glass to manage your hosts and VMs, and having the ability to replicate between hosts, would be beneficial. I do not need HA to automatically spin stuff up; I just need to be able to bring up a replica VM if we have a hardware issue, most notably a switch failure, so people can continue to work.

I will take a look at Proxmox Datacenter Manager and see what it provides. I am just trying to get close to what we are currently doing with 2 ESXi servers, vCenter, and Veeam.

One question: if I have 2 Proxmox nodes running local ZFS storage, not in a cluster, am I able to replicate from one node to the other? I know the storage names need to be the same on both nodes, but are you only able to do that if you are running a cluster?

Thanks again for the input...