Suggestions for low cost HA production setup in small company

jb_wisemo

New Member
Oct 30, 2025
Denmark
I am looking to design a production HA Proxmox VE cluster for a small company, so cost is the major constraint. The cluster will run production VMs (including DNS/DHCP) and development VMs, and will use manually assigned IPs.

These are the setups I am considering so far. If I missed something, please suggest it, but keep the cost limitation in mind.

Option A: 2-node + QDevice with Ceph
  • The 2 main nodes will be new servers with lots of RAM and threads, plus a few hot-plug disks, and a separate high-speed (10GbE) link between the nodes. The QDevice could maybe sit on a slower network.
  • Provides automatic failover of compute and storage
  • Near instant replication of virtual disks as writes are passed through to the replica on the other node
  • Unclear how to set up a QDevice for both Proxmox itself and Ceph by just installing software on a less powerful Debian node
  • Rumors that Ceph will insist on keeping 3 copies of everything instead of the 2 in RAID 1
  • Rumors that Ceph will do massive amounts of unneeded data copying on the other node when one of the nodes is taken offline
  • Unclear how this deals with a full power outage taking out both nodes at nearly the same time as the UPS runs empty.
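On the QDevice question in option A: the Proxmox side is just a package install plus one command. A rough sketch, assuming a small Debian box reachable at a placeholder IP:

```shell
# On the small Debian box acting as the vote-only tiebreaker:
apt install corosync-qnetd

# On both Proxmox nodes:
apt install corosync-qdevice

# From one Proxmox node, register the tiebreaker
# (192.0.2.10 is a placeholder; the setup needs root SSH to the Debian box):
pvecm qdevice setup 192.0.2.10

# Verify the cluster now counts 3 votes:
pvecm status
```

Note that this only gives corosync a third vote. Ceph's monitors vote among themselves separately, so a 2-node Ceph setup would still need a third MON somewhere (possibly on the same Debian box, but that would be a manual Ceph deployment, not something pvecm sets up), which is a real complication for option A.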
Option B: 2-node + QDevice with ZFS replication
  • Same hardware as option A
  • Provides automatic failover of compute, maybe storage too
  • Delayed replication of virtual disks, causing failed-over VMs to revert to older data.
  • The QDevice apparently only needs to handle Proxmox corosync; no extra work for ZFS replication.
  • Unclear if ZFS replication keeps 2 or 4 copies of data (1 or 2 per node).
  • Hopefully this will have a less complicated reaction to a full power outage taking out both nodes at nearly the same time as the UPS runs empty.
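The replication in option B is configured per VM with pvesr; a sketch, with VM ID 100 and target node name pve2 as placeholders:

```shell
# Replicate VM 100's ZFS-backed disks to node pve2 every 5 minutes.
# "100-0" is <vmid>-<job number>; --rate limits bandwidth in MB/s.
pvesr create-local-job 100-0 pve2 --schedule "*/5" --rate 100

# Inspect configured jobs and the time of the last successful sync:
pvesr list
pvesr status
```

The worst-case data loss on failover equals the schedule interval, which is exactly the "revert to older data" point above.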
Option C: 3-node with Ceph
  • Similar hardware to the 2-node options but with less memory and higher total cost.
  • Provides automatic failover of compute and storage
  • Near instant replication of virtual disks as writes are passed through to the replica on other nodes
  • Rumors that Ceph will insist on keeping 3 copies of everything instead of the 2 in RAID 1
  • Rumors that Ceph will do massive amounts of unneeded data copying on the other nodes when one of the nodes is taken offline
  • Unclear how this deals with a full power outage taking out all/most nodes at nearly the same time as the UPS runs empty.
  • More expensive due to the extra node and need for a 10GbE switch to connect 3 nodes on each backend net.
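On the two Ceph "rumors": by default pools do use size=3 (3 copies) with min_size=2 (writes need 2 copies online), and the copying after a node failure is recovery/backfill. Both are tunable; a hedged sketch, with "rbd" as a placeholder pool name:

```shell
# Inspect the current replica settings of a pool:
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# A 2-copy pool is possible but risky: losing any single disk while
# one node is down means data loss.
ceph osd pool set rbd size 2
ceph osd pool set rbd min_size 1

# The "massive copying" when a node goes offline is Ceph rebuilding
# redundancy; it can be suppressed for planned, short outages:
ceph osd set noout        # before taking a node down for maintenance
ceph osd unset noout      # after the node is back
```

With noout set, Ceph just waits for the node to return instead of re-replicating its data elsewhere, so short maintenance windows do not trigger bulk copying.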
Option D: 3-node with ZFS replication
  • Same hardware as option C
  • Provides automatic failover of compute, maybe storage too
  • Delayed replication of virtual disks, causing failed-over VMs to revert to older data.
  • Unclear if ZFS replication keeps 2 or 4 copies of data (1 or 2 per node).
  • Hopefully this will have a less complicated reaction to a full power outage taking out all/most nodes at nearly the same time as the UPS runs empty.
  • More expensive due to the extra node and need for a 10GbE switch to connect 3 nodes on each backend net.
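The automatic failover piece in all of these options is the HA manager, which is configured independently of the storage choice. A rough sketch, with the VM ID, group name, and node names as placeholders:

```shell
# Group the nodes that are allowed to run the protected VMs:
ha-manager groupadd prod --nodes "pve1,pve2,pve3"

# Put VM 100 under HA management; after a node failure it is restarted
# on another group member once the dead node has been fenced:
ha-manager add vm:100 --state started --group prod --max_restart 1

# Watch the HA stack's view of resources and nodes:
ha-manager status
```

Fencing of a failed node via the watchdog typically takes on the order of a couple of minutes before recovery starts, which is worth checking against whatever failover-time target is set.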
Option E: 2-node + QDevice with other clustered iSCSI SAN
  • Same hardware as option A/B, but without the hot-plug disks, plus a 3rd-party HA SAN storage solution and hardware.
  • Provides automatic failover of compute, with shared access to HA storage via SCSI locking on the SAN or Proxmox coordination of access.
  • Maybe the QDevice can run on the SAN hardware, maybe on some other Debian server.
  • The QDevice apparently only needs to handle Proxmox corosync; no extra work for the SAN's HA.
  • Hopefully this will have a less complicated reaction to a full power outage taking out both nodes at nearly the same time as the UPS runs empty.
  • More expensive due to the extra SAN solution and potential need for a 10GbE switch to connect 2 nodes to SAN.
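For option E, the Proxmox side of attaching the SAN might look like this (portal address, IQN, and volume group name are placeholders, and the volume group is assumed to already exist on the exported LUN):

```shell
# Attach the SAN's iSCSI target:
pvesm add iscsi san-iscsi --portal 192.0.2.20 \
    --target iqn.2025-01.com.example:storage.lun0 --content none

# Layer shared LVM on top so both nodes can allocate VM disks from it;
# Proxmox coordinates access, so no cluster filesystem is needed:
pvesm add lvm san-lvm --vgname vg_san --shared 1 --content images
```

The HA of the storage itself is then entirely the SAN vendor's problem, which is what keeps the Proxmox side simple in this option.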
For the Proxmox nodes I would consider low cost new 1U servers with identical CPU/RAM setup and some kind of IPMI/BMC feature.
 
If you want high-availability by redundancy then going for the bare-minimum of said redundancy does not make sense to me.
Maybe run a single PVE (and maybe one stand-by PVE separately, using PDM to migrate between them if necessary) and a PBS with hourly backups instead.
 
If you want high-availability by redundancy then going for the bare-minimum of said redundancy does not make sense to me.
Maybe run a single PVE (and maybe one stand-by PVE separately, using PDM to migrate between them if necessary) and a PBS with hourly backups instead.
HA is always a matter of money versus probability. At one extreme, someone could spend $1 billion on redundant capacity to obtain 99.999999999999% uptime or more; at the other, one could spend $10 to obtain 90% uptime or less. In practice the best choice will be somewhere in between. The goal here is for the VMs to automatically fail over to other hardware with the same replicated data within 3 minutes of a hardware failure. Thus the basic goal is for that failover hardware to exist, and for the Proxmox VE software to do the failover and replication. Backup is kept as a separate issue and is not the subject of this discussion.
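To make the money-versus-probability tradeoff concrete, the allowed downtime per year at a given uptime target is quick to compute; a plain shell/awk sketch:

```shell
# Allowed downtime per year, in minutes, for a given uptime percentage:
downtime_minutes() {
    awk -v up="$1" 'BEGIN { printf "%.0f\n", (1 - up/100) * 365.25 * 24 * 60 }'
}

downtime_minutes 99      # 5260 minutes (~3.7 days)
downtime_minutes 99.9    # 526 minutes (~8.8 hours)
downtime_minutes 99.99   # 53 minutes
```

Each extra nine roughly multiplies the cost while dividing the allowed downtime by ten, which is why the budget has to dictate where to stop.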

Theoretically, having the compute capacity to run all the VMs on all but one server would cover compute uptime, while having a copy of the data on all but one server (or all but one SAN node) would provide the storage uptime to run those VMs. In practice, however, software limitations in Proxmox will limit its ability to do the job; for example, it may fail to keep up-to-date copies of data on enough nodes, or keep too many copies on the same nodes.

If a combined compute/storage node goes down as a unit, the copies of data stored on that node's disks will be temporarily inaccessible, but will generally still exist as a redundant copy that rejoins the cluster when the node does. On the other hand, if a physical disk device (a PV in LVM terminology) dies, the redundant data on that disk is typically lost, short of difficult offline recovery techniques. For storage uptime it matters how (un)likely it is for a node and a disk on that node to die simultaneously, and how (un)likely it is for a node and a disk elsewhere holding a redundant data copy to die simultaneously.

Either way, if a running VM writes a change to its virtual disk while the cluster is in a degraded state, storage redundancy would require those changes to be stored on two still-online disks. Conversely, if a running VM reads from a virtual disk location where no copy is available, that VM will have to freeze or fail until a copy comes online again. Knowing which of these scenarios will work in practice is highly specific to Proxmox, as opposed to ideal considerations of what an ideal software suite would do.
 
You need to give more information: how many VMs, how much RAM, do you need high-speed storage, how much storage space, how many users, and so on?
 
HA is always a matter of money versus probability. At one extreme, someone could spend 1 billion $ on redundant capacity to obtain 99.999999999999% uptime or more. Or one could spend $10 to obtain 90% uptime or less
This, while true, is the wrong perspective. What is the CONSEQUENCE of downtime? Put a cost on that, and you have an economic baseline.

It's one thing if your massive e-commerce platform is out; it's another if you can't access your emails for a few hours. If you are designing a solution, the first order of business is to understand what the red lines are; the business impact of an outage is one.
In practice however software limitations in Proxmox will limit its ability to do the job.
No idea what you're trying to say. All you're pointing out are the fundamentals of HA, not what your customer's needs are. As you mentioned above, increasing uptime increases cost exponentially; you need to put a pin where you meet the customer's requirement, and design for that.

Without knowing what use/load is being designed for, I'd say that a "small" business would be fine with option B (which should be the lowest in cost and complexity). The only thing I'd probably do differently is to have proper external shared storage instead of ZFS.