Suggestions for low cost HA production setup in small company

VictorSTS · Nov 7, 2025

jb_wisemo said:
Skimming through that page, I am surprised there is no example using Linux kernel bridges for STP meshing,

There is RSTP [1]

jb_wisemo said:
Running a meshing or routing daemon just adds another point of failure.

Maybe, but it does let you use both links simultaneously, while with RSTP only one is in use and the other is fallback only.
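To illustrate the point about the idle link, here is a minimal sketch of what a spanning tree does to a 3-node full mesh. It is a simplification: real RSTP elects the root bridge and picks ports by bridge priority and path cost, whereas this just builds a breadth-first tree from a root chosen by hand. The node names `pve1`..`pve3` are hypothetical.

```python
from collections import deque


def spanning_tree(links, root):
    """Return the set of links kept active by a breadth-first tree from root."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    seen = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in neighbours[node]:
            if peer not in seen:
                seen.add(peer)
                # This link joins a new node to the tree, so it stays forwarding.
                active.add(frozenset((node, peer)))
                queue.append(peer)
    return active


# Hypothetical 3-node full mesh: every node is cabled to both other nodes.
mesh = [("pve1", "pve2"), ("pve2", "pve3"), ("pve1", "pve3")]
active = spanning_tree(mesh, root="pve1")

# Any link not in the tree is the one a loop-avoidance protocol would block.
blocked = [link for link in mesh if frozenset(link) not in active]
print(blocked)  # [('pve2', 'pve3')]
```

One of the three cables carries no traffic until another link fails, which is the trade-off against a routed mesh that can use all of them.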

jb_wisemo said:
Either way, that page requires an additional high speed NIC on each node to do the connections to the other neighbor node.

Which you should have anyway, connected to two switches with MLAG/stacking to avoid the network being a SPOF. But yes, you would need 4 NICs per host: two for the mesh + two for the "LAN".

[1] https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#RSTP_Loop_Setup

blanchet · Nov 7, 2025

In your case, the best solution to cut the costs while keeping the system easy to manage is the solution B:

2 PVE nodes with ZFS replication + QDevice

Notes:

There are many tutorials for this setup on the web.
You can run the corosync-qdevice on your PBS
Stay away from CEPH if you cannot afford to run 4 nodes.

david_tao · Nov 7, 2025

On a limited budget, Option B (2 nodes + QDevice with ZFS replication) may be your low-cost choice. But you need to design the QDevice's connectivity to the two nodes carefully, so that a single failed network device cannot stop the two nodes and the QDevice from communicating with each other.
If you want higher availability, consider Option C (3 nodes with Ceph), which provides better VM data consistency than Option B, because you don't have to tolerate the data gap that ZFS replication leaves between replication intervals.

VictorSTS · Nov 7, 2025

blanchet said:
Stay away from CEPH if you cannot afford to run 4 nodes.

Why?

Johannes S · Nov 7, 2025

blanchet said:
You can run the corosync-qdevice on your PBS

But then the backup server can be reached via SSH (without entering a password) from the cluster nodes. It's never a good thing that your backup can be accessed without authentication from the host you want to back up.

There is however a good way to work around this, described by @aaron here:

2x PVE nodes with local ZFS storage (same name)
1x PBS + PVE side by side bare metal.

The 2x PVE nodes are clustered. To be able to use HA I make sure that the VMs all have the Replication enabled. For Mailservers and other VMs where any data loss is painful, I replicate with the shortest possible interval of 1 minute. Other VMs, like a DNS server, are replicated with longer intervals.

On the PBS server I have one LXC container running which is providing the external part of the QDevice, so that the 2x PVE nodes get their 3rd vote and can handle HA and downtimes of one node.
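The "3rd vote" arithmetic can be sketched as follows. This is a minimal illustration, assuming one vote per PVE node, one vote for the QDevice, and a plain strict-majority rule; the function names are made up for this example and it ignores any special corosync quorum options.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest strict majority of the expected votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the surviving partition holds a strict majority."""
    return votes_present >= quorum_threshold(total_votes)


# Two PVE nodes only: 2 expected votes, so the majority is 2.
# Losing either node leaves 1 vote and the survivor is not quorate.
print(has_quorum(votes_present=1, total_votes=2))  # False

# Two PVE nodes plus a QDevice: 3 expected votes, majority is still 2.
# With one node down, the other node + QDevice hold 2 votes.
print(has_quorum(votes_present=2, total_votes=3))  # True
```

So the QDevice does not add capacity, it only breaks the tie so that the surviving node is allowed to keep running (and recover) the HA guests.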

Post in thread 'Planning advice'

Aug 13, 2025

IMHO the Proxmox VE documentation should be updated to say that such a setup (maybe with a VM for the QDevice, for even stricter isolation) is considered best practice for small clusters.

alexskysilk · Nov 7, 2025

Johannes S said:
considered best practice for small clusters.

ZFS replication isn't a replacement for proper shared storage, as it is asynchronous. A small cluster should be 2 real nodes, 1 quorum device, and external shared storage (block or file).

Johannes S · Nov 7, 2025

alexskysilk said:
ZFS replication isn't a replacement for proper shared storage, as it is asynchronous. A small cluster should be 2 real nodes, 1 quorum device, and external shared storage (block or file).

That wasn't my point. My point was that it's not a good idea to allow SSH login as root on the backup server from the cluster. But if you add the PBS as QDevice, this is what you will get. Thus I think that aaron's setup (having both PBS + PVE bare metal on the backup host and using a VM or LXC as QDevice) should be considered best practice if one wants to use the backup server as QDevice.

The chosen storage and its pros and cons have nothing to do with it.

UdoB · Nov 7, 2025

VictorSTS said:
Why?

Yes, there are enough users running three nodes who are happy.

From my own experience in my homelab (last year) I learned that I want to be able to lose one node and have Ceph's self-healing capabilities do their job. To return to "everything's green" you need three nodes alive.

Again: I am not a Ceph expert and I tend to be a little paranoid. Some other findings are there: https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/ -- with the result that I dropped Ceph and went back to ZFS only...

As always: YMMV

david_tao · Nov 8, 2025

UdoB said:
Yes, there are enough users running three nodes who are happy.

From my own experience in my homelab (last year) I learned that I want to be able to lose one node and have Ceph's self-healing capabilities do their job. To return to "everything's green" you need three nodes alive.

Again: I am not a Ceph expert and I tend to be a little paranoid. Some other findings are there: https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/ -- with the result that I dropped Ceph and went back to ZFS only...

As always: YMMV

I read the discussion linked above by UdoB and agree with his viewpoint. Before using Ceph, it is best to consider these concerns:
1. Does this cluster carry mission-critical services? How much downtime can be tolerated if Ceph goes down?
2. Am I familiar with maintaining Ceph? Do I have time to handle the deeper tasks of servicing Ceph?
3. What secondary storage do I have planned if Ceph goes down and cannot be brought back into service quickly? (Could some nodes provide a ZFS pool, with PBS backups, as an emergency plan?)
4. Given the three points above, is it really worth it for the department to choose Ceph instead of relatively simple ZFS or an external NAS/iSCSI storage to provide stable storage services?

gmaoret · Nov 8, 2025

I think the best solution for you is this (assuming you don't have mission-critical applications...): 2 PVE nodes with ZFS replication + QDevice

Of course, you'll need to consider ZFS's asynchronous replication, but if you manage replication times well for each VM, based on the data it contains, you can minimize the (potential) data loss you'll experience in the event of an unexpected catastrophic failure.
...remember, you're providing a rapid recovery service in the event of an unexpected catastrophic failure, so unless you have mission-critical applications, losing a few minutes of data probably won't matter...
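To make "manage replication times well for each VM" concrete, here is a rough sketch of the worst-case data-loss window per VM. It assumes the source node dies just before a replication run finishes, so the newest usable copy on the target is from the previous run. The VM names, intervals and transfer times below are invented for illustration, not measurements.

```python
from datetime import timedelta


def worst_case_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Upper bound on lost data if the source fails just before a run completes.

    The last complete replica then dates from the previous run, which started
    one full interval earlier, plus the time the failed transfer was in flight.
    """
    return interval + transfer_time


# Hypothetical per-VM schedule: (replication interval, typical transfer time).
schedule = {
    "mailserver": (timedelta(minutes=1), timedelta(seconds=10)),
    "dns": (timedelta(minutes=15), timedelta(seconds=5)),
}

for vm, (interval, transfer) in schedule.items():
    print(f"{vm}: up to {worst_case_loss(interval, transfer)} of changes at risk")
```

If that number is unacceptable for a given VM, either the interval has to shrink or that VM needs synchronous shared storage instead.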

Additionally (...and here I expose myself to criticism...), there's also the option of a "quasi-HA" configuration with just two ZFS nodes (without the qdevice...), but in this case, you expose yourself to the possibility of a split-brain in the event of a FULL DOWN failure.
This configuration is not recommended, but there are a couple of configuration tricks that can avoid a split-brain by preventing the system from automatically booting if a critical condition occurs, waiting for manual intervention by technical personnel who are familiar with managing unexpected failures and (manually) avoiding a split-brain.
If, in this situation, you can afford to wait for manual intervention, then this is a possible option...
We have some 2-node clusters without qdevices that have been working fine for years, and in cases of complete shutdowns (even accidental ones), a careful manual restart has always solved everything.

DISCLAIMER:
...I already know some will say that the configuration with just 2 nodes (without qdevices) is dangerous... I know, I pointed it out above, but it CAN work if you take this into account correctly and apply the right precautions.
Please don't insult me if you disagree...

Johannes S · Nov 8, 2025

gmaoret said:
...I already know some will say that the configuration with just 2 nodes (without qdevices) is dangerous... I know, I pointed it out above, but it CAN work if you take this into account correctly and apply the right precautions.

I doubt that somebody who asks questions like the OP has the needed knowledge and experience to do this. So this ill-advised proposal is not helpful at all, if not outright dangerous. YMMV

gmaoret · Nov 8, 2025

Johannes S said:
I doubt that somebody who asks questions like the OP has the needed knowledge and experience to do this. So this ill-advised proposal is not helpful at all, if not outright dangerous. YMMV

You're probably right (or, more likely, completely right), but I thought it was right to add this too for completeness of information...

blanchet · Nov 12, 2025

VictorSTS said:
Why?

With 4 nodes, the cluster can still repair itself to a fully healthy state even after losing a node.
With 3 nodes, the cluster becomes directly “degraded” after losing a node.

Therefore the node maintenance (or any hardware issue) is much less risky with at least 4 CEPH nodes.
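The 3-vs-4 argument can be written down as a small check. This is only a sketch, assuming a replicated pool with size=3 and min_size=2, one replica per host (host as failure domain), and enough free capacity on the survivors to take the extra copies. The function names are made up for this example.

```python
def can_self_heal(hosts: int, failed: int, replica_size: int = 3) -> bool:
    """Full health needs one surviving host per replica (capacity permitting)."""
    return hosts - failed >= replica_size


def stays_writable(hosts: int, failed: int, min_size: int = 2) -> bool:
    """I/O continues as long as at least min_size replicas remain reachable."""
    return hosts - failed >= min_size


for hosts in (3, 4):
    print(
        f"{hosts} nodes, 1 lost:",
        "writable" if stays_writable(hosts, 1) else "blocked",
        "and",
        "can re-replicate to full health" if can_self_heal(hosts, 1) else "stays degraded",
    )
```

With 3 nodes the pool keeps serving data after one loss but has nowhere to place the third replica; with 4 nodes it can rebuild it on the remaining hosts.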

blanchet · Nov 12, 2025

If you have more budget, the best setup for a true minimal HA cluster that minimizes the cost while staying easy to manage is:

3 identical computer nodes
1 high-availability NFS shared storage
2 managed Ethernet switches

Personally, I like Dell PowerEdge and Netapp AFF A-Series servers, but any combination of hardware from top manufacturers should work.

The only drawback of this setup is that you cannot use LXC, because it does not support snapshots over NFS.

Johannes S · Nov 12, 2025

blanchet said:
If you have more budget, the best setup for a true minimal HA cluster that minimizes the cost while staying easy to manage is:

3 identical computer nodes

1 high-availability NFS shared storage

2 managed Ethernet switches

Personally, I like Dell PowerEdge and Netapp AFF A-Series servers, but any combination of hardware from top manufacturers should work.

But such storage hardware also has its costs. If you need to invest in storage hardware, you could also invest in enough hardware to use Ceph, at least if you plan with three compute nodes or more. The story might be different if you happen to reuse existing hardware and your cluster just consists of two compute nodes + QDevice/PBS:

blanchet said:
The only drawback of this setup is that you cannot use LXC, because it does not support snapshots over NFS.

You can still use LXCs without snapshots though. And backups with PBS don't need snapshot support at the storage level. Together with "change-detection-mode" set to "metadata" you can achieve similar functionality (fast backups before system maintenance such as updates).

PwrBank · Nov 12, 2025

We have a few remote warehouses that needed on-prem VMs with redundancy. What we do is run a 3-node cluster of Lenovo Tinys with 2 switches, 2 battery backups, and offsite backups. Because there isn't a need for instant replication, we've opted for a ZFS sync between the nodes every 10-15 minutes for the VMs. The setup isn't perfect, mainly due to the single power supply per node, but it's a cheap way to have "HA". If two more nodes were added, it would truly be HA. Then no matter which switch, battery backup, or node failed, it would always be up.

Typically we spec out M90Q machines which have plenty of cores, PCIe expansion, and lots of RAM. It allows a scale out approach vs a scale up. Plus you can afford to keep spares at that cost.

But again, these are warehouses, not things running databases or large file servers. If there is a 5-minute gap between syncs, that's okay in this case.

In your situation I'd go with a 3-node cluster with a shared storage backend. Unless you use something like LINSTOR with 1:1 replication between all the nodes, I think you'll have a rough time with hyper-converged in such a small setup. Plus you'll want an onsite backup (PBS would be best for this) and, if possible, to replicate the backups off-site. This is assuming you are in an environment with people constantly using and changing the data the servers are providing. As long as you set up the network and storage correctly in this scenario, it'd be pretty performant as well.
