Dual-Server Remote Field Installation

jmbldwn

New Member
Mar 11, 2025
I am considering using Proxmox VE to provide redundancy and failover for a set of services I am running in a remote field installation.

The idea is to use two identical servers in a cluster, each with enough capacity to run all of my services, in a configuration where if one server fails the other will take on the full load until the dead server can be replaced, and I am ok with the temporary loss of redundancy during that time.

Ideally, the two servers will be all of the hardware I need at each location. I'll make sure I have enough RAM/CPU/SSD for everything I'm running.

My research into Proxmox so far has led me to a few questions that I could use some help with:

Storage: By default the storage is per-server. I want to have a simple way for a small file system that holds images, scripts, etc. to be replicated across both servers in the cluster. My read of CEPH is that it might be overkill for this. There's also a built-in cluster file system Proxmox uses, but it doesn't look like I can use this for my data. What's the simplest way to achieve this?

High Availability: The docs imply that I need 3 voting servers to be quorate. I don't want to have a 3rd server of any kind if I can avoid it. Is there any reason why a dual-server cluster wouldn't fail over correctly if one server died or is taken offline?

Any other considerations?
 
Hi,

Storage: By default the storage is per-server. I want to have a simple way for a small file system that holds images, scripts, etc. to be replicated across both servers in the cluster. My read of CEPH is that it might be overkill for this. There's also a built-in cluster file system Proxmox uses, but it doesn't look like I can use this for my data. What's the simplest way to achieve this?
Probably by using ZFS with Storage Replication. The replication interval can be turned down to 1 minute, which minimizes the window of possible data loss, although some loss can still occur on failover. Ceph also needs at least 3 nodes, and even then a 3-node setup isn't optimal for HA.
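As a minimal sketch of what that looks like on the CLI, assuming a guest with VMID 100 whose disks live on ZFS storage (with the same storage ID on both nodes) and a second node called pve2 (both names are placeholders):

Code:
# On the node currently running guest 100: replicate its disks to
# node "pve2" every minute (the minimum interval).
pvesr create-local-job 100-0 pve2 --schedule "*/1"

# Check replication jobs and the last successful sync.
pvesr status

The same can be set up in the GUI under the guest's Replication tab.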

High Availability: The docs imply that I need 3 voting servers to be quorate. I don't want to have a 3rd server of any kind if I can avoid it. Is there any reason why a dual-server cluster wouldn't fail over correctly if one server died or is taken offline?
Clustering needs at least 3 nodes; split brain is the keyword here. See also Cluster Requirements, although you can use a QDevice to get around needing a third full node.
 
Thanks, @cheiss, very helpful.

Can I run the QDevice on the same hardware? I would put a QDevice on each server, so I'd have two servers and two QDevices in the cluster. This would mean if I have a server failure, I drop to 1 server and 1 QDevice until the server is fixed/replaced.

Any issues with that?
 
I would put a QDevice on each server, so I'd have two servers and two QDevices in the cluster.
That wouldn't change the situation and doesn't make much sense: you'd effectively still have 2 nodes with 2 votes each, so the same problem. Also, one normally wants an odd number of nodes/votes in a cluster; see Supported Setups, which explains all of that in detail.

The QDevice must be external to be effective. It's a very simple, lightweight service and can run on any kind of hardware; it just needs a network connection to the cluster nodes. For example, many people run it on a cheap SBC or even their router.
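As a rough sketch of the setup, assuming the external box (SBC, router, small VM elsewhere) runs Debian or similar and is reachable at 192.0.2.10 (placeholder address):

Code:
# On the external device: install the qnetd daemon that provides the third vote.
apt install corosync-qnetd

# On each of the two Proxmox VE nodes: install the QDevice client.
apt install corosync-qdevice

# On one cluster node: register the external device with the cluster
# (needs root SSH access to the external device during setup).
pvecm qdevice setup 192.0.2.10

# Verify that the cluster now counts 3 votes.
pvecm status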
 
Yeah, I realize it's not ideal for majority voting, but all I'm trying to accomplish with Proxmox is to have a hot standby. Having a 3rd device may help with the majority voting but it also adds another component I have to deploy and maintain at the remote location.

So that I understand the cost/benefit here: what's the worst that could happen if I deploy 2 servers in a cluster without a 3rd voter?
 
So that I understand the cost/benefit here: what's the worst that could happen if I deploy 2 servers in a cluster without a 3rd voter?
TLDR: Useless.

In layman's terms: when one of the two servers goes down, the remaining one has no idea why it can't reach the other. Is the other server down and at fault, or does the remaining server itself have a network issue reaching it? Since it can't tell, it will "fence" itself as a form of protection, so it isn't exactly usable either. When you add a third voter, the vote determines who is the odd man out.

Hope that helps - somewhat.

It shouldn't be too much trouble to add a QDevice, even in the field.
 
Having a 3rd device may help with the majority voting but it also adds another component I have to deploy and maintain at the remote location.
It actually does not have to be in the same physical location; running it somewhere else over the internet via VPN would be fine too, since the QDevice is not as latency-sensitive as the actual Proxmox VE nodes.

So that I understand the cost/benefit here: what's the worst that could happen if I deploy 2 servers in a cluster without a 3rd voter?
As soon as one node goes down, there is no quorum anymore in the cluster, since the second node (i.e. the remaining cluster) now only has 50% of the votes, which is not a majority. At this point the cluster basically goes read-only, at least the management plane, to avoid any data corruption.
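To make that concrete, a sketch of what you would see and have to do on the surviving node of a two-node cluster without a third vote. This is manual intervention, not automatic failover:

Code:
# On the surviving node: pvecm reports "Quorate: No" with 1 of 2 votes.
pvecm status

# Emergency override: tell the cluster to expect only 1 vote so the node
# becomes manageable again. Only do this when you are sure the other node
# really is dead, otherwise you risk a split brain when it comes back.
pvecm expected 1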

Having high availability - which you want - is simply not possible with two nodes.
 