[SOLVED] High availability with 2 VMs

nett_hier · May 12, 2023

I want a highly available OPNsense setup in my Proxmox cluster. OPNsense has native HA features, but they require some manual tweaking to set up, and I don't want to run a separate OPNsense VM for each node in my cluster.
So I had the idea of setting up two VMs on separate nodes and configuring them with OPNsense's own HA, so that if the primary goes down the fallback automatically takes over.
I now want to integrate this with Proxmox HA so that if a node running one of those VMs goes down, that VM is restarted on a different node, i.e. regular HA.
However, I don't want Proxmox to start the new instance on the cluster node where the other OPNsense VM is currently running, for obvious reasons.
Is there any way to achieve this, i.e. configure HA for two separate VMs such that they avoid starting on the same node?

bbgeek17 · May 12, 2023

https://pve.proxmox.com/wiki/High_Availability

Code:

Groups
The HA group configuration file /etc/pve/ha/groups.cfg is used to define groups of cluster
nodes. A resource can be restricted to run only on the members of such group. A group
configuration looks like this:

group: <group>
       nodes <node_list>
       <property> <value>

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

nett_hier · May 12, 2023

Could you maybe elaborate a bit on the solution you're proposing?
If I set up a single group containing the whole cluster and assign both VMs to it, I would assume that the VMs would still happily start on the same node.
If I set up two disjoint groups with 'restricted 0' and assign one VM to each, it would kind of work until one of the groups is completely offline, in which case that group's VM would have to move to a node of the other group and could end up on the node where the other VM is currently running.

EDIT: Just found this line in the docs you linked: "If there are more nodes in the highest priority class, the services will get distributed to those nodes.". So does this mean that the first assumption I made is incorrect? Will VMs assigned to the same group generally prefer to spread?

floh8 · May 12, 2023

Build one group from nodes 1 and 3 and a second group from nodes 2 and 3, then add each VM to one of the groups. You can do this in the GUI as well.
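
For anyone who prefers to see it as text, here is a rough Python sketch that prints the two groups in the groups.cfg layout quoted from the wiki earlier in the thread. The group names (opnsense-a, opnsense-b) and node names (node1 to node3) are made-up placeholders, and setting 'restricted 1' is an assumption on my part, not something stated above.

```python
# Sketch only: renders HA group stanzas in the layout quoted from the wiki.
# Group names and node names are hypothetical placeholders.

def render_group(name, nodes, restricted=True):
    """Return one groups.cfg-style stanza as a string.

    nodes maps a node name to a priority (higher is preferred),
    or to None when no priority should be written.
    """
    node_list = ",".join(
        node if prio is None else f"{node}:{prio}"
        for node, prio in nodes.items()
    )
    lines = [
        f"group: {name}",
        f"\tnodes {node_list}",
        # Assumption: restricted groups, so a VM never leaves its group.
        f"\trestricted {1 if restricted else 0}",
    ]
    return "\n".join(lines)


# Node 3 is the only node shared by both groups.
group_a = render_group("opnsense-a", {"node1": None, "node3": None})
group_b = render_group("opnsense-b", {"node2": None, "node3": None})

print(group_a)
print()
print(group_b)
```

Each OPNsense VM would then be assigned to one of the two groups, either in the GUI or in the HA resource configuration.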

nett_hier · May 12, 2023

How would that work with more than 3 nodes though? Wouldn't the issue I described in the second case with two disjoint groups occur?

Dunuin · May 12, 2023

What about node 1+3+5+7+... for OPNsense VM A and node 2+4+6+8+... for OPNsense VM B? Then a lot of nodes would need to fail at the same time.

nett_hier · May 12, 2023

Hm, thinking about it the cluster quorum would fail before that issue would ever happen lol.
Since it's always an odd number of nodes I'd set it up like @floh8 described it, i.e. with (n-1)/2 nodes in each group with the odd one out being shared between both groups with a low priority.
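
To convince myself of the quorum argument, the layout can be brute-forced with a small Python sketch. It rests on several assumptions that are mine and not taken from the Proxmox code: both groups are restricted, a VM runs on any surviving node of the highest priority class in its group (my reading of the docs line quoted above), every node has one vote, and quorum is a strict majority. Node names and the priority values 2 and 1 are placeholders.

```python
from itertools import combinations


def build_groups(nodes):
    """Split an odd-sized node list into two halves plus one shared low-priority node."""
    if len(nodes) % 2 == 0:
        raise ValueError("this layout assumes an odd number of nodes")
    half = (len(nodes) - 1) // 2
    shared = nodes[-1]
    group_a = {n: 2 for n in nodes[:half]}
    group_b = {n: 2 for n in nodes[half:2 * half]}
    # The odd one out is a member of both groups, with a lower priority.
    group_a[shared] = 1
    group_b[shared] = 1
    return group_a, group_b


def candidates(group, alive):
    """Surviving nodes in the highest priority class that still has a survivor."""
    live = {n: p for n, p in group.items() if n in alive}
    if not live:
        return set()
    top = max(live.values())
    return {n for n, p in live.items() if p == top}


def colocation_with_quorum(nodes):
    """List surviving node sets that keep quorum yet allow both VMs on one node."""
    group_a, group_b = build_groups(nodes)
    quorum = len(nodes) // 2 + 1  # assumption: one vote per node, strict majority
    bad = []
    for k in range(quorum, len(nodes) + 1):
        for alive in combinations(nodes, k):
            alive = set(alive)
            if candidates(group_a, alive) & candidates(group_b, alive):
                bad.append(sorted(alive))
    return bad


nodes = [f"node{i}" for i in range(1, 6)]
print(build_groups(nodes))
print(colocation_with_quorum(nodes))
```

Under this simplified model the second print gives an empty list for 5 nodes: both VMs can only land on the shared node once every other node is gone, and by then the cluster has long lost quorum.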

bbgeek17 · May 12, 2023

Just keep in mind that double clustering, where neither layer is aware of what the other is doing, can lead to unpredictable results.

If node1/vm1 fails and starts transferring to node2/vm1, but app-level HA moves the resources to node3/vm2, will vm1 (which may have state saying it's primary) and vm2, which has moved on transaction-wise, be able to duke it out? Maybe.
A lot in this HA scenario will depend on timing and timeouts.
I am not familiar with what OPNsense does for HA, but adding some sort of quorum on a third node may be required for stable operation.


nett_hier · May 12, 2023

Thanks for the heads up.
I think OPNsense should be able to handle it, but I'll experiment with failure scenarios and see what happens.

floh8 · May 12, 2023

Normally you don't need such a double HA solution. Application HA is already better and should be enough. A node failure is very rare, and when it does happen, it's your job as a good admin to get that node back online quickly. I even think that in small environments it's enough to use virtualization HA and save the expense of a second VM.

nett_hier · May 12, 2023

Eh, you're probably right, especially at the current cluster size. But since Proxmox HA seems simple enough to set up, I'd at least like to see whether it works cleanly, in case I ever scale to a point where I need to tolerate two node failures.

nett_hier · May 12, 2023

floh8 said:
Normally you don't need such a double HA solution. Application HA is already better and should be enough. A node failure is very rare, and when it does happen, it's your job as a good admin to get that node back online quickly. I even think that in small environments it's enough to use virtualization HA and save the expense of a second VM.

Regarding your edit, by virtualization HA you mean Proxmox HA? The issue with that would be that in the event of a failover the in-memory state would be lost, meaning all active connections would be dropped, which I don't want.
OPNsense's built-in HA includes pfsync meaning the state stays synchronized and a failover is seamless.

floh8 · May 12, 2023

nett_hier said:
Regarding your edit, by virtualization HA you mean Proxmox HA?

yes

nett_hier said:
OPNsense's built-in HA includes pfsync meaning the state stays synchronized and a failover is seamless.

I know, that's why I posted:

floh8 said:
application ha is already better and should be enough.

I already implemented it on a VMware cluster with Sophos.
