How to start with HA and replication in an existing environment - understanding concepts

shanoviachan

Oct 6, 2021
Hi guys
I have been using Proxmox for a while, but since I only had one physical node available for some time, I have always relied on automated backup snapshots. When a container failed, I restored it from a backup. Backups are stored in the cloud for extra security.

Now I have a second node, which currently gets the snapshots synced via rsync. Restoring them manually if node 1 fails works, but having learned about HA and replication, I am now aiming to automate the process (especially for critical containers like Home Assistant or pfSense).

Now I wonder how best to migrate to an HA setup with two nodes and a third quorum vote (e.g. on a Raspberry Pi).

Both my nodes have the classic setup of local and local-lvm, plus a ZFS drive named NAS that is used for my OMV installation. I am confused about the steps needed to migrate to a cluster and activate replication.

- Storage: For CTs (LXC) that do not use external storage (and thus only use local-lvm), do I need shared storage on both nodes? If yes, for what purpose?
- Storage: For CTs (LXC) such as OMV which use external storage (here my NAS ZFS), is there a way to build on the existing setup, so that I do not have to delete and recreate the storage?
- Firewall: Almost exclusively in this tutorial (https://www.wundertech.net/how-to-set-up-a-cluster-in-proxmox/) I have found a step on setting up the firewall. Is that really needed if both nodes are in the same network (here above the pfSense)?

In all the manuals I found, they create storage on a third-party system (e.g. a true NAS device), which assumes that the NAS never fails. So I am wondering whether we can't have storage on both nodes (let's say 2 TB) that is kept in sync, so that if one node fails, the other node brings up the CT/LXC using its own local copy of the data.


I really hope to find someone that is eager to explain the concepts in simple terms rather than sharing links to the documentation or other threads that seem to require some more background knowledge ^^

Best regards
 

I have just been testing HA myself lately, so I had better not advise on everything, but I can make a few remarks from my observations so far:

Now I have a second node, which currently gets the snapshots synced via rsync. Restoring them manually if node 1 fails works, but having learned about HA and replication, I am now aiming to automate the process (especially for critical containers like Home Assistant or pfSense).

Now I wonder how best to migrate to an HA setup with two nodes and a third quorum vote (e.g. on a Raspberry Pi).

Both my nodes have the classic setup of local and local-lvm, plus a ZFS drive named NAS that is used for my OMV installation. I am confused about the steps needed to migrate to a cluster and activate replication.

You might end up reverting to just the backups. It's good to test it out: replication alone is not an issue, but HA may not do exactly what you expect. Basically, the volume of any VM/CT that HA moves around just needs to be available on the target node, whether through replication (which only works with ZFS, as it uses ZFS snapshots to send over only the deltas) or through shared storage.
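
To illustrate why replication depends on snapshots, here is a toy Python sketch of delta-based replication. It is not how ZFS is implemented: the volume is modelled as a dict mapping block number to contents, and the names `take_snapshot`, `compute_delta` and `apply_delta` are made up for this example.

```python
"""Toy model of snapshot-based incremental replication (not real ZFS)."""


def take_snapshot(volume):
    # A snapshot is a frozen copy of the block map at one point in time.
    return dict(volume)


def compute_delta(old_snap, new_snap):
    # Only blocks that changed, appeared, or disappeared are sent over.
    changed = {
        blk: data
        for blk, data in new_snap.items()
        if blk not in old_snap or old_snap[blk] != data
    }
    deleted = [blk for blk in old_snap if blk not in new_snap]
    return {"changed": changed, "deleted": deleted}


def apply_delta(replica, delta):
    # The target node applies the delta on top of the snapshot it already has.
    updated = dict(replica)
    updated.update(delta["changed"])
    for blk in delta["deleted"]:
        updated.pop(blk, None)
    return updated


# Node 1 holds the live volume; node 2 already received the first snapshot.
volume = {0: "boot", 1: "config-v1", 2: "data"}
snap1 = take_snapshot(volume)
replica = take_snapshot(snap1)  # initial full copy on node 2

# The guest keeps writing on node 1.
volume[1] = "config-v2"
volume[3] = "logs"
del volume[2]
snap2 = take_snapshot(volume)

# Only the difference between the two snapshots travels to node 2.
delta = compute_delta(snap1, snap2)
replica = apply_delta(replica, delta)
print(delta)
print(replica == snap2)
```

The point of the sketch is that both sides need a common earlier snapshot to compute and apply a delta, which is why a storage type without that kind of snapshot support (such as plain local-lvm here) cannot take part.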

- Storage: For CTs (LXC) that do not use external storage (and thus only use local-lvm), do I need shared storage on both nodes? If yes, for what purpose?

They will need to go on ZFS (storage replication does not work with local-lvm), or on shared storage.

- Storage: For CTs (LXC) such as OMV which use external storage (here my NAS ZFS), is there a way to build on the existing setup, so that I do not have to delete and recreate the storage?

They will work just fine.

- Firewall: Almost exclusively in this tutorial (https://www.wundertech.net/how-to-set-up-a-cluster-in-proxmox/) I have found a step on setting up the firewall. Is that really needed if both nodes are in the same network (here above the pfSense)?

I find this decision to be unrelated to clustering or HA altogether.

In all the manuals I found, they create storage on a third-party system (e.g. a true NAS device), which assumes that the NAS never fails.

Which I also find silly: what's the point of something "HA" that then introduces a SPOF (single point of failure) in the form of one NAS device? Either it has to be something redundant or distributed. Or ...

So I am wondering whether we can't have storage on both nodes (let's say 2 TB) that is kept in sync, so that if one node fails, the other node brings up the CT/LXC using its own local copy of the data.

That is what replicas do, and in PVE they need ZFS. I find that replication works well, actually. For 2 nodes anything else makes less sense, and a single NAS defeats the purpose, leaving only the illusion of HA.

I really hope to find someone that is eager to explain the concepts in simple terms rather than sharing links to the documentation or other threads that seem to require some more background knowledge ^^

The main concept is that when a node goes down and it had VMs/CTs that are meant to be HA, they are automatically recovered (restarted) on some other node. Since the original node is down, you can't migrate the volumes off it at that point: either the data was copied beforehand (replicas), or it is available outside the cluster (shared storage), or it lives "within" the cluster, as with Ceph (do not even bother with 2/3 nodes).
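
As a rough sketch of that concept, the toy Python model below decides whether a guest can be restarted somewhere else after its node fails. It combines the two conditions discussed in this thread: the surviving part of the cluster still holds a majority of votes (the two nodes plus the third quorum vote from the original question), and a usable copy of the guest's volume exists on a surviving node. The node names `pve1` and `pve2`, the storage labels, and all function names are assumptions made for illustration, not Proxmox APIs.

```python
"""Toy model: can an HA guest be restarted after its node fails?"""


def has_quorum(votes_online, votes_total):
    # The surviving partition needs a strict majority of all votes.
    return 2 * votes_online > votes_total


def nodes_with_volume(home, storage, replica_targets=(), all_nodes=()):
    # Which nodes can reach a usable copy of the guest's disk?
    if storage == "shared":
        return set(all_nodes)  # every node sees the same storage
    if storage == "replicated":
        return {home} | set(replica_targets)  # copies made beforehand
    return {home}  # plain local storage: only the home node


def recovery_node(home, storage, failed_node, votes_online, votes_total,
                  replica_targets=(), all_nodes=()):
    # Without quorum the survivors are not allowed to start anything.
    if not has_quorum(votes_online, votes_total):
        return None
    candidates = nodes_with_volume(home, storage, replica_targets, all_nodes)
    candidates.discard(failed_node)
    # Pick any surviving node that has the data (here: first by name).
    return min(candidates) if candidates else None


# Two nodes plus one extra quorum vote = 3 votes; pve1 fails, 2 votes remain.
print(recovery_node("pve1", "local", "pve1", 2, 3))
print(recovery_node("pve1", "replicated", "pve1", 2, 3,
                    replica_targets=["pve2"]))
# Same failure without the third vote: 1 of 2 votes is not a majority.
print(recovery_node("pve1", "replicated", "pve1", 1, 2,
                    replica_targets=["pve2"]))
```

The model deliberately ignores the case where the shared storage itself fails, which is exactly the single-NAS problem raised above.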
 
obligatory: HA IS NOT BACKUP. HA=high availability, which is to say the SERVICE will remain functioning on infrastructural failure. This necessarily means that within the scope of the desired domain, all hardware/infrastructure needs to either be replicated (redundant) or have some other form of fault tolerance.

Now I have a second node, which currently gets the snapshots synced via rsync. Restoring them manually if node 1 fails works, but having learned about HA and replication, I am now aiming to automate the process (especially for critical containers like Home Assistant or pfSense).
Routers have a different scope than other services, as a router outage affects the entirety of your infrastructure. I would advise moving the router OFF your virtual infrastructure, but barring that, instead of trying to migrate on failure, just run two router instances as a VRRP-style failover pair (pfSense uses CARP for this). You can find documentation here: https://docs.netgate.com/pfsense/en/latest/highavailability/index.html
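
A minimal Python sketch of the failover-pair idea, assuming a simplified model in which each router has a priority and the highest-priority router that is still alive owns the shared virtual IP. Real protocols reach this outcome by exchanging advertisements on the network; the router names, the address and the `active_router` helper are hypothetical.

```python
"""Toy model of a two-router failover pair sharing one virtual IP."""

VIRTUAL_IP = "192.0.2.1"  # documentation-range address, purely illustrative


def active_router(routers):
    # The highest-priority router that is still alive holds the virtual IP.
    alive = [r for r in routers if r["alive"]]
    if not alive:
        return None
    return max(alive, key=lambda r: r["priority"])["name"]


routers = [
    {"name": "fw-a", "priority": 200, "alive": True},
    {"name": "fw-b", "priority": 100, "alive": True},
]

print(VIRTUAL_IP, "is held by", active_router(routers))

# The primary fails: the backup takes over the same address, so clients
# keep using the same gateway and nothing has to be migrated or restored.
routers[0]["alive"] = False
print(VIRTUAL_IP, "is held by", active_router(routers))
```

Both instances run all the time, which is the difference from HA migration: there is no restart on another node, only a change in which instance answers for the shared address.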

Proxmox does support scheduled replication of guest volumes on ZFS storage. Since you are apparently OK with asynchronous replication, this may do what you ask. See https://pve.proxmox.com/wiki/Storage_Replication
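
To make "asynchronous" concrete: anything written after the last completed replication run exists only on the failed node. The Python sketch below estimates that window for a fixed schedule. The 15-minute interval and the timestamps are illustrative assumptions, transfer time is ignored, and the function names are invented for this example.

```python
"""Toy estimate of the data-loss window with scheduled replication."""
from datetime import datetime, timedelta


def last_sync_before(failure, first_run, interval):
    # Most recent scheduled run at or before the failure, if there was one.
    if failure < first_run:
        return None
    completed_runs = (failure - first_run) // interval
    return first_run + completed_runs * interval


def data_at_risk(failure, first_run, interval):
    # Writes made after the last sync never reached the other node.
    last_sync = last_sync_before(failure, first_run, interval)
    if last_sync is None:
        return None  # nothing was ever replicated
    return failure - last_sync


first_run = datetime(2021, 10, 6, 12, 0)
interval = timedelta(minutes=15)
failure = datetime(2021, 10, 6, 12, 40)

print("last sync:", last_sync_before(failure, first_run, interval))
print("unreplicated window:", data_at_risk(failure, first_run, interval))
```

In the worst case the window approaches the full interval, so the schedule should be chosen by how much recent data each guest can afford to lose.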

I really hope to find someone that is eager to explain the concepts in simple terms rather than sharing links to the documentation or other threads that seem to require some more background knowledge
You are trying to operate systems of some sophistication. If you don't endeavor to understand what you're doing, you can't expect good results. The "background knowledge" you're referring to is the building blocks; it's not really possible to explain calculus without understanding arithmetic.
 
You are trying to operate systems of some sophistication. If you don't endeavor to understand what you're doing, you can't expect good results. The "background knowledge" you're referring to is the building blocks; it's not really possible to explain calculus without understanding arithmetic.

Some people do not really learn well from "the docs". The docs are also more of a reference guide; they do not stand in for the actual experience of "doing the thing" and having it fail on you, hands-on. The docs are building blocks that still leave you with many bricks missing, even if they were the best possible. VRRP is a perfect example of this: a good piece of advice, but it may become "fun" to troubleshoot. Calculus can be explained without arithmetic, with geometry too. :)
 
Some people do not really learn well from "the docs". The docs are also more of a reference guide; they do not stand in for the actual experience of "doing the thing" and having it fail on you, hands-on.
This applies to all people. You either learn, or pay a professional.
Calculus can be explained without arithmetic, with geometry too. :)
If you say so. I am unable to imagine this.
 
