High availability experiments / questions

stevenwh

Member
Mar 16, 2024
Hello all,
I'm playing around with Proxmox and wanted to experiment with some high availability services. This is in my homelab and I know it is complete overkill for the things I'm running / doing, but it's fun for me to play around with and learn things (one of the points of having a homelab!).

A little bit about my setup: I've got a Proxmox server running at home with various self-hosted services (Bitwarden, Immich, Home Assistant, etc.). I'd like to play around with making some of them highly available. I also have another offsite box that is primarily an offsite backup solution, but I put Proxmox on it as well so that I could run some VMs offsite if I wanted to. I have a PBS VM on each box, and they sync to each other.

Currently, if my main server went down and Bitwarden became inaccessible, for example, I do have the ability to access the offsite box and restore the Bitwarden container there. But then I'd either have to have a secondary DNS entry pointing at it, or update the primary DNS entry. And I'd also have to make sure that any changes made while using the secondary get synced back to the primary appropriately. That makes automating this kind of thing a bit more tricky (not impossible though).

My first thought was: oh, Proxmox has clustering and high availability built in, I wonder if that would work better. And then I discovered clustering doesn't work if there is more than 5ms of latency between nodes (so pretty much never outside of a single datacenter). I must say I was surprised by this limitation. I understand it's a technical limitation, but I'm really surprised no one has found a way to overcome it yet. It really makes me wonder how companies manage to offer 99.999% and higher uptime guarantees, since surely that requires multiple datacenters in different locations with automatic failover and all.

Given this, my main question is: how is it done? lol. This is definitely outside the scope of my experience and knowledge, and I want to learn how to do it even if it is complete overkill for my usage. I don't want to simply have a backup spin up and then figure out some way to sync the data back and all of that. I want a true failover system between multiple sites, where less than ~864 milliseconds of downtime per day (which is what 99.999% works out to) could actually be achieved.
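For reference, that 864 ms number just falls out of the percentage; a quick Python sketch of the arithmetic:

# Downtime budget implied by an uptime guarantee.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

for nines in ("99.9", "99.99", "99.999"):
    allowed = 1 - float(nines) / 100    # fraction of time you may be down
    print(f"{nines}% uptime -> "
          f"{allowed * SECONDS_PER_DAY * 1000:.0f} ms/day, "
          f"{allowed * SECONDS_PER_YEAR / 60:.1f} min/year")

So five nines leaves roughly 5 minutes of downtime for an entire year.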

What technology / software etc. would I look into for doing something like this? The best thing I can come up with on my own at the moment is some kind of load balancer that can automatically redirect traffic if the primary datacenter is down, but you'd still be left with the problem of having to sync data back to the primary before directing traffic there again after an outage. And that load balancer would also need a backup... which, again, I'm not sure how that is possible. Thinking of web-accessible things like Bitwarden, I don't know what you would even do to either update the DNS quickly enough or have multiple destinations for a single DNS entry. Do all high availability systems at some point have a single point of failure somewhere that is just really stable?? lol, I can't imagine that is the case...
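To make that concrete, the most I can picture is a health check somewhere that decides which site traffic should go to. A minimal sketch of that idea in Python (the hostnames are made up, and the thing running the check is of course itself a single point of failure, which is exactly the part I don't know how to solve):

# Health-check failover sketch: try the sites in priority order and
# use the first one that accepts a TCP connection.
import socket

ENDPOINTS = [
    ("vault.home.example", 443),     # primary site (hypothetical name)
    ("vault.offsite.example", 443),  # offsite box (hypothetical name)
]

def pick_backend(endpoints, timeout=2.0):
    """Return the first endpoint that answers, or None if all are down."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            continue  # unreachable or timed out, try the next one
    return None

print("send traffic to:", pick_backend(ENDPOINTS))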

Thanks for any feedback!
 
There are many levels... within a single location: redundant power, redundant switches, etc.

At the application level, databases can cluster as well, so one could have multiple web servers on the front end and multiple database servers on the back end.

When you visit, say, Reddit, does it matter if you see all the posts from the last 5 seconds, or just most of them?

DNS, for example, can have many servers behind a single IP (anycast): https://www.cloudflare.com/learning/dns/what-is-anycast-dns/, https://quad9.net/service/locations/.
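Not anycast itself, but related to the "multiple destinations for a single DNS entry" part of the question: a single name can also resolve to several addresses. A quick Python lookup shows it (what you get back depends on your resolver):

# Print every address the resolver returns for one hostname.
import socket

for family, _, _, _, sockaddr in socket.getaddrinfo(
        "www.cloudflare.com", 443, proto=socket.IPPROTO_TCP):
    print(sockaddr[0])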

At a high level, Proxmox has its Datacenter Manager now, which can sort of "link" two clusters.
 