Poor man's redundancy options

chrisp250

New Member
Dec 16, 2022
Hi all,
I set up a cluster with 2 nodes running Proxmox 8.2 + 1 qdevice.
I have a VM that provides critical network services (DNS, proxy server, etc). This VM rarely changes, so live replication is not required.

What I'm looking for:
A way to have this VM shut down but available to start automatically on the backup node if the primary node fails. I was able to back it up, ship it across to the backup node and restore it there; what I'm missing is a way to spin it up if the primary node dies. I can't have it running live, because it needs to keep the same MAC and IP address as the original.

I had a look at running Ceph and ZFS, and they both seem to require a fair bit of resources for my humble hardware, especially Ceph. I also looked at an NFS solution, but that would require additional highly available hardware to implement.

In summary, I have a network-appliance type of VM running on the primary node and a copy on the backup node. I need a mechanism to spin it up if the primary vanishes. Even better: if the primary node comes back up for whatever reason, it should detect that the backup copy is running and not start its own copy.

Thank you
 
You simply can't have redundancy and minimal hardware at the same time, and you definitely don't have enough nodes for Ceph. You could use remote shared storage, but that requires more hardware and creates a single point of failure. I think what you want is ZFS with replication and HA for that VM. If one node fails, the VM is (freshly) started on the other node after a minute or so (I believe).
 
First, what you're describing is COLO, and that is not yet available for production use in QEMU.

I'd go with a clustered-service approach; there are plenty of solutions for this. Run one VM on each node and use e.g. Heartbeat with a floating HA IP, and attach the services to that IP. If one node fails, the services fail over to the VM that is already running on the other node and you're fine.
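As a minimal sketch of that floating-IP idea using keepalived (a VRRP tool that fills the same role as Heartbeat here) - the interface name, router ID and addresses below are placeholders to adapt:

    # /etc/keepalived/keepalived.conf on the VM on node A
    # (the VM on node B uses state BACKUP and a lower priority)
    vrrp_instance NET_SERVICES {
        state MASTER              # BACKUP on the second VM
        interface eth0            # placeholder interface name
        virtual_router_id 51      # must match on both VMs
        priority 150              # e.g. 100 on the BACKUP VM
        advert_int 1              # missed adverts trigger failover within a few seconds
        virtual_ipaddress {
            192.168.1.53/24       # the shared HA IP that clients point at
        }
    }

Point the DNS/proxy clients at 192.168.1.53; whichever VM currently holds MASTER answers on it.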
 
I have a VM that provides critical network services (DNS, proxy server, etc). This VM rarely changes, so live replication is not required.
DNS can (and should) be set up with redundant servers. Once you have that, give clients DNS entry A pointing at the server on node A and DNS entry B pointing at the server on node B: poor man's DNS load balancing with failover.
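For illustration (the addresses are made up), with a resolver VM on each node you simply hand both out to clients, e.g. from a dnsmasq DHCP server, or statically:

    # dnsmasq DHCP: 192.168.1.10 = DNS VM on node A, 192.168.1.11 = DNS VM on node B
    dhcp-option=option:dns-server,192.168.1.10,192.168.1.11

    # or statically on a Linux client in /etc/resolv.conf
    nameserver 192.168.1.10
    nameserver 192.168.1.11

Clients fall back to the second resolver when the first stops answering, so losing one node only adds a small lookup delay.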
 
Thank you everyone.
I thought this would be more straightforward. I'm thinking I'll have a look at keepalived.
 
I set up a cluster with 2 nodes running Proxmox 8.2 + 1 qdevice.
[...]
What I'm looking for:
A way to have this VM shut down but available to start automatically on the backup node if the primary node fails.
You are almost there.
The desired setup to achieve HA is:
1. Have a ZFS pool with the same name on both servers
2. Set up a cluster between the nodes + qdevice (you've already done that)
3. Set up VM replication from your primary node -> secondary (per-VM configuration)
4. Configure HA for the desired VM in the "HA" panel and set the requested state to "started"

This way, your VM will be replicated between nodes every X minutes (whatever interval you configure).
If the primary node fails, the secondary node will spin up that VM (CLI equivalents are sketched at the end of this post).

Note: ideally both nodes have the same hardware (in particular the same CPU family), so that live migration between them is seamless.

Refer:
https://pve.proxmox.com/wiki/High_Availability
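If you prefer the CLI to the GUI, steps 3 and 4 roughly map to the commands below. This is only a sketch assuming the VM has ID 100 and the secondary node is called pve2; adjust the IDs, node name and schedule to your setup:

    # step 1 sanity check: run on both nodes, pool names must match
    zpool list

    # step 3: replicate VM 100 to node pve2 every 15 minutes
    pvesr create-local-job 100-0 pve2 --schedule '*/15'

    # step 4: put VM 100 under HA management and request it to be started
    ha-manager add vm:100 --state started

    # check replication and HA state
    pvesr status
    ha-manager status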
 
Hi Santiago,
I know that with ZFS you can achieve HA, but when I read the requirements for ZFS I thought it would be very taxing on my infrastructure:
- I only have a 1 Gb network between the two Proxmox nodes
- Each node has 32 GB of RAM in total, and it's not ECC. People seem to recommend ECC RAM particularly for ZFS, and more like 64 GB. I'm already using about 70% of the RAM with the VMs I'm running.
- SSD wear with ZFS seems to be a thing, and high-end drives are recommended.

All I wanted was to have a VM on standby on the backup node in case the primary goes away. I don't need replication or instant failover.
Thank you
 
I can't have it live because it needs to have the same MAC and IP address as the original.
For that I run an OPNsense VM in failover mode. That way two VMs run in master/backup mode and share the same IP via CARP, and with plugins they can provide redundant DNS, proxy, etc. Another benefit is that it fails over within a second, so connections won't drop if a node goes down. It doesn't even require a PVE cluster, as the failover is done at the software level.

I thought this would be more straight forward. I'm thinking I'll have a look at keepalived.
That's also an option. I use that for failover of my Pi-hole DNS: https://forum.proxmox.com/threads/pi-hole-lxc-with-gravity-sync.109881/post-645646
 
I can't have it live because it needs to have the same MAC and IP address as the original.
When you migrate a VM from one node (the primary) to another node (the secondary), the VM's MAC and IP address shouldn't change.

The ZFS replication thing that was recommended by @santiagobiali is a simple and effective way to ship the changes that have happened on a virtual disk to its backup/failover disk on another node. It's set up as an automatic, time-based job (i.e. "every 15 minutes"), with the interval configurable down to 1 minute at the most frequent.
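If it helps, the interval of an existing replication job can be checked and tightened from the CLI as well; the job ID 100-0 below is just an example:

    # list the configured replication jobs, then drop one to a 1-minute interval
    pvesr list
    pvesr update 100-0 --schedule '*/1'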

For your two physical nodes, what's their hardware configuration (and model number if they're brand-name boxes like, say, Dell R730 servers)?

Also, do they have free PCIe slots that could potentially have stuff added to them (if cheap)?
 
Yes, migration works great if I want to do maintenance on one of the nodes.

I don't have anything against ZFS; I know it's a great solution. However, I realise it does have hardware requirements for it to be effective.

My hardware is:
- HP Z240 tower: 6th-gen Intel i7 CPU, 32 GB RAM, 2 TB + 2 TB disks
- HP EliteDesk 800 G6 Mini: 10th-gen Intel i5 CPU, 32 GB RAM, 2 TB + 2 TB (USB) disks

Cheers
 
OK, that looks like a tower PC with PCIe slots.

HP EliteDesk 800 G6 Mini
Aaarrgh. Looks like one of those micro-sized PCs. Great for a small form factor homelab (and low power draw), but no PCIe slot. Damn. :oops:

If all the boxes had a free PCIe slot, you could (cheaply) add some high-speed networking, which would have helped. And maybe even some low-end SAS SSDs (super cheap on eBay), which would fix the endurance problem.
 
Even though there is no PCIe slot strictly speaking, it has 2 M.2 (PCIe) slots and an option port that can take a 2.5 Gb NIC. And yes, I love the low power consumption :)
 
Cool. :)

The way you're talking about ZFS is a bit weird though. If you're up for doing some quick reading, the first comment here (it's under "Promoted Comments") might help clear things up?

https://arstechnica.com/information...01-understanding-zfs-storage-and-performance/ <-- ignore the article itself (which is a good article, btw); it's just that first comment by Jim Salter I'm trying to point to.
 
I'm thinking I might get 2 x 2.5 Gb NICs for the Z240 tower, use the 2.5 Gb port on the mini for management, and live with 1 Gb for the VMs themselves on the mini. That way I'll have ZFS replicating over a 2.5 Gb link instead of 1 Gb.

If that works OK, I might think about getting one of those M.2 NICs for the second M.2 slot, if Proxmox supports that.

That would definitely push the limits on that little 800 G6.
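Roughly what I have in mind for the dedicated link (just a sketch with placeholder interface names and addresses, and I still need to confirm that replication really follows the migration network setting):

    # /etc/network/interfaces on node A: point-to-point link on the 2.5Gb NIC
    # (node B gets 10.10.10.2/30 on its matching interface)
    auto enp2s0
    iface enp2s0 inet static
        address 10.10.10.1/30

    # /etc/pve/datacenter.cfg: route migration (and, I believe, replication)
    # traffic over that link instead of the 1Gb LAN
    migration: secure,network=10.10.10.0/30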
 
Thank you everyone for your feedback.
I decided I will sacrifice low power consumption for a better setup, so I will replace the HP 800 Mini with an SFF machine and run a 25 Gb point-to-point network between the two servers. It should be around 30 W extra, but it will give me a lot more options too, including more space for storage.

Cheers
 