Redundancy best practices for multi-site failover

pmt_cnq

New Member
Aug 4, 2025
We are in the process of migrating from VMware to Proxmox. We have a Primary site and a Secondary site in case the first one fails.
To ensure a proper design, we are questioning how we should set up the storage on the SANs (iSCSI) for proper failover redundancy. We did a lot of reading, but found no definitive answer so far regarding ZFS and Ceph. For reference: https://pve.proxmox.com/pve-docs/pve-admin-guide.html?pubDate=20250308#chapter_storage

Here's a quick rundown of our infrastructure:
Primary site
Proxmox host 1
Storage SAN1 (connected via iSCSI)
About 60 virtual machines running on Proxmox host 1.
Quorum server3 (for votes)

Secondary site (standby)
Proxmox host 2
Storage SAN2 (connected via iSCSI)

The sites are linked via a Layer-2 network (10 Gbps).

We have a lab in place to simulate the whole new Proxmox concept. HA is configured and working, but with some hiccups compared to what we want to achieve. Everything is going pretty well, but when we disconnect Server 1 to simulate an outage, the failed-over VM can't find its boot disk on the secondary server.

So the question I have is: how should we set up the 2 storage SANs to ensure redundancy when 1 fails?
Some suggestions we found so far:
1- Set up a single ZFS pool with the same name on both servers. Should I assume that the data will be safely distributed between the 2 SANs, and that if 1 SAN fails, everything on the 2nd one will still be intact? Is it that simple?
2- Use Ceph instead of ZFS. But that seems to add a layer of complexity (or not?).
3- Configure 2 storage pools on each server, targeting each SAN.
4- Opposite of point 3 > Configure each SAN to provide 1 storage pool per server.

In VMware, we have a replication every # of hours. If the Primary site fails, we simply boot the latest VM replica on the secondary host and we are good to go. Our lack of knowledge in Proxmox seems to block us from doing that so far. We don't mind losing a few hours of data in case of a crash (the Primary site is pretty well protected, but we never know). Otherwise, we love the platform! :D

Thanks in advance for your help.
 
Hi @pmt_cnq,

The first thing to evaluate is your cluster and node distribution. For a Primary/Standby site setup, you’ll need to introduce a third site. With your current 3-node cluster majority located in Site1, failure of Site1 means Site2 won't have quorum and will not be able to bring up services.
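If a full third node at a third location isn't practical, a small QDevice host can supply the tie-breaking vote instead. A minimal sketch, assuming the QDevice host is reachable at 10.0.3.10 (placeholder address):

Code:
# On the QDevice host (ideally at a third location): install the network daemon
apt install corosync-qnetd

# On all cluster nodes: install the QDevice client
apt install corosync-qdevice

# From one cluster node, register the QDevice with the cluster
pvecm qdevice setup 10.0.3.10

# Verify the vote distribution afterwards
pvecm status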

If you achieve proper quorum distribution across three sites, you may be able to use Ceph. However, latency becomes a critical factor. Both PVE and Ceph are very sensitive to site-to-site (and even intra-site) latency. For a stable and responsive environment, both sites essentially need to behave like a LAN. And again, the third site remains essential for maintaining quorum during a failure scenario.
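Before committing to a stretched design, it's worth measuring the inter-site round trip from a node and checking that the Corosync links stay up. A minimal sketch (the target IP is a placeholder):

Code:
# Measure the round-trip time to the other site; Corosync needs LAN-like latency
ping -c 100 -i 0.2 192.168.2.21

# Show the state of each Corosync/knet link on this node
corosync-cfgtool -s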

As for ZFS, it’s a local filesystem. You can use it for asynchronous replication. Within PVE, replication only works inside a single cluster. So you’re still bound by the same low-latency and quorum requirements. Additionally, ZFS replication introduces a delay (usually a few minutes), depending on your data change rate.
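Within a single cluster, that built-in replication is configured per guest. A minimal sketch, assuming VM 100 should be replicated to a node called pve2 every 15 minutes (both placeholders):

Code:
# Create a replication job for VM 100 towards node pve2, every 15 minutes, capped at 50 MB/s
pvesr create-local-job 100-0 pve2 --schedule "*/15" --rate 50

# Check configured jobs and their last run
pvesr list
pvesr status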

If you're handling replication at the storage level (outside of PVE), whether sync or async, then you’ll need a custom DR procedure to activate the standby site. There’s no out-of-the-box solution for this. Besides the VM data itself, you’ll also need to consider how to handle VM configurations, networking, and other critical infrastructure elements.
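There's no single command for such a runbook, but the rough shape of the standby-site activation could look like the sketch below; every address, pool name, and VMID is a placeholder, and promoting the replicated LUNs is a vendor-specific step on the SAN itself:

Code:
# 1. Promote the replicated LUNs on SAN2 (done on the SAN, vendor-specific)

# 2. On the surviving node, log in to the targets and import the pool
iscsiadm -m discovery -t sendtargets -p 192.168.2.10
iscsiadm -m node --login
zpool import -f san_pool

# 3. Restore the VM config files (kept in sync out of band, e.g. copies of
#    /etc/pve/qemu-server/*.conf) and start the guests
qm start 100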

Cheers!


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
So far we have made good progress. Note that we have a quorum device (server) in Site 1 and we tested HA prior to this, so the quorum should be fine.
We will share more details here when we achieve a final tested solution, but so far it's looking good.

Key takeaways are:
0- Use ZFS over iSCSI.
1- Never present the same LUN to both sites as active.
2- No shared writes. Only one site exports a given LUN at a time. (We were trying to sync both directly, without success.)
3- Use the QEMU Guest Agent for application-consistent snapshots (see the sketch after this list).
4- Failover scope > If one site fails, the other SAN presents the replicated LUNs, Proxmox rescans iSCSI, and the VMs start on the surviving node.
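For point 3, this is roughly what an application-consistent snapshot could look like from the CLI. A minimal sketch, assuming the guest agent is installed and enabled in VM 100 and the pool is named san_pool (both placeholders):

Code:
# Quiesce the guest filesystems via the QEMU guest agent
qm guest cmd 100 fsfreeze-freeze

# Take a ZFS snapshot of the VM's dataset while the guest is quiesced
zfs snapshot san_pool/vm-100-disk-0@app-consistent-$(date +%Y%m%d-%H%M)

# Thaw the guest as soon as the snapshot exists
qm guest cmd 100 fsfreeze-thaw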

Basically, the architecture needs to be seen as "2 silos", as follows:
Host1 sees the iSCSI SAN1
Host2 sees the iSCSI SAN2
> Each site runs some VMs off its *own* SAN's iSCSI LUNs.
> Replication is set up cross-site (SAN1 to SAN2 and vice versa); a sketch follows below.
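The cross-site replication itself can be handled with incremental ZFS send/receive. A minimal sketch, assuming a VM disk zvol named san_pool/vm-100-disk-0 and SSH access from Host1 to Host2 (all names are placeholders):

Code:
# Initial full copy of the dataset from Site 1 to Site 2
zfs snapshot san_pool/vm-100-disk-0@base
zfs send san_pool/vm-100-disk-0@base | ssh host2 zfs receive san_pool/vm-100-disk-0

# Later runs only send the delta since the last common snapshot
zfs snapshot san_pool/vm-100-disk-0@sync1
zfs send -i @base san_pool/vm-100-disk-0@sync1 | ssh host2 zfs receive san_pool/vm-100-disk-0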

Process:
1- Create a zpool, let's say "san_pool"; you need to point it at the local SAN on each host (commands sketched below).
2- Configure iSCSI with multipath.
3- Add the storage to the datacenter.
4- Set up ZFS replication between the SANs.
5- Test VM migration.
6- Configure node HA and test a node failure.
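A condensed sketch of steps 1-3 and 6, with placeholder names throughout (portal IP, multipath device, node names, VMID); adapt these to your environment:

Code:
# Step 2: discover and log in to the local SAN, then verify multipath
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node --login
multipath -ll

# Step 1: create the pool on top of the multipath device (same pool name on both hosts)
zpool create san_pool /dev/mapper/mpatha

# Step 3: register the pool as ZFS storage at the datacenter level
pvesm add zfspool san_pool --pool san_pool --content images,rootdir

# Step 6: make VM 100 highly available, preferring its home node
ha-manager groupadd site1-prefer --nodes "host1:2,host2:1"
ha-manager add vm:100 --group site1-prefer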

* Will post all the details later when the POC is finalized and tested.
 