[SOLVED] Cluster Planning: Stability if Half the Cluster Servers (Planned) Fail

Asano
I'm currently planning a new Proxmox installation for an organization with two sites in different countries. The sites are connected via a private backbone, and for disaster recovery I want to be able to fail over all services to one site and bring them up while the other site remains offline. If such a disaster causes a few minutes of data loss, and the failover takes some time (even a few days) and manual labor, that is no problem. So true HA, and thus HA storage, is not required, and ZFS with pve-zsync or storage replication would be sufficient (though we may still opt for Ceph, but that is irrelevant for this topic).

The question now is: is it better to have A) one Proxmox cluster with an equal number of servers on each site, or B) two entirely separate Proxmox clusters, one per site? And is there a best practice/recommended approach?

Personally I'd prefer a single cluster, since it means less maintenance and documentation. However, I'm not sure how well a Proxmox cluster would function without quorum in the event of a disaster, and what pitfalls there might be. With no true HA there is also no fencing, which should make things easier, but I remember, for example, that in earlier implementations of Proxmox 2FA you could no longer log in to a Proxmox server that was part of a cluster which had lost quorum. This would be very relevant for this installation, as SSH will be blocked as well. So is this still an issue, and are there other known issues of this kind?

Thanks for any insights!
 
Separate clusters, because a Proxmox cluster cannot deal with high latency between nodes (and you probably depend on other parties for the connection between countries).
Why not run PBS on both sites (it can back up running VMs locally very quickly) and let the two instances sync with each other? Then you can easily restore a relatively recent version of each VM on the other side if you need to. Or switch between sites each weekend, to make sure everything still works and DR is not something rare and hardly tested?
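A rough sketch of what that PBS-to-PBS sync could look like, run on the PBS at the second site (hostnames, datastore names, the auth id and the fingerprint are placeholders, and exact option names may differ between PBS versions):

# register the PBS at site A as a remote on the PBS at site B
proxmox-backup-manager remote create site-a-pbs \
    --host pbs.site-a.example \
    --auth-id sync@pbs \
    --password 'SECRET' \
    --fingerprint <site-a-certificate-fingerprint>

# pull its datastore into the local datastore on a schedule
proxmox-backup-manager sync-job create site-a-pull \
    --remote site-a-pbs \
    --remote-store datastore1 \
    --store datastore1 \
    --schedule hourly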

EDIT: I have no experience with this, so feel free to tell me why I'm wrong about this.
 
Hi @Asano, sounds like you have a good foundation to start working from.

The question now is: is it better to have A) one Proxmox cluster with an equal number of servers on each site, or B) two entirely separate Proxmox clusters, one per site? And is there a best practice/recommended approach?
If your choice of replication is ZFS, then the nodes have to be in a single cluster. There is no cross-cluster replication yet. There is remote-migrate (beta), but it's not quite what you want here.

If you split your nodes into equal halves, you guarantee a split-brain situation down the road. The "trick" with cross-site clusters is to place the deciding "vote" in a third location. In that case only a double failure will cause an outage (a site plus the link to the vote). Planning to survive a double failure is a difficult task.
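If someone did want the single stretched cluster, that third-site vote would typically be a small QDevice host rather than a full node. Roughly like this (the IP is a placeholder, and this is just a sketch, not a recommendation for this setup):

# on the small machine at the third site
apt install corosync-qnetd

# on every cluster node
apt install corosync-qdevice

# then, from any one cluster node, register the external vote
pvecm qdevice setup <third-site-ip>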

As @leesteken said, you should go with two isolated clusters. That removes the inter-dependency between the sites. Since you don't have a tight RTO or RPO, the PBS backup/replication approach seems to be the best fit.

Good luck


 
It's literally designed NOT TO WORK in that scenario:

https://forum.proxmox.com/threads/high-latency-clusters.141098/
Thanks for the link. I did know it was not designed for this, but what I didn't know were the actual pitfalls (which is what I asked for). The one @fabian mentioned in that thread, that the "amount of time [for /etc/pve syncs] is hard-coded everywhere" (ufff :p), is alone severe enough not to consider a cluster spanning two sites any further.

As said, storage is not the topic here, but regardless of whether the choice is Ceph, ZFS, or PBS, there are good tools and strategies for near-real-time cluster-to-cluster sync (though PBS would surely be the worst choice for that use case). So that is really not an issue.
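For reference, pve-zsync is one of those tools. A minimal job that keeps replicating a guest's disks to the other site could look roughly like this (VMID, target IP and pool are placeholders):

pve-zsync create --source 100 --dest 10.0.0.2:tank/replica --name dr-sync --maxsnap 7 --verbose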
 
Thanks for the link. I did know it was not designed for this, but what I didn't know were the actual pitfalls (which is what I asked for). The one @fabian mentioned in that thread, that the "amount of time [for /etc/pve syncs] is hard-coded everywhere" (ufff :p), is alone severe enough not to consider a cluster spanning two sites any further.

Now that you've name-dropped him, I just want to add a disclaimer: I have no problem with the design. It's just that when I asked that question myself, I was confused by "not designed to" as opposed to "designed not to" (which makes a difference to me). The other thing is, I like to be precise about which element actually relies on the low latency: it is not corosync per se, it is pmxcfs and the design choices made there. I have no opinion on what other choices might have brought, because I also understand that maintaining extended virtual synchrony between nodes whose IO is blocked for 30 seconds is ... not workable.
 
And what's wrong with your own zfs send | receive?

Just want to add here: if you were doing this, to make it workable it needs to be something like:

zfs send pool/dataset@snapshot | mbuffer -s 128k -m 512M | ssh remote_node zfs receive pool/dataset
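
And for repeated syncs you would normally send incrementally against the last snapshot already present on the receiving side, something like:

zfs send -i pool/dataset@previous pool/dataset@snapshot | mbuffer -s 128k -m 512M | ssh remote_node zfs receive pool/dataset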
 
