Cluster between DCs far away?

Deleted member 93625 (Guest)
Hi all,

I'd like to ask a question about a scenario I'm planning. Below is the connection diagram.

[Attachment: Diagram.png]

The physical distance between data centres is about 1000 km. In this environment,
  1. All PVE nodes are in one cluster.
  2. VMs will run across both data centres.
  3. All VMs' disk images will be stored on a single shared storage (Storage 1 in DC A).
  4. Storage 1 will be replicated to the other one (Storage 2 in DC B).
My plan is to use a storage network (red, a single 10G line) for VM migration and storage replication. All other traffic, e.g. the private network between VMs and the cluster traffic between PVE nodes, uses a different network (green, currently 100M). Both lines are dedicated leased lines.

Assuming the routing for WAN is properly done,
  1. Will my model work okay? Someone said it won't because of the geographical distance - it's too far and the latency will be huge. With a ping test, the RTT between the DCs is currently around 11-12 ms.
  2. I am thinking of building a ZFS system for storage (ZFS on Linux) and using ZFS over iSCSI. Will that be okay? I think the cluster doesn't need to know about Storage 2 - is this correct?
  3. Suppose ZFS is used for storage, is it possible to do incremental replication from DC A to DC B? It doesn't need to be continuous - say every 15 minutes? Will it saturate the link easily? We are going to run quite a large number of VMs (say, over 200).
  4. Let's say DC A gets hit by a bomb. If I remember correctly, as long as I have backed up all the VM configurations and the replicated storage's LUN is identical, I can attach Storage 2 with the same LUN to the cluster and run the VMs in DC B. Is this correct?
Hope I explained this well. Thanks very much.

Eoin
 
Will my model work okay? Someone said it won't because of the geographical distance - it's too far and the latency will be huge. With a ping test, the RTT between the DCs is currently around 11-12 ms.

You're really at the limit of the latency that can still work.
While we recommend LAN-like latencies of <= 2 ms, we know that a stable network (no latency spikes) can also run with somewhat higher latencies. Up to 8 ms can be OK, as long as there really are no spikes. I know of some people who run it at 10-12 ms, so in your range, and say it works for them, but those are often just two- or three-node clusters.
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_network_requirements
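If you want to check for yourself whether the latency is stable enough, a long-running check between one node in each DC already tells a lot. The hostnames below are placeholders, and omping is the tool the linked docs suggest for testing the cluster network (start it on all listed nodes at the same time):

    # watch for spikes and packet loss over a longer period, not just the average RTT
    ping -i 0.2 -c 3000 pve-b1 | tail -n 3

    # latency/loss test between cluster nodes, run simultaneously on pve-a1 and pve-b1
    omping -c 600 -i 1 -q pve-a1 pve-b1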

That said, if you want to evaluate it, I'd recommend the following:
  • Use multiple corosync knet links, with the one that has the most stable latencies as the "primary" one. That normally means any link that also carries IO/storage traffic is not the right one, as there will be latency spikes for sure.
  • You run an even node count; if the network between the two DCs is overloaded, both sides will lose quorum. So I'd add an external QDevice outside the two DCs - it acts as a vote arbitrator and can help if the link between the DCs is overloaded or dead.
    It's much simpler and has fewer requirements on the network, see: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_corosync_external_vote_support
  • You may go further by fine-tuning some corosync parameters; check the corosync.conf manpage and talk with the developers and us in the #kronosnet IRC channel or on the cluster-lab mailing list. A rough configuration sketch follows after this list.
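To make the link priorities and the QDevice part more concrete, here is a rough sketch of the relevant bits of /etc/pve/corosync.conf - node names, addresses and priority values are placeholders, not a recommendation:

    totem {
      cluster_name: dc-cluster
      config_version: 5
      version: 2
      ip_version: ipv4
      # link 0 = green network (stable latency, no storage IO) - higher priority wins
      interface {
        linknumber: 0
        knet_link_priority: 20
      }
      # link 1 = red 10G storage network, only used as fallback
      interface {
        linknumber: 1
        knet_link_priority: 10
      }
    }

    nodelist {
      node {
        name: pve-a1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 192.168.10.11   # green
        ring1_addr: 10.10.10.11     # red
      }
      # ... one entry per node in both DCs
    }

The external QDevice is then added with "pvecm qdevice setup <QDEVICE-IP>", after installing corosync-qnetd on the external host and corosync-qdevice on all cluster nodes.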
I am thinking of building a ZFS system for storage (ZFS on Linux) and using ZFS over iSCSI. Will that be okay? I think the cluster doesn't need to know about Storage 2 - is this correct?
Can be OK, it depends on your use case. And no, the cluster nodes do not need to directly access the ZFS on the other side.
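For reference, a minimal ZFS-over-iSCSI entry in /etc/pve/storage.cfg could look roughly like this - pool, portal, target and provider are placeholders for your Storage 1 in DC A, and Storage 2 gets no entry at all since it is only the replication target:

    zfs: san-dca
        pool tank/vmdata
        portal 10.10.10.100
        target iqn.2003-01.org.linux-iscsi.storage1:sn.abcdef123456
        iscsiprovider LIO
        lio_tpg tpg1
        blocksize 8k
        sparse 1
        content images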

Suppose ZFS is used for storage, is it possible to do incremental replication from DC A to DC B? It doesn't need to be continuous - say every 15 minutes? Will it saturate the link easily? We are going to run quite a large number of VMs (say, over 200).
It's possible. The initial replication may take a while, but the following ones are incremental and should not be an issue.
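As a rough sketch of what such a 15-minute cycle does under the hood - dataset and host names are placeholders, and in practice you'd rather use a tool like pve-zsync or syncoid than a hand-rolled cron job:

    # one-time full send of the dataset holding the VM disks (this is the slow part)
    zfs snapshot -r tank/vmdata@repl-0
    zfs send -R tank/vmdata@repl-0 | ssh storage2 zfs receive -F tank/vmdata

    # every 15 minutes: new snapshot, then send only the blocks changed since the previous one
    zfs snapshot -r tank/vmdata@repl-1
    zfs send -R -i @repl-0 tank/vmdata@repl-1 | ssh storage2 zfs receive -F tank/vmdata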
Note, it could be more interesting for you to run two Ceph clusters, one on each side, with the other one set up as replication target. This way you have a single unified shared storage per DC, which makes things easier there, and you can still replicate to the other DC. Plus, Ceph is a bit easier to expand and scale in all directions.
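If you go that route, the replication between the two Ceph clusters would typically be RBD mirroring; very roughly, and with pool and site names as placeholders (an rbd-mirror daemon has to run on the receiving cluster):

    # on both clusters: enable per-image mirroring on the VM pool
    rbd mirror pool enable vm-pool image

    # connect the clusters (snapshot-based mirroring, Ceph Octopus or later)
    rbd mirror pool peer bootstrap create --site-name dc-a vm-pool > peer-token    # on DC A
    rbd mirror pool peer bootstrap import --site-name dc-b vm-pool peer-token      # on DC B

    # per image: enable mirroring and check replication health
    rbd mirror image enable vm-pool/vm-100-disk-0 snapshot
    rbd mirror pool status vm-pool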

Another option for your setup could be our deduplicated, incremental and fast Proxmox Backup Server, which lets you do cheap remote syncs. https://pbs.proxmox.com/docs/introduction.html#main-features

It's still in beta, but we're working hard to get it released as stable and have already received quite some positive feedback.
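To give an idea of the remote sync: on a PBS instance in DC B you would define the DC A instance as a remote and pull from it on a schedule. Names, host and schedule below are placeholders, and option names may still change slightly while it is in beta:

    # define the PBS in DC A as a remote
    proxmox-backup-manager remote create pbs-dca --host pbs-a.example.com \
        --userid sync@pbs --password 'secret' --fingerprint '<cert-fingerprint>'

    # pull its datastore into the local one; only new/changed chunks go over the wire
    proxmox-backup-manager sync-job create pull-dca --remote pbs-dca \
        --remote-store backups --store backups --schedule hourly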

Let's say DC A gets hit by a bomb. If I remember correctly, as long as I have backed up all the VM configurations and the replicated storage's LUN is identical, I can attach Storage 2 with the same LUN to the cluster and run the VMs in DC B. Is this correct?
Yes, you should be able to do that. But you should also definitely test it before any explosion, so that you know the steps required and have a routine or document that helps in the (normally very stressful) event of a DC going down.
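Such a test or failover run could look roughly like this - storage name, node name and paths are placeholders, and it assumes a ZFS-over-iSCSI storage defined as in the sketch further up:

    # 1. check that the surviving side still has quorum (the QDevice helps here);
    #    only lower the expected votes manually if you must and understand the risk
    pvecm status
    pvecm expected 2

    # 2. restore the backed-up VM config files onto a DC B node
    cp /backup/etc-pve/qemu-server/*.conf /etc/pve/nodes/pve-b1/qemu-server/

    # 3. point the existing storage definition at Storage 2 (same pool/LUN layout)
    pvesm set san-dca --portal 10.20.10.100

    # 4. verify the volumes are visible, then start the VMs
    pvesm list san-dca
    qm start 100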
 
@t.lamprecht Thanks for your detailed response.

It sounds like the latency is really critical. I am not familiar with Ceph storage yet, so I may have to do some study on that.

So, if I am really concerned about latency, it's probably better to split this into two clusters, one per data centre, and handle the storage replication with Ceph somehow? Is this what you were saying?

Thanks again.

Eoin
 
So, if I am really concerned about latency, it's probably better to split this into two clusters, one per data centre, and handle the storage replication with Ceph somehow? Is this what you were saying?

That would lessen the coupling and make things a bit more stable on each side for sure, IMO.

The main downside is that you lose the unified management view, and with it live migration between the two DCs. For the rest, I'd say that having two browser tabs open isn't so bad compared to fine-tuning latency-sensitive network links :)
 
