Proxmox CEPH Stretch Cluster Network requirements

sureshsb

New Member
Oct 22, 2024
Hello
We plan to deploy a Ceph stretch cluster between two sites:
Site-A: HP DL380 servers with 6x 3.84 TB SSD each (5 servers total)
Site-B: HP DL380 servers with 6x 3.84 TB SSD each (5 servers total)
Witness site: the witness VM will be installed on a Proxmox hypervisor
The distance between Site-A and Site-B is 150 km; they are connected via a 10G fiber link, which is shared by many other systems and applications.
The distance between Site-A and the witness site is 50 km, connected via shared 10G fiber.
The distance between Site-B and the witness site is 100 km, connected via shared 10G fiber.

What are the effective bandwidth and latency requirements between Site-A, Site-B, and the witness site for a Ceph stretch cluster to run reliably without problems?
(The deployment is planned for a critical production environment)
 
Both sites take part in day-to-day operation. Ergo, the latency between both sites needs to be low and the bandwidth high. A write is only ACKed once 2 replicas have been written in each site.

A single Ceph cluster spanning multiple sites is therefore only useful if they are close together. Think different fire sections or buildings on the same campus. Otherwise, the disk IO for the VMs will feel very sluggish.
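For reference, the "2 replicas in each site" behavior comes from Ceph's stretch mode. A rough sketch of enabling it, assuming datacenter buckets named site-a/site-b, monitors a/b/tiebreaker, and a CRUSH rule stretch_rule that places two copies per datacenter (all names here are hypothetical placeholders, not from the thread):

```shell
# Sketch only: bucket, monitor, and rule names are placeholders.
# Use the connectivity-based monitor election strategy:
ceph mon set election_strategy connectivity
# Tell the monitors where they live:
ceph mon set_location a datacenter=site-a
ceph mon set_location b datacenter=site-b
ceph mon set_location tiebreaker datacenter=witness
# Place the OSD hosts under their datacenter buckets in the CRUSH map:
ceph osd crush move node-a1 datacenter=site-a
ceph osd crush move node-b1 datacenter=site-b
# Enable stretch mode with the tiebreaker monitor and a CRUSH rule
# that stores 2 replicas in each of the two datacenters:
ceph mon enable_stretch_mode tiebreaker stretch_rule datacenter
```

These commands require a running cluster; stretch_rule must be created in the CRUSH map beforehand.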

For disaster recovery (DR), you might want to look into alternative strategies. For example, use the Proxmox Backup Server to sync backups to the DR site, or between both sites. In a disaster you can then (live-)restore manually, or run the restores automatically on a schedule.
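As a minimal sketch of such a sync setup on the PBS instance at the DR site (remote name, host, datastore names, and credentials below are hypothetical placeholders):

```shell
# Sketch only: all names and credentials are placeholders.
# Register the source-site PBS as a remote:
proxmox-backup-manager remote create site-a-pbs \
    --host 198.51.100.10 --auth-id 'sync@pbs' --password 'xxxxx'
# Pull its datastore into the local one every hour:
proxmox-backup-manager sync-job create pull-from-site-a \
    --remote site-a-pbs --remote-store backups-a \
    --store backups-dr --schedule 'hourly'
```

Pull-style sync jobs like this keep the DR copy current without the source site needing access to the DR site.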

Another approach can be (async) RBD mirroring between two Ceph clusters. This way, each cluster stays local and fast. https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring
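To illustrate the RBD mirroring route, a sketch of snapshot-based (async) mirroring for a single image (pool and image names are hypothetical; the clusters must already be connected as mirror peers, e.g. via rbd mirror pool peer bootstrap, and the target cluster needs the rbd-mirror daemon running):

```shell
# Sketch only: pool/image names are placeholders.
# Enable per-image mirroring on the pool (run on both clusters):
rbd mirror pool enable vm-pool image
# Enable snapshot-based mirroring for one VM disk:
rbd mirror image enable vm-pool/vm-100-disk-0 snapshot
# Create mirror snapshots (the replication points) every 5 minutes:
rbd mirror snapshot schedule add --pool vm-pool 5m
```

Because replication happens via periodic snapshots, writes are ACKed locally and the remote copy lags by up to the schedule interval.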
 
Thanks for the quick response
 
If 10G bandwidth and a 5 ms round-trip time (RTT) are available between Site-A and Site-B, is a Ceph stretch cluster recommended for the scenario described above?
 
A 10 Gbit connection can already be a bottleneck in a Ceph cluster if you have somewhat fast disks. See our latest benchmark whitepaper.
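A back-of-the-envelope comparison makes the bottleneck concrete. Assuming roughly 500 MB/s sequential throughput per SSD (a hypothetical figure, not from the thread), the disks in a single node already far exceed what the shared inter-site link can carry:

```shell
#!/bin/sh
# Back-of-the-envelope sketch; SSD throughput is an assumed figure.
LINK_GBPS=10
LINK_MBS=$((LINK_GBPS * 1000 / 8))     # ~1250 MB/s theoretical link ceiling
SSD_MBS=500                            # assumed per-SSD throughput
SSDS_PER_NODE=6
NODE_MBS=$((SSD_MBS * SSDS_PER_NODE))  # aggregate disk bandwidth of one node
echo "inter-site link: ${LINK_MBS} MB/s; one node's disks: ${NODE_MBS} MB/s"
```

And that link is shared with other systems, so the usable share for Ceph replication will be lower still.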

Whether 5 ms RTT is fast enough is something I cannot say, as that really depends on your expectations. Keep in mind that whenever a client, in our case a VM, writes data, there are usually multiple network trips. See the diagram at the end of this section in the Ceph docs.
The VM writes data, and the primary OSD may already be on a different node, possibly even on the other site. The primary OSD then sends the data to the other OSDs that store the replicas: one in the current site and another two in the remote site, since we need 2 replicas in each site.
As you can see, these latencies can add up...
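As a rough worst-case sketch of how they add up, using the 5 ms inter-site RTT from the question (and ignoring disk and software latency; replication to the remote replicas is assumed to happen in parallel):

```shell
#!/bin/sh
# Worst case: the primary OSD sits on the remote site, so the client
# pays one inter-site RTT, and replication back pays another.
RTT_MS=5
CLIENT_TO_PRIMARY_MS=$RTT_MS           # VM -> primary OSD on the far site
PRIMARY_TO_REPLICAS_MS=$RTT_MS         # primary -> replicas on the far site
WORST_CASE_MS=$((CLIENT_TO_PRIMARY_MS + PRIMARY_TO_REPLICAS_MS))
echo "worst-case added network latency per write: ${WORST_CASE_MS} ms"
```

Roughly 10 ms of pure network latency per write, on top of everything else, is what makes VM disk IO feel sluggish in such a setup.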
 
