Proxmox CEPH Stretch Cluster Network requirements

sureshsb

New Member
Oct 22, 2024
Hello
We plan to deploy a Ceph stretch cluster between two sites:
Site-A: HP DL380 servers with 6x 3.84 TB SSD each (5 servers total)
Site-B: HP DL380 servers with 6x 3.84 TB SSD each (5 servers total)
Witness site: the witness VM will be installed on a Proxmox hypervisor
The distance between Site-A and Site-B is 150 km; they are connected via a 10G fiber link, which is shared by many other systems and applications.
The distance between Site-A and the witness site is 50 km, connected via shared 10G fiber.
The distance between Site-B and the witness site is 100 km, connected via shared 10G fiber.

What are the effective bandwidth and latency requirements between Site-A, Site-B, and the witness site for a Ceph stretch cluster to run reliably without problems?
(The deployment is planned for a critical production environment)
 
Both sites take part in day-to-day operation. Ergo, the latency between both sites needs to be low and the bandwidth high. A write is only ACKed once 2 replicas have been written in each site.

A single Ceph cluster spanning multiple sites is therefore only useful if they are close together. Think different fire sections or buildings on the same campus. Otherwise, the disk IO for the VMs will feel very sluggish.
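For reference, the "2 replicas in each site" behavior comes from Ceph's stretch mode. A rough sketch of enabling it, assuming datacenter buckets named site-a/site-b, monitors a/b/tiebreaker, and a CRUSH rule stretch_rule that places two copies per datacenter (all names here are hypothetical placeholders, not from the thread):

```shell
# Sketch only: bucket, monitor, and rule names are placeholders.
# Use the connectivity-based monitor election strategy:
ceph mon set election_strategy connectivity
# Tell the monitors where they live:
ceph mon set_location a datacenter=site-a
ceph mon set_location b datacenter=site-b
ceph mon set_location tiebreaker datacenter=witness
# Place the OSD hosts under their datacenter buckets in the CRUSH map:
ceph osd crush move node-a1 datacenter=site-a
ceph osd crush move node-b1 datacenter=site-b
# Enable stretch mode with the tiebreaker monitor and a CRUSH rule
# that stores 2 replicas in each of the two datacenters:
ceph mon enable_stretch_mode tiebreaker stretch_rule datacenter
```

These commands require a running cluster; stretch_rule must be created in the CRUSH map beforehand.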

For disaster recovery (DR), you might want to look into alternative strategies. For example, use the Proxmox Backup Server to sync backups to the DR site, or between both sites. In a disaster you can then (live-)restore manually, or run the restores automatically on a schedule.
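As a minimal sketch of such a sync setup on the PBS instance at the DR site (remote name, host, datastore names, and credentials below are hypothetical placeholders):

```shell
# Sketch only: all names and credentials are placeholders.
# Register the source-site PBS as a remote:
proxmox-backup-manager remote create site-a-pbs \
    --host 198.51.100.10 --auth-id 'sync@pbs' --password 'xxxxx'
# Pull its datastore into the local one every hour:
proxmox-backup-manager sync-job create pull-from-site-a \
    --remote site-a-pbs --remote-store backups-a \
    --store backups-dr --schedule 'hourly'
```

Pull-style sync jobs like this keep the DR copy current without the source site needing access to the DR site.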

Another approach can be (async) RBD mirroring between two Ceph clusters. This way, each cluster stays local and fast. https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring
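To illustrate the RBD mirroring route, a sketch of snapshot-based (async) mirroring for a single image (pool and image names are hypothetical; the clusters must already be connected as mirror peers, e.g. via rbd mirror pool peer bootstrap, and the target cluster needs the rbd-mirror daemon running):

```shell
# Sketch only: pool/image names are placeholders.
# Enable per-image mirroring on the pool (run on both clusters):
rbd mirror pool enable vm-pool image
# Enable snapshot-based mirroring for one VM disk:
rbd mirror image enable vm-pool/vm-100-disk-0 snapshot
# Create mirror snapshots (the replication points) every 5 minutes:
rbd mirror snapshot schedule add --pool vm-pool 5m
```

Because replication happens via periodic snapshots, writes are ACKed locally and the remote copy lags by up to the schedule interval.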
 
Thanks for the quick response
 
If 10G bandwidth and a 5 ms round-trip time (RTT) are available between Site-A and Site-B, is a Ceph stretch cluster recommended for the scenario described above?
 
A 10 Gbit connection can already be a bottleneck in a Ceph cluster if you have somewhat fast disks. See our latest benchmark whitepaper.
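A back-of-the-envelope comparison makes the bottleneck concrete. Assuming roughly 500 MB/s sequential throughput per SSD (a hypothetical figure, not from the thread), the disks in a single node already far exceed what the shared inter-site link can carry:

```shell
#!/bin/sh
# Back-of-the-envelope sketch; SSD throughput is an assumed figure.
LINK_GBPS=10
LINK_MBS=$((LINK_GBPS * 1000 / 8))     # ~1250 MB/s theoretical link ceiling
SSD_MBS=500                            # assumed per-SSD throughput
SSDS_PER_NODE=6
NODE_MBS=$((SSD_MBS * SSDS_PER_NODE))  # aggregate disk bandwidth of one node
echo "inter-site link: ${LINK_MBS} MB/s; one node's disks: ${NODE_MBS} MB/s"
```

And that link is shared with other systems, so the usable share for Ceph replication will be lower still.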

Whether 5 ms RTT is fast enough is something I cannot say, as that really depends on your expectations. Keep in mind that whenever a client, in our case a VM, writes data, there are usually multiple network trips. See the diagram at the end of this section in the Ceph docs.
The VM writes data, and the primary OSD may already be on a different node, possibly even on the other site. The primary OSD then sends the data to the other OSDs that store the replicas: one in the current site and another two in the remote site, since we need 2 replicas in each site.
As you can see, these latencies can add up...
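As a rough worst-case sketch of how they add up, using the 5 ms inter-site RTT from the question (and ignoring disk and software latency; replication to the remote replicas is assumed to happen in parallel):

```shell
#!/bin/sh
# Worst case: the primary OSD sits on the remote site, so the client
# pays one inter-site RTT, and replication back pays another.
RTT_MS=5
CLIENT_TO_PRIMARY_MS=$RTT_MS           # VM -> primary OSD on the far site
PRIMARY_TO_REPLICAS_MS=$RTT_MS         # primary -> replicas on the far site
WORST_CASE_MS=$((CLIENT_TO_PRIMARY_MS + PRIMARY_TO_REPLICAS_MS))
echo "worst-case added network latency per write: ${WORST_CASE_MS} ms"
```

Roughly 10 ms of pure network latency per write, on top of everything else, is what makes VM disk IO feel sluggish in such a setup.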
 
