Hosted Proxmox Root Server: Planning migration from DRBD8 to Ceph

Simon_

We currently have a 2-Node PVE 3.4 Cluster with DRBD8 on Hetzner Root-Servers and have been very happy with this for several years (over different versions).

With the coming end of life of Proxmox VE 3.4 and the problems relating to DRBD9, we are looking into switching to a 3-node Ceph HA cluster (OSDs and VMs on the same hosts, as described in the Proxmox wiki: https://pve.proxmox.com/wiki/Ceph_Server )

My issue with this is trying to keep the budget from multiplying.

Has anyone tried using a single 10 Gbit network for all internal cluster communication, one NIC per node, connected via a dedicated 10 Gbit switch? The nodes have a separate 1 Gbit NIC for the uplink.

This could be separated on the switch into three (or more) tagged VLANs for Ceph-private, corosync and other (internal) cluster communication, with QoS / rate limiters applied per VLAN, as sketched below.
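For illustration, splitting a single 10 Gbit NIC into tagged VLANs on a Proxmox node could look roughly like this (Debian ifupdown style; the interface name eth2, the VLAN IDs and the addresses are assumptions, and the vlan package / 8021q module must be installed):

```
# /etc/network/interfaces (excerpt) -- sketch only
# eth2 = the single 10 Gbit NIC
# VLAN 10 = ceph-private, VLAN 20 = corosync, VLAN 30 = other internal traffic
# (IDs and subnets are made up for the example)
auto eth2
iface eth2 inet manual

auto eth2.10
iface eth2.10 inet static
    address 10.10.10.11
    netmask 255.255.255.0

auto eth2.20
iface eth2.20 inet static
    address 10.10.20.11
    netmask 255.255.255.0

auto eth2.30
iface eth2.30 inet static
    address 10.10.30.11
    netmask 255.255.255.0
```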

Proxmox support told us that a physical separation of Ceph and corosync is highly recommended, not only due to bandwidth but also latency issues, but that VLANs with QoS / limiters on the switch might alleviate the issue.
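If corosync ends up on its own VLAN instead of its own NIC, the cluster communication would simply be bound to that subnet. A rough corosync.conf excerpt under that assumption (PVE 4.x style; names and addresses are hypothetical):

```
# /etc/pve/corosync.conf (excerpt) -- sketch, not a tested config
totem {
    version: 2
    cluster_name: pvecluster
    interface {
        ringnumber: 0
        # the corosync VLAN subnet from the sketch above
        bindnetaddr: 10.10.20.0
    }
}
nodelist {
    node {
        name: node1
        nodeid: 1
        quorum_votes: 1
        # node address on the corosync VLAN
        ring0_addr: 10.10.20.11
    }
    # node2 / node3 analogous
}
```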

Does anyone have experience with a similar setup?
 
I'm running 2x10 Gbit bonded per Proxmox node, with one VLAN for Ceph, one VLAN for Proxmox management, and x VLANs for the customers' VMs.
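A minimal sketch of that kind of setup, assuming an active-backup bond over two 10 Gbit ports with VLAN sub-interfaces on top (interface names, VLAN IDs and addresses are placeholders):

```
# /etc/network/interfaces (excerpt) -- illustrative only
auto bond0
iface bond0 inet manual
    slaves eth2 eth3
    bond_mode active-backup
    bond_miimon 100

# Ceph VLAN
auto bond0.10
iface bond0.10 inet static
    address 10.10.10.11
    netmask 255.255.255.0

# Proxmox management VLAN
auto bond0.20
iface bond0.20 inet static
    address 10.10.20.11
    netmask 255.255.255.0

# customer VM VLANs would typically hang off a bridge on the bond instead
```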

Are the 2x10 Gbit bonded for failover or for a bandwidth increase?
If the former (failover), then latency and bandwidth should be comparable to what I was planning and - together with Udo's reply - would suggest that this is feasible.

After some more reading on Ceph performance (the Mellanox blog), though, I am under the impression that reserving the 10 Gbit link for Ceph might be necessary after all.
I am planning a 3-node cluster with 2 OSDs per node (data center SSDs). The writes from VMs to the journal and the replication across the cluster at the same time might use more bandwidth than I initially thought.
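A rough back-of-the-envelope calculation (my own numbers, just to illustrate the amplification with a shared network and replication size 3):

```python
# Illustrative estimate of wire traffic on a combined Ceph public/cluster network.
# The numbers are assumptions, not measurements.
client_write_mb_s = 300      # assumed aggregate write rate from all VMs
replication_size = 3

# Per client write on a shared network:
#   1x  client -> primary OSD
#   (size - 1)x  primary OSD -> replica OSDs
wire_mb_s = client_write_mb_s * (1 + (replication_size - 1))
print(f"~{wire_mb_s} MB/s on the wire for {client_write_mb_s} MB/s of VM writes")

# On top of that, filestore with an on-SSD journal writes each object twice
# locally (journal + data), which costs SSD throughput but not network.
```

Under those assumptions, ~900 MB/s is already most of a single 10 Gbit link (~1.25 GB/s raw), which is what makes me hesitant about sharing it.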

Anyway, since we have decided to go for the Hetzner PX91 server with dual onboard 1 Gbit NICs, one extra 10 Gbit NIC and a 10 Gbit switch, I can dedicate the 10 Gbit link to Ceph exclusively and have 1 Gbit for PVE cluster and VM communication.
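With the 10 Gbit link reserved for Ceph and no second storage NIC, public and replication traffic would both live on that link; a minimal ceph.conf sketch under that assumption (the subnet is a placeholder):

```
# /etc/ceph/ceph.conf (excerpt) -- sketch only
[global]
    # the dedicated 10 Gbit subnet (placeholder range)
    public network = 10.10.10.0/24
    cluster network = 10.10.10.0/24
```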
 
>>Are the 2x10 Gbit bonded for failover or for a bandwidth increase?

Mainly for failover.
My Ceph nodes only use a lot of bandwidth if an OSD fails and a rebuild is needed.
My VMs mainly do small 4k random reads/writes, no more than 1 or 2 Gbit/s.

>>If the former (failover), then latency and bandwidth should be comparable to what I was planning and - together with Udo's reply - would suggest that this is feasible.
Failover is better; a link failure has almost no impact.
But you'll see a latency increase (mainly at low iodepth) because of the complexity of the Ceph protocol. (Try to use CPUs with a high clock frequency if possible.)

As for bandwidth, I can easily saturate 2x10 Gbit with big block reads.
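To see that latency effect at low queue depth on your own hardware, something like the following could be used (pool and image names are placeholders; the fio example needs fio built with the rbd engine):

```
# 4k writes with a single outstanding op against a scratch pool
rados -p scratchbench bench 60 write -b 4096 -t 1

# or against an existing RBD image via fio's rbd engine
fio --name=lat4k --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
    --rw=randwrite --bs=4k --iodepth=1 --runtime=60 --time_based
```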