Sanity check for new installation

tux-o-mat

New Member
Apr 23, 2024
Could we get some 2nd and 3rd opinions on a plan for a new datacenter deployment:

8 PVE hosts, each with two 16-core Xeons and 512 GB of registered RAM. Each machine also has four 10GbE NICs; two of those will handle guest traffic, the other two are for storage traffic. Each machine will have four Samsung PM1653 8TB SAS SSDs on an HBA for use with Ceph.
Storage and guest traffic are handled by independent, redundant switches, all via DAC connections.
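For the NIC layout I have something like the following in mind for /etc/network/interfaces on each node - interface names and addresses are placeholders, and I'm assuming active-backup bonds since the redundant switches are independent (no MLAG/LACP across them):

    # guest traffic: first 10GbE pair, bonded and bridged for the VMs
    auto bond0
    iface bond0 inet manual
        bond-slaves enp65s0f0 enp65s0f1
        bond-miimon 100
        bond-mode active-backup

    auto vmbr0
    iface vmbr0 inet static
        address 192.0.2.11/24
        gateway 192.0.2.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

    # storage traffic: second 10GbE pair, bonded, no bridge needed
    auto bond1
    iface bond1 inet static
        address 10.10.10.11/24
        bond-slaves enp65s0f2 enp65s0f3
        bond-miimon 100
        bond-mode active-backup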

Guest profiles will be mixed, from archive machines with very little activity to SQL appliances.
Backup will go to a bare-metal PBS with at least one 16-core Xeon, using either ZFS on an HBA or LVM on hardware RAID. Of course it will be replicated to another site.
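On the PBS side, ZFS on the HBA would roughly look like this - pool layout, disk count and device names are placeholders (use /dev/disk/by-id paths in practice):

    # create a RAIDZ2 pool on the HBA-attached disks and a dataset for the datastore
    zpool create -o ashift=12 -O compression=zstd backup raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
    zfs create backup/store1
    # register the dataset as a PBS datastore
    proxmox-backup-manager datastore create store1 /backup/store1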

The main (only) problem I can see would be the 10GbE being a bottleneck for the Ceph traffic. Any ideas / experiences on that matter?

Any input is much appreciated. Thank you!
 
10GbE will most likely be a bottleneck for Ceph - I can recommend checking out our Ceph Benchmark paper from 2023/2024, which should give you a good starting point:
https://www.proxmox.com/images/download/pve/docs/Proxmox-VE-Ceph-Benchmark-202312-rev0.pdf
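If you want to reproduce the numbers from the paper on your own hardware before going into production, something along these lines should work - node address and pool name are placeholders, and pool deletion requires mon_allow_pool_delete to be enabled:

    # raw network throughput between two nodes (iperf3 on both ends)
    iperf3 -s                          # on the receiving node
    iperf3 -c 10.10.10.12 -P 4 -t 30   # on the sending node, 4 parallel streams

    # Ceph-level write and sequential read test against a throwaway pool
    ceph osd pool create testbench 64 64
    rados bench -p testbench 60 write -b 4M -t 16 --no-cleanup
    rados bench -p testbench 60 seq -t 16
    ceph osd pool delete testbench testbench --yes-i-really-really-mean-it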

I hope this helps!
Thank you Stoiko for the link! I have already read it; in fact, my doubts come from that exact paper.
I was hoping for some practical use cases where those findings are applied to real-world deployments.
 
Thanks @LnxBil. Indeed, I only mentioned the multi-gigabit interfaces. We have at least two 1GbE interfaces per machine to use for corosync etc.

That means I could either use the existing interfaces without redundancy (1x Ceph data, 1x Ceph public, 1x guest traffic, 1x corosync), or I have to get two more "fast" interfaces per machine. If I start buying interfaces again, it will be at least 40GbE; I'm just wondering if I can get by without spending those $30k (switches included).
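For reference, the Ceph data/public split itself would just be two lines in /etc/pve/ceph.conf - the subnets below are placeholders, and both networks have to be reachable from every node before the MONs/OSDs are (re)started:

    [global]
        public_network  = 10.10.10.0/24   # Ceph public: client/VM I/O and MON traffic
        cluster_network = 10.10.20.0/24   # Ceph cluster: OSD replication and recovery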
 
If you want to tolerate a switch problem and operate in a well-defined space, sadly yes. Both Ceph networks can run on one network, and corosync can also go on the public network, but you may run into congestion problems. That can be somewhat mitigated with QoS, but it is not recommended.
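Independent of that, corosync 3 can use more than one link, so the dedicated 1GbE can stay the primary link with e.g. the Ceph public network as a fallback. A rough sketch of the relevant parts of /etc/pve/corosync.conf - addresses are placeholders, the higher knet_link_priority wins, and remember to bump config_version when editing:

    nodelist {
      node {
        name: pve01
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.30.11   # dedicated 1GbE corosync segment
        ring1_addr: 10.10.10.11   # fallback over the Ceph public network
      }
      # one node entry per host
    }

    totem {
      interface {
        linknumber: 0
        knet_link_priority: 10    # preferred link
      }
      interface {
        linknumber: 1
        knet_link_priority: 5     # only used if link 0 is down
      }
    }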
 
My concerns were not unfounded, it seems. I will start planning a fourth network segment for Ceph data - probably 100GbE, because that will most likely be more future-proof than a smaller step to 25 or 40 gigabits. It's just not worth risking more problems further down the line. So we will have 2x100 Ceph data, 2x10 Ceph public, 2x10 guest traffic and 2x1 corosync.

Thank you guys for your input. Much appreciated!