need advice: best practice for VMs backed by Ceph RBD

Jan 16, 2024
We have a 9-node PVE cluster with a Ceph cluster on top: 18 × 3.84 TB NVMe OSDs and about 60 × 800 GB enterprise SATA SSD OSDs, separated by CRUSH rules. Networking is done with 10G LACP.
For simplicity and migration speed we wanted to move all our VMs from local storage (Ceph was added quite recently) to the NVMe RBD pool, but now we have some doubts about performance. Some of our VMs are quite I/O-heavy, which raises the question of whether we can sooner or later saturate the RBD storage and create a bottleneck there.
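Before moving the I/O-heavy VMs we are thinking about getting a rough baseline of what the NVMe pool delivers, e.g. with `rados bench` or fio's rbd engine. A minimal sketch with the python3-rbd bindings could look like this (the pool name "nvme-pool" and the scratch image name are placeholders, not our actual names):

```python
#!/usr/bin/env python3
# Crude RBD sequential-write probe using the librbd Python bindings.
# Pool and image names are placeholders; real benchmarking should use
# fio's rbd engine or "rados bench" instead.
import time
import rados
import rbd

POOL = "nvme-pool"        # placeholder: adjust to the actual RBD pool
IMAGE = "bench-scratch"   # throwaway test image, removed afterwards
SIZE = 4 * 1024**3        # 4 GiB test image
CHUNK = 4 * 1024**2       # 4 MiB writes

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    rbd_inst = rbd.RBD()
    rbd_inst.create(ioctx, IMAGE, SIZE)
    try:
        with rbd.Image(ioctx, IMAGE) as img:
            data = b"\xab" * CHUNK
            start = time.monotonic()
            for offset in range(0, SIZE, CHUNK):
                img.write(data, offset)
            img.flush()
            elapsed = time.monotonic() - start
        print(f"sequential write: {SIZE / elapsed / 1024**2:.0f} MiB/s")
    finally:
        rbd_inst.remove(ioctx, IMAGE)
    ioctx.close()
finally:
    cluster.shutdown()
```

This is single-threaded and only measures large sequential writes from one client, so it underestimates what many parallel VMs can push and says nothing about small random I/O, but it gives a ceiling to compare against the 10G links.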
In the future we also want to run a Kubernetes cluster backed by Rook, which will itself host one or more clustered database applications.

What is the best practice for VMs on that storage? If there is a limit, how many? Do we need to take the actual VM workload into consideration? Do I need several pools? Is the network the bottleneck?
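To get a feeling for how the load would distribute across several pools, we could at least compare the cumulative per-pool counters via the python3-rados bindings, roughly like this (pool names are placeholders for our NVMe- and SSD-backed pools):

```python
#!/usr/bin/env python3
# Rough look at per-pool usage and cumulative I/O via librados.
import rados

POOLS = ["nvme-pool", "ssd-pool"]   # placeholders: adjust to actual pool names

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    for pool in POOLS:
        ioctx = cluster.open_ioctx(pool)
        stats = ioctx.get_stats()   # cumulative counters since pool creation
        print(f"{pool}: {stats['num_objects']} objects, "
              f"{stats['num_kb'] // 1024**2} GiB stored, "
              f"{stats['num_rd']} reads / {stats['num_wr']} writes total")
        ioctx.close()
finally:
    cluster.shutdown()
```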
I personally don't like the idea of mixing RBD and local storage, but I'm not sure if that is the way to go here.
 
The network is the bottleneck with Ceph: go for 100 Gbit for the Ceph internal (cluster) network and a separate 100 Gbit network for the Ceph clients, i.e. the PVE nodes.
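Rough numbers behind that, assuming size=3 replication and ~3 GB/s sequential throughput per NVMe OSD (both assumptions, adjust to your hardware):

```python
# Back-of-the-envelope: per-node network budget vs. NVMe capability.
# Assumptions: size=3 replication, ~3 GB/s per NVMe OSD, 18 NVMe OSDs
# over 9 nodes, LACP giving at best 2 x 10 Gbit per node (and only
# across multiple streams).
link_mb_s = 2 * 10 / 8 * 1000        # ~2500 MB/s network budget per node
nvme_mb_s = (18 / 9) * 3000          # ~6000 MB/s of NVMe capability per node
client_write_mb_s = link_mb_s / 3    # size=3: each client write crosses the
                                     # shared network roughly three times
                                     # (once to the primary, twice to replicas)

print(f"network budget per node:       {link_mb_s:.0f} MB/s")
print(f"NVMe capability per node:      {nvme_mb_s:.0f} MB/s")
print(f"usable client write bandwidth: {client_write_mb_s:.0f} MB/s")
```

Even two NVMe OSDs per node can outrun a 2 x 10 Gbit bond, and with replication the usable client write bandwidth shrinks further, which is why the NVMe pool will hit the network long before it hits the disks.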