need advice: best practice for VMs backed by Ceph RBD

Jan 16, 2024
We have a 9-node PVE cluster with a Ceph cluster on top: 18 x 3.84 TB NVMe OSDs and roughly 60 x 800 GB enterprise SATA SSD OSDs, separated by CRUSH rules. Networking is 10G LACP.
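For context, separating the two device classes usually comes down to one replicated CRUSH rule per class and one pool per rule; a rough sketch with placeholder rule/pool names and PG counts, assuming the OSDs already carry the "nvme" and "ssd" device classes (check with "ceph osd tree"):

    # one replicated rule per device class, failure domain = host
    ceph osd crush rule create-replicated nvme-rule default host nvme
    ceph osd crush rule create-replicated ssd-rule default host ssd

    # one pool per rule; pveceph can also register them as PVE storages
    pveceph pool create nvme-pool --crush_rule nvme-rule --application rbd --pg_num 128
    pveceph pool create ssd-pool --crush_rule ssd-rule --application rbd --pg_num 256

    # sanity-check what the PG autoscaler thinks afterwards
    ceph osd pool autoscale-status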
For simplicity and migration speed we wanted to move all our VMs from local storage (Ceph was added quite recently) to the NVMe RBD pool, but now there are some doubts about performance. Some of our VMs are quite I/O-heavy, which raises the question of whether we will sooner or later saturate the RBD storage and create a bottleneck there.
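To get a feel for how much headroom the NVMe pool actually has before the I/O-heavy VMs land on it, benchmarking a throwaway RBD image in the target pool is a reasonable first step; a sketch with made-up pool, image and VM names (fio has to be built with the rbd ioengine, and newer PVE spells the last command "qm disk move"):

    # throwaway test image, delete it when done
    rbd create nvme-pool/benchtest --size 20G

    # built-in benchmark: random 4k writes
    rbd bench --io-type write --io-pattern rand --io-size 4K \
        --io-threads 16 --io-total 2G nvme-pool/benchtest

    # same image with fio (requires the rbd ioengine)
    fio --name=rbdtest --ioengine=rbd --clientname=admin --pool=nvme-pool \
        --rbdname=benchtest --rw=randwrite --bs=4k --iodepth=32 \
        --runtime=60 --time_based --direct=1 --group_reporting

    rbd rm nvme-pool/benchtest

    # moving a single disk of VM 101 onto the RBD storage, deleting the source
    qm move-disk 101 scsi0 nvme-pool --delete 1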
In the future we also want to run a Kubernetes cluster backed by Rook, which in turn should host one or more clustered database applications.
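For the Rook part, the natural shape would be a dedicated block pool pinned to one device class plus a StorageClass for the database workloads; a hedged sketch of what that could look like (pool and class names are invented, the rook-ceph namespace and provisioner prefix depend on how Rook is deployed, and an external-cluster setup that consumes the existing PVE-managed Ceph would look somewhat different):

    kubectl apply -f - <<'EOF'
    apiVersion: ceph.rook.io/v1
    kind: CephBlockPool
    metadata:
      name: k8s-nvme
      namespace: rook-ceph
    spec:
      deviceClass: nvme
      failureDomain: host
      replicated:
        size: 3
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rbd-nvme
    provisioner: rook-ceph.rbd.csi.ceph.com
    parameters:
      clusterID: rook-ceph
      pool: k8s-nvme
      imageFeatures: layering
      csi.storage.k8s.io/fstype: ext4
      csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
      csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
      csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
      csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    reclaimPolicy: Delete
    EOF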

What is the best practice for VMs on that storage? If there is a limit, how many VMs? Do we need to take the actual VM workload into consideration? Do we need several pools? Is the network the bottleneck?
Personally I don't like the idea of mixing RBD and local storage, but I'm not sure whether that is the way to go here.
 
The network is the bottleneck with Ceph: go for 100 Gbit for the Ceph-internal (cluster) network and a separate 100 Gbit network for the Ceph clients, i.e. the PVE nodes.
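If 100 Gbit is not immediately on the table, a smaller step in the same direction is moving Ceph's OSD replication and recovery traffic onto a dedicated cluster network so it stops competing with client I/O; a minimal sketch with a placeholder subnet (on PVE the same option can also be set in /etc/pve/ceph.conf, and the OSDs need to be restarted to bind to the new network):

    # placeholder subnet for the OSD back-side (replication/recovery) traffic
    ceph config set global cluster_network 10.10.10.0/24

    # restart the OSDs per node so they pick up the new network
    systemctl restart ceph-osd.target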
 
