How many nodes are required for a single cluster with Ceph?

cocobanana

Jun 18, 2021
Hi,

I have been using Proxmox for quite some time and the performance has been very impressive. I have decided to provide 500 to 1000 VMs on a cluster with Ceph (only Linux and Windows KVM guests).

I plan to use Dell PowerEdge R630 servers with U.2 NVMe disks.

I would appreciate it if anyone could share the best requirements for this in terms of how many nodes are required, how many disks, the setup, etc.

Thanks!
 
It's an innocent enough question, but there are a lot of gotchas you need to consider.

Clusters are made up of three elements: compute, storage, and networking. Let's touch on each.

COMPUTE:
- The Dell R630 is a 10-year-old platform. As such, it offers pretty poor performance per watt. Do you already know where you are deploying this solution? How much power and cooling is available?
- Ignore the VM count for a moment. How much CPU load will the TOTAL cluster workload be in terms of core-GHz? You need to account for typical and peak load, plus excess capacity for failover.
- Add one core and 4 GB of RAM per OSD, since it appears you intend to run this as an HCI (hyperconverged) setup.
- Once you add it all up, you'll have an idea of how many servers you will be deploying.
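The arithmetic in the list above can be sketched as a small calculator. This is only an illustration under stated assumptions: the function name `nodes_needed` and every input value are hypothetical placeholders, "core-GHz" is taken as cores multiplied by sustained clock, the per-OSD reservation is the one core and 4 GB of RAM suggested above, and failover is modelled as N+1 spare nodes.

```python
import math

# Hypothetical sizing helper: all inputs are placeholders for your own measurements.
def nodes_needed(
    workload_core_ghz: float,  # peak aggregate CPU demand of all VMs, in core-GHz
    node_cores: int,           # physical cores per node
    core_ghz: float,           # sustained clock per core, in GHz
    osds_per_node: int,        # NVMe OSDs hosted on each node
    node_ram_gb: int,          # RAM per node, in GB
    workload_ram_gb: float,    # aggregate RAM demand of all VMs, in GB
    failover_nodes: int = 1,   # spare nodes kept free to absorb failures (N+1)
) -> int:
    # Reserve one core and 4 GB of RAM per OSD for the Ceph daemons.
    usable_core_ghz = (node_cores - osds_per_node) * core_ghz
    usable_ram_gb = node_ram_gb - 4 * osds_per_node
    if usable_core_ghz <= 0 or usable_ram_gb <= 0:
        raise ValueError("OSD overhead consumes the whole node")
    by_cpu = math.ceil(workload_core_ghz / usable_core_ghz)
    by_ram = math.ceil(workload_ram_gb / usable_ram_gb)
    # Size for whichever resource runs out first, then add failover capacity.
    return max(by_cpu, by_ram) + failover_nodes


# Hypothetical example: 1500 core-GHz and 6000 GB RAM of VM load,
# 28-core nodes at 2.4 GHz with 512 GB RAM and 4 OSDs each.
print(nodes_needed(1500, 28, 2.4, 4, 512, 6000))  # → 28
```

Taking the maximum of the CPU-bound and RAM-bound counts reflects that either resource can be the bottleneck; in this made-up example CPU dominates.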

STORAGE:
- What is the minimum required usable capacity? You should be prepared for 4x that in raw capacity. A smallish number of high-capacity OSDs could work, but you gain better performance with a higher OSD count.
- Dell R630s support up to four NVMe drives, but only on the 10-drive models. Since you can't buy these new, be aware that most 10-drive models you will find in the wild don't actually have NVMe support, and you will have to buy and install the NVMe hardware separately.
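One way to see where a 4x raw-to-usable rule of thumb can come from is sketched below. It assumes a replicated pool with `size` 3 (Ceph's default) and a hypothetical planning ceiling of 75% fill, which keeps the cluster below Ceph's default nearfull warning (85%) with room left for recovery after a node loss. The function names, the 50 TB target, and the 7.68 TB drive size are illustrative assumptions; the cap of four NVMe drives per node comes from the R630 note above.

```python
import math

def raw_capacity_tb(usable_tb: float, replicas: int = 3, target_fill: float = 0.75) -> float:
    # Each byte is stored `replicas` times, and the cluster should only be
    # filled to `target_fill` (a hypothetical planning ceiling).
    return usable_tb * replicas / target_fill

def drives_needed(raw_tb: float, drive_tb: float) -> int:
    # Round up: a partial drive still means buying a whole drive.
    return math.ceil(raw_tb / drive_tb)

def osd_nodes(drives: int, nvme_per_node: int = 4) -> int:
    # R630 10-drive models top out at four NVMe bays per node.
    return math.ceil(drives / nvme_per_node)


raw = raw_capacity_tb(50)          # hypothetical 50 TB usable target
print(raw)                         # → 200.0
drives = drives_needed(raw, 7.68)  # hypothetical 7.68 TB U.2 drives
print(drives)                      # → 27
print(osd_nodes(drives))           # → 7
```

With these assumptions 3 / 0.75 gives exactly 4x; a higher fill ceiling or erasure coding would change the ratio.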

NETWORKING:
- Ideally you need separate interfaces for the Ceph public, Ceph private (cluster), Proxmox cluster (corosync), and service networks, plus the BMC.
- Be aware that the R630 is a PCIe Gen3 platform, which means your maximum practical link speed is 100GbE, and some of your PCIe lanes will be consumed by your NVMe drives (16 lanes total). So 4x25G is a good practical configuration for this generation of hardware.
- Base your port count on your node count, then provision two switches that can each accommodate half of the ports.
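The bandwidth and port-count reasoning above can be checked with a few lines of arithmetic. This sketch assumes PCIe 3.0 signalling of 8 GT/s per lane with 128b/130b encoding, four 25G data ports per node, and links split evenly across two switches; BMC ports are not counted. The function names and the 28-node figure are hypothetical.

```python
import math

def pcie_gen3_gbps(lanes: int) -> float:
    # PCIe 3.0: 8 GT/s per lane, 128b/130b line encoding, ignoring protocol overhead.
    return lanes * 8.0 * 128 / 130

def ports_per_switch(nodes: int, data_ports_per_node: int = 4, switches: int = 2) -> int:
    # Spread each node's links evenly so losing one switch leaves every node connected.
    return math.ceil(nodes * data_ports_per_node / switches)


x16 = pcie_gen3_gbps(16)
print(round(x16, 1))          # → 126.0  (about the most a Gen3 x16 slot can carry)
print(4 * 25 <= x16)          # → True   (4x25G = 100G fits under that ceiling)
print(ports_per_switch(28))   # → 56     (hypothetical 28 nodes, 4 ports each)
```

The x16 figure is a theoretical ceiling; real throughput is lower once protocol overhead is included, which is why 100GbE is treated as the practical limit.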

There is a lot more to consider, but this should give you a starting point.