Ceph configuration recommendations?

techd00d

Member
Aug 10, 2017
Hello all, I'm setting up a new 5-node cluster with the following identical specs for each node. I've been using Proxmox for many years but am new to Ceph. I spun up a test environment and it has been working perfectly for a couple of months. Now I'm looking to make sure we are moving in the right direction with the overall setup and configuration, so I figured I'd gather some information from those much more familiar with it than myself.

I noted below how we configured things in our testing, but that was based on research rather than the real-world experience I'm sure many of you have.

AMD EPYC 7351
64GB DDR4
250GB SSD (OS Drive)
1.2TB NVMe (DB/WAL)
3.2TB NVMe (This is an addition, did not have this drive in testing)
3 x 4TB 7200rpm HDD (Primary/only storage pool)
2 x 1Gbps Port (Public WAN)
2 x 10Gbps Port (2 x private LAN: one for management/private traffic, one dedicated to Ceph)

I think the two biggest questions are:

1. Should I have 2 pools, one with pure NVMe, and a second with HDD and the smaller NVMe as the DB/WAL? Or just add the larger NVMe to the HDD pool and let it tier the data?

2. Should I bond the 10Gbps ports and share them, or leave them separate as they are?
 
1. Should I have 2 pools, one with pure NVMe, and a second with HDD and the smaller NVMe as the DB/WAL? Or just add the larger NVMe to the HDD pool and let it tier the data?
Ceph will not automatically tier data based on the disk type. You can, however, create CRUSH rules that are limited to a device class and assign them to pools. So you could have an HDD pool and an NVMe pool. If you leave it at the default rule, the data will be spread out without regard to the device class.
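A minimal sketch of how that could look on the CLI (the pool and rule names are just examples, and the pveceph options are from memory, so double-check with pveceph pool create --help):

Code:
# Show OSDs with their detected device class (hdd / ssd / nvme)
ceph osd tree

# One replicated rule per device class, with host as the failure domain
ceph osd crush rule create-replicated replicated-hdd default host hdd
ceph osd crush rule create-replicated replicated-nvme default host nvme

# Create pools bound to those rules (example names)
pveceph pool create hdd-pool --crush_rule replicated-hdd
pveceph pool create nvme-pool --crush_rule replicated-nvme

# Or point an existing pool at a rule
ceph osd pool set hdd-pool crush_rule replicated-hdd

That way the HDD OSDs (with the 1.2TB NVMe as DB/WAL) back one pool and the 3.2TB NVMe OSDs back the other.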


2. Should I bond the 10Gbps ports and share them, or leave them separate as they are?
That depends on your needs. If you want your cluster to be as available as possible and able to handle failures, you should set up everything with redundancy: bonded networks to different (stacked) switches, OS disks in a mirrored RAID, and so on.
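For reference, an LACP bond on Proxmox VE is only a few lines in /etc/network/interfaces; a sketch with made-up interface names and address (your switches need to support 802.3ad/LACP, otherwise active-backup is the safe fallback):

Code:
auto bond0
iface bond0 inet static
        address 10.10.10.11/24
        bond-slaves enp65s0f0 enp65s0f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4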


Other things that I noticed:

Memory: Each Ceph service (MON, MGR, OSD, MDS) needs memory of its own, and depending on how much memory you need for your guests, 64GB of RAM could be a little low.
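As a rough back-of-the-envelope example for your layout (3 HDD OSDs plus the 3.2TB NVMe OSD per node, assuming the default osd_memory_target of about 4 GiB per OSD):

Code:
# 4 OSDs x ~4 GiB (osd_memory_target default)  = ~16 GiB
# MON + MGR + OS overhead                      = ~2-4 GiB
# => roughly 20 GiB set aside for Ceph, ~44 GiB left for guests

# The per-OSD target can be lowered if needed (value in bytes, e.g. 3 GiB),
# at the cost of OSD cache and therefore performance:
ceph config set osd osd_memory_target 3221225472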

Network: How many nodes will you have in the cluster? A 10Gbit network can easily be saturated by Ceph once you put a bit of load on it. See the older Ceph Benchmark paper from 2018 where we tested different network speeds. Ideally you have a dedicated physical network for Ceph so it does not interfere with other traffic and is not slowed down by other traffic.
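On Proxmox VE this ends up in /etc/pve/ceph.conf (pveceph init can set it for you); a sketch with made-up subnets, where the dedicated 10Gbit network carries the Ceph traffic:

Code:
[global]
        # Ceph client / monitor traffic
        public_network = 10.10.10.0/24
        # OSD replication and heartbeats; can be the same or a separate subnet
        cluster_network = 10.10.10.0/24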

The same goes for the Proxmox VE cluster communication (Corosync). Ideally, you have at least one physical network just for it. It does not need to be fast, 1Gbit is enough, but it needs to reliably have low latency. Therefore you should not put it on the same network as other services that might take up all the bandwidth, which in turn would increase the latency of the Corosync packets. You can add multiple links to the Corosync config for better redundancy. It will switch over by itself should the main link become unavailable (due to failure or high latency).
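In /etc/pve/corosync.conf a second link is just an additional ringX_addr per node; a sketch with made-up node names and subnets (when editing the file by hand, remember to increase config_version):

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.20.11   # dedicated Corosync network
    ring1_addr: 10.10.30.11   # second link, used as fallback
  }
  # ... same for the other nodes, each with its own ring1_addr
}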

The more you can separate services on their own dedicated networks, the better :)
 