Ceph disk layout

sc3705

Member
Jul 3, 2020
Hi All,

I'm having a hard time making a decision on storage for my new Proxmox cluster and am hoping for some advice.

I have 3 nodes with a mismatched assortment of disks. This hardware was not intended to be clustered when purchased; this is a new direction. All nodes are PowerEdge servers with separate 10G fiber connections for cluster and server traffic.

Node 1:
x5 2TB SAS drives 2.5"
x1 2TB NVMe

Node 2:
x2 2TB SATA 7200 RPM 3.5"
x2 3TB SATA 7200 RPM 3.5"
x1 2TB NVMe

Node 3:
x6 3TB SATA 7200 RPM 3.5"
x1 500GB NVMe

All 3 have separate boot disks not listed here.

My original thought was 2 pools, high performance and "regular": the NVMes for high performance and all the rest in one giant regular pool. However, then I started thinking: with that many spindles, could the regular pool end up being faster than the high-performance one? I'm also thinking the 500GB NVMe is going to limit the capacity of the NVMe pool, so if I'm correct, the NVMe pool doesn't even need to exist.
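For reference, this is roughly how I pictured splitting the pools, assuming Ceph assigns the hdd and nvme device classes on its own (the rule names, pool names, and PG counts below are placeholders I made up):

  # check which device class Ceph assigned to each OSD
  ceph osd tree
  # one CRUSH rule per device class
  ceph osd crush rule create-replicated rule-nvme default host nvme
  ceph osd crush rule create-replicated rule-hdd default host hdd
  # pools pinned to those rules
  ceph osd pool create fast 64 64 replicated rule-nvme
  ceph osd pool create regular 128 128 replicated rule-hdd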

Then I thought to use the NVMe drives for cache, but I just don't know if I have enough data and traffic to justify over 4TB of NVMe cache. I could use one of those drives elsewhere; thinking laptop..... :cool:

I'd rather not do this because of cost (but I will if I absolutely must), but the thought came up to remove one of the SAS drives from node 1 and add x4 2TB SAS drives, along with another 2TB NVMe drive, to each of the other two nodes, just so they all match. Not an expense I'll be happy about, so I'm really trying to avoid it. This is a last resort if you folks tell me I'm insane for even thinking about mixing these disks.

With the best blend of performance and safety at the forefront, what's the best way to configure these disks?

Thanks in advance!!


EDIT: I referenced it a couple of times, so here's one update: "x1 500GB NVMe" is actually x1 1TB NVMe. I forgot about an upgrade a year ago. This also means I have a spare 500GB NVMe drive that can be added if needed.
 
In general, for Ceph, the more evenly distributed the better. And depending on your hardware, don't expect too much performance out of it.
 
In general, for Ceph, the more evenly distributed the better.

Last night I made a decision, but I haven't had time to fully implement it. Can you vet this idea?

  • Move 1 disk from node 3 to node 2, so I'll have 5 spinning disks on each node.
  • Use the 2TB NVMe drive in each node as the DB device for all of its spinners. From what I read, if I only give a DB device, the WAL/journal is placed there automatically, so it ends up on the NVMe as well (rough commands below).
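Something like this is what I have in mind per spinning disk, if I'm reading the Proxmox tooling right (device names are made up and the DB size is just a guess at a few percent of the data disk):

  # one OSD per spinner, with its RocksDB (and WAL) on the shared NVMe;
  # with only --db_dev given, BlueStore keeps the WAL on the DB device
  pveceph osd create /dev/sdb --db_dev /dev/nvme0n1 --db_size 60
  pveceph osd create /dev/sdc --db_dev /dev/nvme0n1 --db_size 60
  # ...and so on for the remaining spinners on the node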
My only real concern: I know I'm leaving something on the table here, but I don't know what it is. I'm new to Ceph/distributed file systems. If Ceph works like typical RAID, in this configuration I'm forfeiting a ton of disk space, about 8TB give or take. Also, the SAS drives likely won't achieve full speed.
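If my math is right, assuming the default replicated size of 3 with one copy per node: the spinners after the move would be roughly 10TB + 13TB + 15TB = 38TB raw, but since every object needs a copy on each of the 3 nodes, usable space is capped by the smallest node, so about 10TB minus the nearfull headroom, call it 8-9TB usable.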

Is the same "slowest disk limits the array" generalization true with Ceph? Being a distributed system, I wouldn't think the different disk speeds would slow down the overall pool, but I just don't know enough about Ceph to commit to that statement. I'm also thinking I can upgrade to SAS drives, and maybe to 3TB drives, over time. My assumption is this would be a long but easy process with Ceph (I sketched the swap procedure below as I understand it).
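For the gradual disk upgrades, this is the per-disk swap procedure as I understand it (OSD ID and device names are placeholders, and I'd wait for the cluster to be healthy between steps):

  # drain the old OSD and let Ceph rebalance its data
  ceph osd out osd.7
  # check that the cluster no longer needs it
  ceph osd safe-to-destroy osd.7
  # stop and remove the old OSD, then create one on the new disk
  systemctl stop ceph-osd@7
  pveceph osd destroy 7 --cleanup
  pveceph osd create /dev/sdd --db_dev /dev/nvme0n1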


And depending on your hardware, don't expect too much performance out of it.

Can you elaborate on that? I understand that "performance" is a highly subjective term, but is there a high-level guideline? For example, "ZFS requires tons of RAM (1GB for every TB)" or "only use ECC memory". There's a lot to pick apart in those two ZFS statements, but they point you in the general direction.

My SAS drives are 12G enterprise disks and the SATA disks are NAS-rated Seagate drives. The NVMe drives are consumer grade but all on the high end; one Crucial and the rest are Intel. Node 1 has 192GB RAM and 12c/24t; nodes 2 and 3 both have 32GB RAM and 4c/8t. Node 3 will be upgraded to the exact same spec as node 1 by January, and node 2 will likely follow around this time next year. Node 1 is a PowerEdge T440 and nodes 2 & 3 are T310s. Is this enough for a general idea? At least a "don't even bother" or a "maybe"? I'll be running maybe 20-25 VMs.

Node 1:
x5 2TB SAS drives 2.5"
x1 2TB NVMe

Node 2:
x2 2TB SATA 7200 RPM 3.5"
x2 3TB SATA 7200 RPM 3.5"
x1 2TB NVMe

Node 3:
x6 3TB SATA 7200 RPM 3.5"
x1 1TB NVMe
 
Being a distributed system, I wouldn't think the different disk speeds would slow down the overall pool.
By slowest, do you mean high latency? If so, then yes, the drive with the highest latency will slow down the cluster. But since the distribution is wider than with a RAID controller, the effect might not be so visible.
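To spot such a drive, the per-OSD latency and usage statistics are a quick indicator:

  # commit/apply latency per OSD in ms; a consistent outlier is usually
  # the disk that holds the pool back
  ceph osd perf
  # how evenly data is spread across the OSDs
  ceph osd df tree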

Can you elaborate on that?
See our benchmark paper and the forum thread.
https://proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
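To get numbers for your own hardware that you can compare with the paper, you can run the same kind of test, for example (use a test pool, not one with production data):

  # 60 second write test, keep the objects for the read tests
  rados bench -p testpool 60 write --no-cleanup
  # sequential and random read tests against those objects
  rados bench -p testpool 60 seq
  rados bench -p testpool 60 rand
  # remove the benchmark objects afterwards
  rados -p testpool cleanup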
 
