Ceph Cluster Hardware Configuration

ejmerkel

I am just curious if the following looks like a decent configuration for a Proxmox VE / Ceph cluster.

3 Dell 730xd servers with the following specs:

CPU: 2 x Xeon E5-2670 v3 @ 2.3GHz
RAM: 256G RDIMM, 2133MT/s, Dual Rank, x4 Data Width
DISK CONTROLLER: PERC H730 RAID (not using for RAID of course - will make all OSD disks RAID0)
Network: Intel Ethernet X540 DP 10Gb + I350 1Gb DP Network Daughter Card
OS Partition: 2x1TB 7.2K SATA - RAID1 (going to use part of this disk for local lvm & ISO/templates)
Journaling Partition: 2x200GB SSD SATA - RAID1, Mixed Use MLC, 6Gbps
OSDs: 6x4TB 7.2K SATA

I realize the OS partition is not an SSD drive as recommended here http://pve.proxmox.com/wiki/Ceph_Server. Is that going to cause any performance issues?

If I use triple replication I should net 19.2TB (80% of 24TB), correct?

My hope is, after some initial Ceph testing, to make this system a production cluster. I will have the capacity to add 4 more 3.5" SATA drives per server.

I appreciate any comments or suggestions.

Best regards,
Eric
 
Hi,

>>I realize the OS partition is not an SSD drive as recommended here http://pve.proxmox.com/wiki/Ceph_Server. Is that going to cause any performance issues?
No problem here. You'll have the Ceph monitor on the root partition, but with the PERC cache it should be OK.


>>If I use triple replication I should net 19.2TB (80% of 24TB), correct?

No, you'll have 8TB (3 copies of each piece of data).

If you want something like RAID-5/6 with parity, Ceph has supported a replication scheme called "erasure coding" since Firefly, but it's not implemented in the Proxmox GUI.
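
For reference, an erasure-coded pool can be created from the Ceph command line with something like this (just a sketch; the profile name "ecprofile", the k/m values and the PG counts are placeholders you would have to choose for your own cluster):

# define an erasure-code profile: 2 data chunks + 1 coding chunk, similar to RAID-5 across 3 hosts
ceph osd erasure-code-profile set ecprofile k=2 m=1
# create a pool that uses this profile (64 placement groups, just as an example)
ceph osd pool create ecpool 64 64 erasure ecprofile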
 
>>No, you'll have 8TB (3 copies of each piece of data).

Just to clarify, each server has 24TB, so between the 3 servers I will have 72TB. So just to make sure I am not misunderstanding, I should end up with 72TB / 3 * 0.80 = 19.2TB?

Best regards,
Eric
 
I read somewhere that you should not use more than 80% of your total disk capacity. I assume this is in case you have failed drives etc?

Eric

There are built-in protection thresholds:

mon osd full ratio = 0.95 (95%)

So when you reach 95%, the cluster stops accepting writes (effectively read-only).

You can tune it if you want.
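
For example in ceph.conf, something like this (just a sketch using the usual default values; nearfull only triggers a health warning, full blocks writes):

[global]
    # warn when an OSD passes 85% used
    mon osd nearfull ratio = .85
    # stop accepting writes when an OSD passes 95% used
    mon osd full ratio = .95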



In case of a failed drive, Ceph will try to re-replicate the missing blocks onto other OSDs on the same node.
 
>>From what I have read you never want to use more than 80% of your total disk capacity. Not really sure why?

Eric
Hi,
this is right - normally you should use much less. There are several reasons:

1. If one node fails, its content must fit on the remaining nodes before mon_osd_full_ratio comes into play.

2. Performance drops with full disks - especially with XFS. It's helpful to use mount options to avoid fragmentation (see the ceph.conf sketch after this list):
osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
I switched my cluster from XFS to ext4 (I moved an archive to an external RAID first) - now I'm moving the content back to reach the +60% fill again. Then I'll see whether the performance is better than with XFS, but it seems so.

3. A few versions ago, the surest way to stop Ceph from working was a full OSD!! Perhaps it's better now, but I would avoid this in a production environment.

4. The disks are not filled evenly. There are sometimes huge differences (depending on your workload - e.g. VM disks that are only partly filled). If this happens during the rebuild of a failed node, you can easily hit mon_osd_full_ratio on single disks while other disks still have plenty of free space...
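
The mount-option line from point 2 goes into the [osd] section of ceph.conf, roughly like this (a sketch; on newer kernels delaylog is the default behaviour and the option is deprecated, so you may have to drop it):

[osd]
    # same line as in point 2 above
    osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"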

If your cluster is 70% filled, you should already have ordered another OSD node.

Udo
 
>>If I use triple replication I should net 19.2TB (80% of 24TB), correct?
Hi,
if you format a 4TB drive with XFS, you will get a Ceph weight of 3.64 per disk, and 3.58 for ext4 (depending on your mkfs options).

E.g. you will get 21.84 TB (or 21.48 TB for ext4) per node.

If one node is allowed to fail, the other nodes need the space for the rebuild: around 60% would be the max usable space if your mon_osd_full_ratio is high (like 0.95).
If you are able to bring the failed node back fast, perhaps you can go higher?!
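
To put rough numbers on that (just my reading of Udo's figures: 6x4TB OSDs per node, 3 nodes, 3x replication, XFS):

per node:       6 x 3.64 TB  = 21.84 TB raw
whole cluster:  3 x 21.84 TB = 65.5 TB raw
after 3 copies: 65.5 TB / 3  = 21.8 TB
at 80% fill:    ~17.5 TB usable
at ~60% fill:   ~13 TB usable (leaving headroom for a rebuild)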


Udo
 
