Ceph - 4 Node NVME Cluster Recommendations

yena

Renowned Member
Nov 18, 2011
I need to build a 4 node Ceph cluster and I need an effective capacity of 30TB.
Each node is configured like this:
  • Supermicro 2U Storage Server 24 x NVME, X11DPU, Dual 1600W
  • 2 x Intel Xeon Gold 6240 18-core 2.6 GHz processors
  • 768 GB RAM
  • 2 x 100G network
  • 6 x Samsung PM9A3 3.84TB PCIe 4.0 2.5"
Networking:
2 x Switch N8560-48BC, 48-port L3 data center Ethernet switch, 48 x 25Gb SFP28, with 8 x 100Gb QSFP28, MLAG/stacking support, Broadcom chip

Using: https://florian.ca/ceph-calculator/
with
Number of Replicas: 3
Number of Nodes: 4
Total OSD size per node: 22.8 TB
I get:
Total cluster size: 88 TB
Total raw purchased storage. You will never be able to use this much unless you turn off all replication (which is foolish).

Worst failure replication size: 22 TB
Amount of data that will have to be replicated after the "worst failure" occurs. Assume that the worst failure we can have is the failure of the biggest node. You decide if this assumption is sufficiently conservative.

Risky cluster size: 29.33 TB
How much of the raw storage is available if you are OK with being in a degraded state while the failed node is fixed? This assumes you can fix it at least partially by recovering some OSDs from it. If you are doing this you should probably run "ceph osd set noout" to avoid replication eating up all free space, and/or have a very quick disaster recovery plan. If you just let it fix itself, the cluster will run out of space and/or lose data, so this is not a good plan unless you really know what you are doing.

Risky efficiency: 33%
Same as above, in percent.

Safe cluster size: 22 TB
How much of the raw storage is safely available even in the worst case? If you use no more than this amount of storage, you can sleep well at night knowing that you do not have to intervene in case of failure. Ceph will magically fix itself (only this time though; all bets are off for the next failure, as you will probably be in the "risky" scenario after this first failure is handled).

Safe efficiency: 25%
Same as above, in percent.

Safe nearfull ratio: 0.75
Set the osd nearfull ratio to this number to get a proper warning when the safety margin is exceeded (the default is 0.85, which may be too high / too risky).
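If I follow the calculator's last suggestion, I would lower the nearfull warning roughly like this (a sketch; 0.75 just mirrors the recommendation above):

Code:
# 4 nodes x ~22 TB of OSDs -> 88 TB raw; safe usable with 3 replicas
# and one node held in reserve: (88 - 22) / 3 = 22 TB
# Warn earlier than the default 0.85 so a full node failure still fits:
ceph osd set-nearfull-ratio 0.75
# Check current usage and the configured ratios:
ceph df
ceph osd dump | grep ratio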

Number of Replicas: 3
Does this mean that 3 OSDs can fail at the same time?
Is this correct, or would it be enough to set it to 2?

Do you think this type of configuration is safe enough?
Approximately what I/O performance would it have?

If I compared it with a ZFS shared-storage solution with a RAID-Z configuration of similar redundancy, would it perform much worse?

In the event of an OSD failure, how much is performance impacted?

Thanks!
 
Number of Replicas: 3
Does this mean that 3 OSDs can fail at the same time?
Is this correct, or would it be enough to set it to 2?
Not quite. size = 3 means that Ceph will strive to always keep 3 replicas, while min_size = 2 is the number of copies that need to exist to allow I/O. This means that when only one copy is left, all I/O is halted for that particular PG. Since Ceph distributes PGs across nodes by default, the loss of two OSDs on different nodes can be tolerated without losing data.
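For reference, these settings live per pool and can be checked or changed with the Ceph CLI. A minimal sketch, assuming a pool called vm-pool (placeholder name; Proxmox VE normally creates pools with size=3/min_size=2 already):

Code:
# Show the current replication settings of a pool:
ceph osd pool get vm-pool size
ceph osd pool get vm-pool min_size
# Keep 3 copies, but keep serving I/O as long as 2 of them are available:
ceph osd pool set vm-pool size 3
ceph osd pool set vm-pool min_size 2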

Do you think this type of configuration is safe enough?
What do you mean by that? Against what failure should it be safe?

Approximately what I/O performance would it have?
The Proxmox VE Ceph benchmark paper may provide insights into that and into how you can test your setup.
https://proxmox.com/en/downloads/pr...cumentation/proxmox-ve-ceph-benchmark-2020-09
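If you want a rough number from your own hardware before relying on the paper, rados bench on a throw-away pool is the usual starting point. A minimal sketch, assuming a temporary pool named testbench (deleting it requires mon_allow_pool_delete=true):

Code:
# Create a temporary pool just for benchmarking (name and PG count are examples):
ceph osd pool create testbench 128 128
# 60 seconds of 4 MiB writes with 16 parallel threads, keeping the objects:
rados bench -p testbench 60 write -b 4M -t 16 --no-cleanup
# Read the objects back sequentially, then clean up:
rados bench -p testbench 60 seq -t 16
rados cleanup -p testbench
ceph osd pool delete testbench testbench --yes-i-really-really-mean-it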

If I compared it with a ZFS shared-storage solution with a RAID-Z configuration of similar redundancy, would it perform much worse?
These are two different storage solutions, so they are hard to compare directly. They might be somewhere in the same ballpark, though.

In the event of an OSD failure, how much is performance impacted?
That depends on the IOPS needed by your workload. While there is an uptick in latency, Ceph will handle everything transparently and continue operation.
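If recovery traffic after a failure does start to hurt client latency, it can be throttled with standard OSD options; the values below are only conservative examples, and noout is meant for planned maintenance rather than real failures:

Code:
# Slow down backfill/recovery so client I/O keeps priority (defaults vary by release):
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
# For planned maintenance, stop Ceph from rebalancing while a node is briefly down:
ceph osd set noout
# ...do the maintenance, then:
ceph osd unset noout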
 
