Understanding Capacity Limits with Ceph

Apr 9, 2026
Hi everyone,

We are using Proxmox VE with Ceph and were wondering how much of our available storage we should use before Ceph starts acting up.
Our Ceph cluster consists of three nodes that each contribute a capacity of 27.94 TB, for a total of 83.82 TB.
We were also wondering how Ceph would behave if a node failed.
We have

Code:
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3

and

Code:
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

So from what I understand, if one node goes down entirely, Ceph wouldn't try to rebalance the data but would go into a degraded state: min_size = 2 is still met, but size = 3 can't be reached with the node missing, and the CRUSH rule (step chooseleaf firstn 0 type host) prevents Ceph from placing a second copy of a PG on an OSD of a host that already holds one.
Am I correct about this?

And regarding capacities: at 95% capacity the cluster will stop allowing writes and reads, correct?
Then there are also the milestones of 85% (nearfull) and 90% (too full to backfill). Does this mean I should always try to keep usage below 85%, 90%, or 95%?

Thank you very much & Best Regards
 
There are several calculators online such as https://florian.ca/ceph-calculator/.

With 3 nodes and 1 down there is nowhere to replicate to, so yes, you'd have 2 copies. If another OSD were then lost, those PGs would have only 1 copy and I/O to them would stop because that is below min_size = 2.

With 3 nodes, if one OSD goes down, its PGs are moved to the remaining OSDs on the same node. If your 27 TB is 3x9 TB, the contents of the failed 9 TB drive need to fit on the other two.
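As a rough sanity check of that last point, here is a small sketch (the OSD count, size, and utilization are assumed for illustration, and PG distribution is idealized as perfectly even) of whether the surviving OSDs in a node can absorb a failed OSD's data:

```python
# Assumed layout: 3 OSDs of 9.31 TB per node (27.94 TB per node, as in the post).
OSD_SIZE_TB = 9.31
OSDS_PER_NODE = 3
used_fraction = 0.60  # assumed cluster-wide utilization

used_on_failed_osd = OSD_SIZE_TB * used_fraction       # data to be recovered
survivors = OSDS_PER_NODE - 1
free_on_survivors = survivors * OSD_SIZE_TB * (1 - used_fraction)

fits = used_on_failed_osd <= free_on_survivors
print(f"to recover: {used_on_failed_osd:.2f} TB, "
      f"free on survivors: {free_on_survivors:.2f} TB, fits: {fits}")
```

At 60% utilization the ~5.6 TB from the failed OSD still fits into the ~7.4 TB of headroom on the two survivors; push utilization much higher and recovery itself would fill the node.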

You should stay below nearfull, but also leave enough headroom for as many failures as you want to tolerate.
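Ceph's default ratios are nearfull at 0.85, backfillfull at 0.90, and full at 0.95 (all tunable per cluster). A quick sketch of what those mean in absolute terms for the capacities from the post, assuming 3x replication and even utilization:

```python
RAW_TB = 83.82   # total raw capacity from the post
REPLICA = 3      # osd_pool_default_size = 3

# Default Ceph ratios; real clusters may override mon_osd_nearfull_ratio etc.
thresholds = {"nearfull": 0.85, "backfillfull": 0.90, "full": 0.95}

usable_tb = RAW_TB / REPLICA  # ~27.94 TB of usable space with 3 replicas
for name, ratio in thresholds.items():
    print(f"{name:>12}: reached at ~{usable_tb * ratio:.2f} TB of stored data ({ratio:.0%})")
```

So with this cluster, nearfull warnings would start around 23.7 TB of stored data, well before the 27.94 TB of nominal usable space.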
 
Our Ceph cluster consists of three nodes that each contribute a capacity of 27.94 TB, for a total of 83.82 TB.
We were also wondering how Ceph would behave if a node failed.

Each of the 3 nodes holds a full copy of all data.
So your real usable space is somewhere below 27 TB, given the %-thresholds you already mentioned. It's probably a good idea to play it safe and stay under 70% in case a sudden spike happens, like unplanned data growth.

If one node fails, nothing happens on the Ceph side: there is no re-replication, because the missing third copy cannot be placed on the remaining two hosts (they each already hold a copy).
Once the failed host comes back up, it is re-synced from the two remaining hosts to bring it back to the current state.


One thing to keep in mind regarding single-disk failures:
say you use, for example, 2x15.36 TB disks per node (which comes out to around 2x14 TB usable).
If one disk dies, Ceph will try to rebuild the failed disk's PGs/data on the one remaining disk in that same node. This will completely fill up the one remaining 14 TB disk if you are using just ~50% of the total space!
So using only 2 disks per node is a very bad setup; it's better to use more, smaller disks.
The same problem still applies with more disks, but the discrepancy becomes smaller.
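The disk-count arithmetic above can be sketched like this (sizes and utilization are made up for illustration, and PGs are assumed to spread perfectly evenly across the surviving disks):

```python
def fill_after_disk_failure(num_disks: int, disk_tb: float, used_fraction: float) -> float:
    """Utilization of each surviving disk in a node after one disk fails
    and its data is rebuilt onto the remaining disks (idealized even spread)."""
    total_used = num_disks * disk_tb * used_fraction
    survivors_capacity = (num_disks - 1) * disk_tb
    return total_used / survivors_capacity

# 2x14 TB at 50% used: the lone survivor ends up at 100% -> full cluster
print(fill_after_disk_failure(2, 14.0, 0.50))   # 1.0
# 6 smaller disks at 50% used: each survivor ends up at only 60%
print(fill_after_disk_failure(6, 4.67, 0.50))   # 0.6
```

Note that the result depends only on the disk count and utilization, not on the disk size: with n disks at fraction u used, the survivors end up at n*u/(n-1), which approaches u as n grows.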