[TUTORIAL] FabU: can I use Ceph in a _very_ small cluster?

Maybe this is something that could be added as a link, or of interest for some of you: ETH Zürich gave a talk on benchmarking Ceph and its difficulties in preparing for the real world at this year's FOSDEM in Brussels:
https://fosdem.org/2025/schedule/ev...etic-benchmarks-to-real-world-user-workloads/
There were also some other talks on Ceph (although I didn't manage to get a seat, the room was unfortunately quite small): https://fosdem.org/2025/schedule/room/k3401/


Edit: It wasn't CERN but ETH Zürich, I mixed them up (probably because both institutions are in Switzerland).
 
link or of interest
Thanks, I've added it. I've downloaded it but not watched it yet. All in all there seem to be 793 recordings from FOSDEM - and Ceph is not the only interesting topic... :-)
 
That means that you should already account for max capacity = (number of nodes - 1) * (OSD capacity per node * 0.8) / nodes.
As a consequence of a different thread, I need to caveat this.

The above is only true IF each node has the SAME OSD CAPACITY and the pool rule is replication:3. ACTUAL pool capacity would be 3x the capacity of the node with the smallest capacity in the case of 3 OSD nodes, or total OSD capacity / 3. The 80% high water mark needs to be observed PER OSD, which in practical terms means a lower usable amount for the pool, because OSD distribution will usually have 5-10% variance within a given node (the more OSDs per node, the less the variance).
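
To make that arithmetic concrete, here is a minimal Python sketch of the estimate. The node sizes, the 80% high water mark used as a hard stop and the 10% imbalance margin are just example assumptions, not a definitive rule:

[CODE]
# Rough usable-capacity estimate for a small replicated Ceph pool.
# Illustrative assumptions: a replicated pool with size equal to the
# node count (e.g. 3 nodes, size=3, so every node holds one full copy),
# an 80% per-OSD high water mark, and a 10% allowance for uneven PG
# distribution across the OSDs.

def usable_pool_capacity_tb(node_capacities_tb, high_water=0.80, imbalance_margin=0.10):
    # With one replica per node, the node with the least raw capacity
    # is the limit: the pool cannot store more data than that node can
    # hold of its copy.
    smallest_node = min(node_capacities_tb)
    # Observe the high water mark per OSD and subtract an extra margin,
    # because PGs never land perfectly evenly on the OSDs.
    return smallest_node * high_water * (1 - imbalance_margin)

# Example: three nodes with 8, 8 and 6 TB of OSDs each.
nodes = [8.0, 8.0, 6.0]
print(f"~{usable_pool_capacity_tb(nodes):.1f} TB usable (post-replication)")
# -> ~4.3 TB usable, even though the cluster holds 22 TB raw.
[/CODE]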
 
the 80% high water mark needs to be observed PER OSD,
If I remember correctly, my six-node Ceph cluster was clever enough to distribute data in a way that the smaller OSDs were assigned less data - without any manual tuning. A small OSD would get 80% filled at (nearly) the same time as a large one.

One of my core statements was and is "you need some more nodes and several more OSDs than the minimum" for a good experience :-)

Disclaimer, as I've already stated: I've currently dropped my Ceph setup. I can't verify what I say now...
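
That observation fits how CRUSH weights work by default (an OSD's weight is derived from its size, so data lands roughly in proportion to capacity). A tiny illustrative calculation - the OSD sizes and data volume below are made up:

[CODE]
# Illustration: data placed proportionally to OSD weight (which Ceph
# derives from OSD size by default) fills unequal OSDs to roughly the
# same percentage. All numbers are invented for the example.

osd_sizes_tb = [4.0, 4.0, 2.0, 1.0]          # mixed OSD sizes in one cluster
data_stored_tb = 6.0                          # one replica's share of the data

total = sum(osd_sizes_tb)
for i, size in enumerate(osd_sizes_tb):
    share = data_stored_tb * (size / total)   # proportional placement
    print(f"osd.{i}: {share:.2f} TB of {size:.1f} TB -> {share / size:.0%} full")
# Every OSD ends up ~55% full, so the small 1 TB OSD hits the 80%
# high water mark at (nearly) the same time as the 4 TB ones.
[/CODE]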
 
If I remember correctly, my six-node Ceph cluster was clever enough to distribute data in a way that the smaller OSDs were assigned less data - without any manual tuning. A small OSD would get 80% filled at (nearly) the same time as a large one.
So here is the thing about that.

The algorithm will distribute PGs as best it can according to the rules, but the SIZE of the PGs is a function of the pool's total PG count. The larger the PGs, the more difficult it is to shoehorn them in evenly. You may then be tempted to deploy a large number of PGs by default so they end up smaller, which comes at the cost of potentially reduced performance and increased latency. Everything is a tradeoff.
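
To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch; the pool size, replication factor, OSD count and pg_num values are purely illustrative, not a recommendation:

[CODE]
# Back-of-the-envelope: how pg_num affects average PG size and the
# number of PGs each OSD has to carry. All numbers are illustrative.

pool_data_tb = 10.0       # data stored in the pool (before replication)
replication = 3
num_osds = 9              # e.g. 3 nodes x 3 OSDs

for pg_num in (128, 256, 512, 1024):
    pg_size_gb = pool_data_tb * 1024 / pg_num        # average PG size
    pgs_per_osd = pg_num * replication / num_osds     # placement load per OSD
    print(f"pg_num={pg_num:>4}: ~{pg_size_gb:6.1f} GB per PG, "
          f"~{pgs_per_osd:5.1f} PGs per OSD")
# Bigger PGs (low pg_num) are harder to balance evenly across OSDs;
# many small PGs (high pg_num) balance better but add per-PG overhead,
# which can cost performance and latency.
[/CODE]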