@waltar sounds a little bit pessimistic. ;-)
Ceph is great, but it needs some resources above the theoretical minimum to work reliably. I would like to add this:
You plan to have three nodes. That is the absolute minimum for a cluster. Ceph will probably work with the default settings "size=3/min_size=2". (*Never* go below that!)
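Just to illustrate what those two numbers mean in practice, here is a tiny Python sketch (my own illustration, not Ceph code): a placement group keeps serving I/O only while at least min_size replicas are alive; below that it blocks writes rather than risk data loss.

```python
def pg_accepts_io(replicas_up: int, min_size: int = 2) -> bool:
    """A PG serves I/O only while at least min_size replicas are up."""
    return replicas_up >= min_size

print(pg_accepts_io(3))  # healthy: True
print(pg_accepts_io(2))  # one copy lost: True, degraded but writable
print(pg_accepts_io(1))  # two copies lost: False -> I/O blocks
```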
The first problem: if each node has only a single OSD, then when (not: if) one device or one node fails, Ceph is immediately degraded. There is no room for Ceph to heal itself, so "degraded" is permanent. For a stable situation you really want nodes that can jump in and return the cluster to a healthy condition - automatically. In this picture that means having at least *four* nodes. (In this specific aspect; in other regards you really want five or more...)
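A toy sketch of that rule (again my own illustration, assuming the usual failure domain "host" and size=3): Ceph can only re-create the lost third copy if a spare node exists.

```python
def can_self_heal(nodes_alive: int, size: int = 3) -> bool:
    """Can Ceph still place all `size` replicas on distinct hosts?"""
    return nodes_alive >= size

for total in (3, 4, 5):
    print(f"{total} nodes, one dead -> self-heal: {can_self_heal(total - 1)}")
# 3 nodes: False (degraded forever), 4 or 5 nodes: True
```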
Another detail often forgotten: let's say you have those three nodes with two OSDs each. When one OSD fails, its direct neighbor has to take over the data from the dead disk. (The lost copies cannot be given to another node - the only two other nodes already hold a copy!) This means you can fill each OSD in this picture only up to about 45 percent: its own 45% plus the dead neighbor's 45% puts the surviving OSD at 90%. To avoid this problem you want several OSDs per node or - better! - more than three nodes.
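The arithmetic behind that 45%, as a back-of-the-envelope Python snippet (the 90% ceiling is my assumption; Ceph's default nearfull/full warnings sit at 85%/95%):

```python
practical_ceiling = 0.90           # don't let any OSD grow past this
max_fill = practical_ceiling / 2   # survivor must absorb its dead neighbor

print(f"max safe fill per OSD:   {max_fill:.0%}")      # -> 45%
print(f"survivor after failure:  {2 * max_fill:.0%}")  # -> 90%
```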
Note that Ceph is more critical for the cluster than a local SSD is for one of the three nodes: when Ceph goes read-only, *all VMs in the whole cluster* will stop immediately - they cannot write any data (including log messages, which practically *always* happens) and will stall.
Network: note that data-to-be-written goes over the wire multiple times before it is considered "written". A fast network is a must; 10 GBit/s should be considered the minimum. Yes, technically it works with slower speeds - at first, with low load. When high usage leads to congestion, latency increases and you will encounter "strange" errors that may be hard to debug.
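To see why 10 GBit/s is the floor, a rough estimate (assumed numbers, single shared network): with size=3 the client sends the data to the primary OSD, which forwards it to two replicas, so every written byte crosses the wire roughly three times.

```python
size = 3                 # replicas per object
link_gbyte_s = 10 / 8    # 10 GBit/s ~ 1.25 GB/s, ignoring protocol overhead

written_gb = 1.0
print(f"{written_gb} GB written -> ~{written_gb * size} GB of wire traffic")

# crude upper bound for sustained writes on one shared 10G link:
print(f"best case ~{link_gbyte_s / size:.2f} GB/s before the link saturates")
```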
Regarding SSDs/NVMe: you probably already know the recommendation to use Enterprise-class devices. There are reasons (plural) for it - power-loss protection and sustained write performance among them - please consider them. (If you go for, let's say, seven nodes with five OSDs each, the quality of the OSDs in a homelab may be lower; but with a bare minimum number of disks they really need to be high quality.)
YMMV! And while I am experimenting with Ceph, I am not an expert...