"Hi, perhaps I have the wrong idea about the minimum size. What is it exactly? What I understand as of now: this is the minimum number of replicas at which the cluster will acknowledge a write operation, so with a min size of 2 the cluster ensures that there are always 2 replicas."
Like you wrote: with min size 1 your cluster is operable with two failed OSDs. "min size 2" can't ensure that there are always 2 replicas - it only means that you can't write to a cluster with two failed disks.
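For reference, min_size is set per pool with the standard Ceph CLI; a minimal sketch, assuming a pool named "rbd":

# show the current min_size of the pool
ceph osd pool get rbd min_size
# with min_size 2 the pool blocks I/O as soon as a PG has fewer
# than 2 active replicas; with min_size 1 it keeps serving I/O
ceph osd pool set rbd min_size 2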
That means: if a third OSD fails, you have a big data loss!

"With min size 1, if a single copy dies somehow, as in my case, there are no more replicas left to rebuild."
If an OSD fails, it is removed from the CRUSH map (or better, reweighted to 0). E.g. all primary PGs (and after that also the secondary ones) are rebuilt on the remaining OSDs as the new CRUSH map calculates. For the ex-primaries, a secondary PG is taken to rebuild the new primary. If another OSD fails now, the CRUSH map is recalculated and the game starts again from the beginning.
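A minimal sketch of that with the standard Ceph CLI (osd.3 is just an example ID):

# take the failed OSD out of data placement (weight 0 in the CRUSH map)
ceph osd crush reweight osd.3 0
# watch the PGs being rebuilt according to the new CRUSH map
ceph -w
ceph osd tree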
min size 2 only prevents writes (and changes) during a rebuild of a cluster with two failed disks. Whether you need (or think you need) that depends on how big your disks are and how fast your recovery is.
But then you should also not use "osd max backfills = 1" + "osd recovery max active = 1", because this slows down the recovery process!
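Those are ceph.conf settings; a sketch of the conservative values and how you could raise them at runtime (the value 4 is only an example, not a recommendation):

[osd]
osd max backfills = 1
osd recovery max active = 1

# raise them on all OSDs at runtime, e.g. during a rebuild:
ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'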
Perhaps it's a better idea to monitor the cluster and, if a second OSD fails, switch max_backfills and max_active back to the normal values (see the sketch below).
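A rough sketch of such a watchdog as a shell loop - the "degraded" check, the interval, and the values are assumptions, adapt them to your cluster:

#!/bin/bash
# hypothetical watchdog: if the cluster reports degraded PGs,
# raise the recovery throttles so the rebuild finishes faster
while true; do
  if ceph health | grep -q degraded; then
    ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'
  fi
  sleep 60
done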
In this case the VMs are still running, just very slowly - so it's better than "min size 1" (where nothing runs), and a rebuild will be much faster!
AFAIK, the configs (or parts of them) of very big Ceph installations (CERN) are available online.

"I am thinking of a couple of other clusters we are managing which are destined to grow up to a petabyte. With so many OSDs and nodes, a min size greater than 1 makes sense. They are sitting on Ubuntu but the plan is to move them to Proxmox."
Udo