Ceph Storage question

KeyzerSuze

Member
Aug 16, 2024
Hi

This seems to be the best place / forum to ask questions about Ceph :)

My understanding of Ceph is that the underlying storage units are OSDs, and these are distributed between nodes.
Pools are then created that sit on top of the OSDs ... I think pools are broken into PGs, and each PG is assigned to a set of OSDs, I think ...
I think with a CRUSH map/rule you can control which OSDs a pool uses.
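For what it's worth, roughly how that looks on the CLI (a minimal sketch - the rule name "hdd_only" and pool name "mypool" are just examples, not from my cluster):
Code:
# list the CRUSH rules and inspect the default replicated rule
ceph osd crush rule ls
ceph osd crush rule dump replicated_rule
# example: create a rule restricted to HDD-class OSDs with "host" as failure domain,
# then point a pool at it
ceph osd crush rule create-replicated hdd_only default host hdd
ceph osd pool set mypool crush_rule hdd_only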

Already getting complicated. Let's presume I have the stock standard CRUSH map/rule.

The standard replication rule is 3 copies, so for each block of data (or PG?) there are 3 copies - not on the same OSD, and trying not to be on the same host. Sounds all good.

I have my Proxmox cluster with Ceph installed. I have 6 nodes, but 4 of them are old machines and I have 1 large newer machine, so I want to get down to 3 Proxmox nodes (the 3 nodes all have 10G networking).

I also have a large amount of data. This is split into media created by me / family - photos, videos etc., probably say 1T, not that much -
and then DVD / Blu-ray rips which I keep online - so roughly 15T.

I currently have all of this on CephFS.

CephFS is a filesystem that sits on top of 2 pools (data and metadata). So Ceph provides RBD (?) which Proxmox can talk to directly, and also provides CephFS.

So my 1 large CephFS has 18T of data, which means 54T of raw space used (3x). What I would like to do is create a new CephFS (or data pool) just for the large media and make it 2x ... to get some space back.
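For context, a rough sketch of the commands this would involve - the pool name, filesystem name, PG count and mount path below are made-up examples, and size=2 carries the risks discussed further down:
Code:
# create a dedicated data pool for the bulk media and attach it to the existing CephFS
ceph osd pool create cfs_large_data 64
ceph fs add_data_pool cephfs cfs_large_data
ceph osd pool set cfs_large_data size 2
# pin a directory of the mounted CephFS to the new pool via a file layout
# (applies to files created in that directory from now on)
setfattr -n ceph.dir.layout.pool -v cfs_large_data /mnt/pve/cephfs/media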

My concern: if I have a CephFS called cfs_large with 2 replicas (instead of 3), what happens if I lose 2 OSDs that happen to be mirrors of each other? Do I lose all of the CephFS, or just what was on those OSDs? If there's a chance I might lose all of the 17T ... then it's probably best to stick to 3x.

Then the follow-up question: what happens to the RBD pool if I lose 3 OSDs that constitute a mirror set? Is the whole pool lost?

Should I be creating smaller pools instead of 1 big pool?

Sorry for the long-winded way to get there - hope it makes sense.
 
You will only lose the affected PGs and their objects. This will lead to corrupted files (when the data pool is affected) or a corrupted filesystem (if the metadata pool is affected). Depending on which directory is corrupted you may not be able to access a large part of the CephFS any more. Metadata corruption may also lead to an unstartable MDS.
So better use size=3 on the metadata pool.

If an RBD pool is missing PGs, some parts of the VM images will produce IO errors inside the VM, which may lead to any kind of crash.

Conclusion: Do not use size=2 pools for any data of value.
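If it helps, the current settings can be checked and changed per pool - "cephfs_metadata" below is just the typical default name, yours may differ:
Code:
# inspect and adjust replication per pool
ceph osd pool get cephfs_metadata size
ceph osd pool get cephfs_metadata min_size
ceph osd pool set cephfs_metadata size 3
ceph osd pool set cephfs_metadata min_size 2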
 
Also of note, you'd have to lose the 2 OSDs at the same time ... after one drops (and is marked out), Ceph will re-create those PGs on other OSDs. On the same node if you have only 3.

This also means you need the capacity to handle that.
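Small caveat on "immediately": by default Ceph waits for the down OSD to be marked "out" before re-creating the PGs elsewhere. The relevant knob and a quick capacity check (assuming a reasonably recent Ceph with the central config database):
Code:
# how long a down OSD is tolerated before being marked out (default 600 s)
ceph config get mon mon_osd_down_out_interval
# check whether the remaining OSDs have room for the extra copies
ceph df
ceph osd df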
 
Then the follow-up question: what happens to the RBD pool if I lose 3 OSDs that constitute a mirror set? Is the whole pool lost?
No. If you lose three disks on three separate nodes AT THE SAME TIME, the pool will become read-only and you'll lose all payload whose placement groups had shards on ALL THREE of those OSDs.

BUT here's the thing - the odds of that happening are astronomically low, which is the whole point. And if the disks don't fail at the same time, the subsystem has time to rebalance anyway. This isn't something you really need to worry about, although the lesson here is to make sure you have more than three OSD nodes and capacity available to rebuild/rebalance.
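If it ever does happen, the damage can be narrowed down per PG rather than per pool. A few commands that help with that ("mypool" and "someobject" are just placeholders):
Code:
# show which PGs are degraded/undersized/stuck and which OSDs they map to
ceph health detail
ceph pg ls undersized
ceph pg dump_stuck inactive
# map a single object to its PG and OSD set
ceph osd map mypool someobject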

As for "small" and "big" pools- not really sure what you mean by that. The above applies to all pools regardless of number of PGs or content.
 
On the same node if you have only 3.
Does it?

With the failure domain being "host" this does not make sense...? I am definitely NOT a Ceph expert, but now I am interested in the actual behavior:

I have a small, virtual test cluster with Ceph. For the following tests three nodes have two OSDs each. Please ignore the "empty" ones; I prefer NOT to tamper with copy-and-pasted text, as this may introduce stupid errors.

There is only one newly created pool "data" with the usual "size=3,min_size=2 replicated_rule".
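For reference, roughly how such a pool is created - the PG count is chosen arbitrarily for this tiny cluster, and size/min_size just restate the defaults:
Code:
# create the test pool with the default replicated rule and 3/2 replication
ceph osd pool create data 32 32 replicated replicated_rule
ceph osd pool set data size 3
ceph osd pool set data min_size 2
ceph osd pool application enable data rbd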

The artificial "workload" is a Debian VM, just being a "dummy" to check the behavior from inside a VM. With six OSDs "In" the situation of the OSDs is this:
Code:
root@pna:~# ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         0.18713         -  192 GiB   23 GiB   20 GiB  70 KiB  3.3 GiB  168 GiB  12.23  1.00    -          root default
 -3               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pna
 -5               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnb
 -7               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnc
 -9         0.06238         -   64 GiB  7.8 GiB  6.7 GiB  22 KiB  1.1 GiB   56 GiB  12.23  1.00    -              host pnd
  3    hdd  0.03119   1.00000   32 GiB  3.7 GiB  3.2 GiB  12 KiB  548 MiB   28 GiB  11.67  0.95   15      up          osd.3
  9    hdd  0.03119   1.00000   32 GiB  4.1 GiB  3.5 GiB  10 KiB  576 MiB   28 GiB  12.79  1.05   18      up          osd.9
-11         0.06238         -   64 GiB  7.9 GiB  6.7 GiB  26 KiB  1.1 GiB   56 GiB  12.27  1.00    -              host pne
  4    hdd  0.03119   1.00000   32 GiB  3.0 GiB  2.4 GiB  13 KiB  581 MiB   29 GiB   9.25  0.76   11      up          osd.4
 10    hdd  0.03119   1.00000   32 GiB  4.9 GiB  4.3 GiB  13 KiB  566 MiB   27 GiB  15.29  1.25   22      up          osd.10
-13         0.06238         -   64 GiB  7.8 GiB  6.7 GiB  22 KiB  1.1 GiB   56 GiB  12.19  1.00    -              host pnf
  5    hdd  0.03119   1.00000   32 GiB  3.1 GiB  2.6 GiB  11 KiB  569 MiB   29 GiB   9.77  0.80   14      up          osd.5
 11    hdd  0.03119   1.00000   32 GiB  4.7 GiB  4.2 GiB  11 KiB  526 MiB   27 GiB  14.61  1.19   19      up          osd.11
                        TOTAL  192 GiB   23 GiB   20 GiB  74 KiB  3.3 GiB  168 GiB  12.23

It is obvious that my data amounts to roughly 8 GB and that each node stores this same amount.

18:00 - now I remove BOTH OSDs on ONE node: osd.4 + osd.10 --> OUT - this is visible via CLI immediately:
Code:
root@pna:~# ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         0.18713         -  128 GiB   16 GiB   13 GiB  44 KiB  2.2 GiB  112 GiB  12.21  1.00    -          root default
 -3               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pna
 -5               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnb
 -7               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnc
 -9         0.06238         -   64 GiB  7.8 GiB  6.7 GiB  22 KiB  1.1 GiB   56 GiB  12.23  1.00    -              host pnd
  3    hdd  0.03119   1.00000   32 GiB  3.7 GiB  3.2 GiB  12 KiB  548 MiB   28 GiB  11.67  0.96   15      up          osd.3
  9    hdd  0.03119   1.00000   32 GiB  4.1 GiB  3.5 GiB  10 KiB  576 MiB   28 GiB  12.79  1.05   18      up          osd.9
-11         0.06238         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pne
  4    hdd  0.03119         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0   11      up          osd.4
 10    hdd  0.03119         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0   21      up          osd.10
-13         0.06238         -   64 GiB  7.8 GiB  6.7 GiB  22 KiB  1.1 GiB   56 GiB  12.19  1.00    -              host pnf
  5    hdd  0.03119   1.00000   32 GiB  3.1 GiB  2.6 GiB  11 KiB  569 MiB   29 GiB   9.77  0.80   14      up          osd.5
 11    hdd  0.03119   1.00000   32 GiB  4.7 GiB  4.2 GiB  11 KiB  526 MiB   27 GiB  14.61  1.20   19      up          osd.11
                        TOTAL  128 GiB   16 GiB   13 GiB  47 KiB  2.2 GiB  112 GiB  12.21

The VM does not notice this. The filesystem is still writeable because of "min_size=2". Now we wait ten(?) minutes or more...
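While waiting, the recovery state can be watched, for example with:
Code:
# overall health plus the list of degraded PGs
ceph -s
ceph pg ls degraded
# refresh the status every few seconds
watch -n 5 ceph -s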

More than 20 minutes later nothing has changed:

Code:
root@pna:~# date; ceph osd df tree
Sat Aug 30 06:24:22 PM CEST 2025
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         0.18713         -  128 GiB   16 GiB   13 GiB  44 KiB  2.2 GiB  112 GiB  12.23  1.00    -          root default
 -3               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pna
 -5               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnb
 -7               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnc
 -9         0.06238         -   64 GiB  7.8 GiB  6.7 GiB  22 KiB  1.1 GiB   56 GiB  12.25  1.00    -              host pnd
  3    hdd  0.03119   1.00000   32 GiB  3.7 GiB  3.2 GiB  12 KiB  548 MiB   28 GiB  11.68  0.96   15      up          osd.3
  9    hdd  0.03119   1.00000   32 GiB  4.1 GiB  3.5 GiB  10 KiB  576 MiB   28 GiB  12.81  1.05   18      up          osd.9
-11         0.06238         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pne
  4    hdd  0.03119         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0   12      up          osd.4
 10    hdd  0.03119         0      0 B      0 B      0 B     0 B      0 B      0 B      0     0   21      up          osd.10
-13         0.06238         -   64 GiB  7.8 GiB  6.7 GiB  22 KiB  1.1 GiB   56 GiB  12.22  1.00    -              host pnf
  5    hdd  0.03119   1.00000   32 GiB  3.1 GiB  2.6 GiB  11 KiB  573 MiB   29 GiB   9.80  0.80   14      up          osd.5
 11    hdd  0.03119   1.00000   32 GiB  4.7 GiB  4.2 GiB  11 KiB  530 MiB   27 GiB  14.63  1.20   19      up          osd.11
                        TOTAL  128 GiB   16 GiB   13 GiB  47 KiB  2.2 GiB  112 GiB  12.23

So... the goal of "size=3" is NOT re-established as long as no other/third node offers this capacity. The presence of four OSDs on the remaining two hosts (or probably any other number) is not sufficient.

Can someone confirm the validity of my conclusion, please? Is "Out" okay for this test? Should I "destroy" instead...?
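For reference, the difference in commands ("out" only stops placing data on the OSD, destroy/purge actually remove it - the OSD ids are the ones from this test):
Code:
# mark OSDs "out" (PGs are remapped, but the OSDs still exist and report "up")
ceph osd out 4 10
# mark an OSD as destroyed (keeps its id for later reuse)
ceph osd destroy 4 --yes-i-really-mean-it
# or remove it completely, including its CRUSH entry and auth key
ceph osd purge 4 --yes-i-really-mean-it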

Edit: wait a minute (or 15) ... destroy in progress... After a minute:
Code:
root@pna:~# date; ceph osd df tree
Sat Aug 30 06:32:46 PM CEST 2025
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME   
 -1         0.12476         -  128 GiB   18 GiB   16 GiB  44 KiB  2.2 GiB  110 GiB  14.05  1.00    -          root default
 -3               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pna
 -5               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnb
 -7               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnc
 -9         0.06238         -   64 GiB  9.4 GiB  8.3 GiB  22 KiB  1.1 GiB   55 GiB  14.64  1.04    -              host pnd
  3    hdd  0.03119   1.00000   32 GiB  4.5 GiB  4.0 GiB  12 KiB  548 MiB   27 GiB  14.18  1.01   19      up          osd.3
  9    hdd  0.03119   1.00000   32 GiB  4.8 GiB  4.3 GiB  10 KiB  576 MiB   27 GiB  15.11  1.08   22      up          osd.9
-11               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pne
-13         0.06238         -   64 GiB  8.6 GiB  7.5 GiB  22 KiB  1.1 GiB   55 GiB  13.45  0.96    -              host pnf
  5    hdd  0.03119   1.00000   32 GiB  3.9 GiB  3.4 GiB  11 KiB  573 MiB   28 GiB  12.26  0.87   18      up          osd.5
 11    hdd  0.03119   1.00000   32 GiB  4.7 GiB  4.2 GiB  11 KiB  525 MiB   27 GiB  14.65  1.04   20      up          osd.11
                        TOTAL  128 GiB   18 GiB   16 GiB  47 KiB  2.2 GiB  110 GiB  14.05

Another "Edit" will be added in a short while... I have to pause this experiment for now...

Okay... two hours later:
Code:
root@pna:~# date; ceph osd df tree  
Sat Aug 30 08:38:04 PM CEST 2025
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME     
 -1         0.12476         -  128 GiB   18 GiB   16 GiB  44 KiB  2.2 GiB  110 GiB  14.07  1.00    -          root default  
 -3               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pna  
 -5               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnb  
 -7               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pnc  
 -9         0.06238         -   64 GiB  9.4 GiB  8.3 GiB  22 KiB  1.1 GiB   55 GiB  14.65  1.04    -              host pnd  
  3    hdd  0.03119   1.00000   32 GiB  4.5 GiB  4.0 GiB  12 KiB  553 MiB   27 GiB  14.19  1.01   19      up          osd.3 
  9    hdd  0.03119   1.00000   32 GiB  4.8 GiB  4.3 GiB  10 KiB  576 MiB   27 GiB  15.11  1.07   22      up          osd.9 
-11               0         -      0 B      0 B      0 B     0 B      0 B      0 B      0     0    -              host pne  
-13         0.06238         -   64 GiB  8.6 GiB  7.5 GiB  22 KiB  1.1 GiB   55 GiB  13.48  0.96    -              host pnf  
  5    hdd  0.03119   1.00000   32 GiB  3.9 GiB  3.4 GiB  11 KiB  590 MiB   28 GiB  12.31  0.88   18      up          osd.5 
 11    hdd  0.03119   1.00000   32 GiB  4.7 GiB  4.2 GiB  11 KiB  525 MiB   27 GiB  14.65  1.04   20      up          osd.11
                        TOTAL  128 GiB   18 GiB   16 GiB  47 KiB  2.2 GiB  110 GiB  14.07


Disclaimer: not using Ceph currently, the above is just a test-cluster.
 
Does it?

With the failure domain being "host"
To be (much) clearer, I was referring to 3 hosts with multiple OSDs on each and at least one OSD left running per host, not 3 hosts with only 1 OSD each.

For the former, Ceph will use any other OSD on the same host (technically any unused host, but there are only 3 hosts so no choice left).

For the latter, if the only OSD on a host is down, then there is no place to make a third copy and the pool runs with only 2. Since 2 is the (default) minimum, this is fine but of course not ideal.

Your test was more like the latter example because you dropped two OSDs on the same node, leaving zero options on that node. There's no point (ability?) in copying a PG to another OSD on a remaining node, because that node already holds its copy of the PG.

With a default 3/2 setup Ceph wants to distribute a PG across 3 nodes. It doesn't care which nodes those are. In a setup with 10 nodes it will be spread out. With 3 nodes, any one node is expected to hold copies of all the data regardless of the number of OSDs available.

This is why having greatly unbalanced disk drives in a small cluster is not optimal: if the large drive fails, the remaining OSDs need to hold its PGs. With a small amount of data this isn't an issue, but as storage fills up the now-missing PG copies need to fit somewhere.
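A quick way to spot that imbalance before it bites - the per-OSD utilisation spread and the built-in balancer status:
Code:
# per-OSD utilisation; a large VAR spread means the cluster is unbalanced
ceph osd df tree
# the balancer module can even things out automatically (upmap mode on recent releases)
ceph balancer status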
 
Yeah, my cluster ended up being unbalanced while waiting on drives ... once I had roughly the same number of drives of around the same size, my cluster freed up a lot more usable space.
 
(quoting the full test post above)
Thanks for such detailed input.