Ceph and diskspace

sahostking

Renowned Member
I have read in some posts that the usable disk space is based on the smallest OSD disk in the cluster?

So for example, if we have the following on each of the 6 nodes:

2 x 2TB
1 x 500GB

Are we saying disk space is lost due to the 500GB disk?

Should we rather just remove the 500GB disks? We just had these Intel DC S4500s lying around and thought using them would give us extra space when we set up the Ceph cluster.

Ideas or corrections please - thanks
 
I think there is a bit of a misunderstanding here.

Each OSD (disk) will be used to store data. Depending on its size, it gets a weight assigned so that Ceph can distribute the data accordingly.
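To make the weighting concrete, here is a minimal sketch assuming the default behaviour where a new OSD's CRUSH weight is simply its size in TiB (no manual reweighting):

```python
# Sketch only: assumes the default where CRUSH weight = device size in TiB.
def default_crush_weight(size_bytes: int) -> float:
    TIB = 1024 ** 4
    return round(size_bytes / TIB, 5)

print(default_crush_weight(2_000_000_000_000))  # 2 TB disk   -> ~1.81899
print(default_crush_weight(500_000_000_000))    # 500 GB SSD  -> ~0.45475
```

So the 500GB SSDs simply receive a much smaller share of the data than the 2TB disks, they are not "wasted".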

The statement that the smallest OSD determines the available disk space can be true in a specific case. Consider the smallest cluster possible, consisting of 3 nodes. Ceph achieves redundancy on the node level, making sure that the replicas (3 by default) are spread out over different nodes. In the case of a 3-node cluster, the node with the smallest OSDs actually determines how much space can be used.

Once you go to clusters with more nodes, the 3 replicas are distributed across them and not every node will store one of the 3 replicas anymore. Thus the calculation of how much space is usable gets much more complicated, especially if you have nodes that use very different disks (size-wise) for the OSDs.
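As a rough back-of-the-envelope illustration with the numbers from the question (not the exact CRUSH math; real usable space also depends on placement, imbalance and the nearfull/full ratios):

```python
# Rough estimate only: real usable space also depends on CRUSH placement,
# per-OSD imbalance and the nearfull/full ratios (0.85/0.95 by default).
TB = 10**12

# The example from the question: 6 nodes, each with 2 x 2TB + 1 x 500GB.
nodes = {f"node{i}": [2 * TB, 2 * TB, 0.5 * TB] for i in range(1, 7)}
replicas = 3

raw_total = sum(sum(osds) for osds in nodes.values())
naive_usable = raw_total / replicas

print(f"raw:          {raw_total / TB:.1f} TB")     # 27.0 TB
print(f"usable (max): {naive_usable / TB:.1f} TB")  # ~9.0 TB, before overhead

# With only 3 nodes, every node holds one replica of every PG, so the
# smallest node would cap the usable space at its own capacity instead.
```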

Ideally all your nodes will be very similar, and using those smaller SSDs should be fine. The beauty of Ceph is that if you come to the conclusion that you don't want them anymore, you can remove them from the cluster. Just mark them as out first and wait until Ceph is done redistributing the data (HEALTH_OK) before you stop and destroy them :)
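A hedged sketch of that order of operations, wrapping the standard ceph CLI from Python (the OSD id is a placeholder; assumes the ceph CLI and an admin keyring on the node):

```python
# Sketch of the "mark out first, wait, then stop and destroy" order above.
import subprocess
import time

OSD_ID = "7"  # placeholder OSD id

def ceph(*args: str) -> str:
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout

# 1. Mark the OSD out so Ceph starts re-replicating its data elsewhere.
ceph("osd", "out", OSD_ID)

# 2. Wait until the data is redistributed and the cluster is HEALTH_OK again.
while "HEALTH_OK" not in ceph("health"):
    time.sleep(30)

# 3. Only now stop the OSD daemon on its host (systemctl stop ceph-osd@7,
#    or the Stop button in the Proxmox GUI) and then remove it for good:
ceph("osd", "purge", OSD_ID, "--yes-i-really-mean-it")
```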
 
I have been doing the following and had no issue as yet:

Stopped the OSD
Then clicked OUT

Then destroyed the data. I didn't even consider waiting for it to show HEALTH_OK.

Did it multiple times with no issues. Seems Ceph can handle the order of things fine.
 
Only for a single OSD or multiple OSDs within a very short time? (minutes, seconds)
I accidentally did this a few times on test clusters, ended up with missing PGs, and had to scrap the pool and recreate it to get back to a good state, losing the (not important) data on it.
 
Seconds, and only one OSD at a time. I did it numerous times, like a lot, and still not one issue so far. And we host VMs on it, hosting thousands of cPanel accounts using over 5.7 TB of storage.

I think doing it for more than one OSD at a time may be super risky if one of them holds a copy of the same PG.
 
