Hmm okay. So if you edit the pool and enable the "Advanced" checkbox next to the OK button, do you have a target_ratio set? If not, please do so; a "1" should be plenty. The .mgr pool can be ignored, as it will not take up any significant amount of space.
If the pool does not have a target_ratio set, then the autoscaler can only base its calculation on the currently used space of the pool. I assume it will then recommend double the current number of PGs.
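If you prefer the CLI over the GUI, the same thing can be done roughly like this (just a sketch, assuming the pool is called "ceph-vm" as mentioned further down; the second command shows the autoscaler's current recommendation):
Code:
ceph osd pool set ceph-vm target_size_ratio 1
ceph osd pool autoscale-status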
Looking through the list of OSDs more closely, I realize that you only have 2 nodes with HDD OSDs and 5 with SSDs (correct me if I am wrong).
The number of OSDs varies quite a bit between the nodes. This means that some nodes get quite a bit more traffic, as they store more data than others. This can be seen in the "weight" column of the
ceph osd df tree
output.
The idea is to create two rules, one targeting HDDs and one targeting SSDs, and to assign each rule to one pool. Then you would have a fast SSD pool and a slow HDD pool and could place the disk images as you need.
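Roughly, such device-class rules could be created like this (only a sketch; the rule names "replicated-ssd" / "replicated-hdd" are placeholders, and it assumes the default CRUSH root "default" with "host" as the failure domain):
Code:
# one rule per device class
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd
# then assign the rule to the matching pool (pool name is a placeholder)
ceph osd pool set <your-ssd-pool> crush_rule replicated-ssd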
Since only two nodes contain HDD OSDs, it doesn't really make sense to create two pools right now, as you would usually want a size/min_size of 3/2. But that means you would need at least 3 nodes with OSDs of that device class.
So for now, you will see a mixed bag regarding the performance of the cluster, depending on where the data you want to access is stored.
Regarding the full problem: please set a target ratio for the pool "ceph-vm", and if the autoscaler then recommends 2048 PGs, set it to that manually. It won't do it automatically, as the change is only by a factor of 2 and the autoscaler only acts on its own for larger deviations.
This alone might already help to redistribute the data more evenly.
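If you'd rather do that on the CLI than in the GUI (again assuming the pool name "ceph-vm"):
Code:
ceph osd pool set ceph-vm pg_num 2048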
Additionally, keep an eye on the balancer. Once it is done, you should see it returning this:
Code:
"optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",