[SOLVED] Ceph storage full although space is still available

adminkc

Member
Sep 28, 2020
Hi,

Our Ceph storage shows the storage as almost full, but when I look at Ceph it shows that we have 50 TB more left.
How is this possible?
Does anyone else have this issue or an idea how to fix it?

BR,
KC IT-Team
 

Attachments

  • Ceph Storage voll.jpg
  • ceph Storage free space.jpg
Check the OSD ratios. Is one OSD holding more data compared to the others? This can be seen in the Ceph > OSD page, Used (%).

If one of the OSDs is holding a higher percentage than the others, you can manually set the reweight via the command line to a lower value to help force a redistribution.
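
For example, a manual reweight could look roughly like this on the CLI (the OSD id "1" and the value 0.9 are only placeholders, pick the OSD and weight that fit your situation):
Code:
# show the fill level and weight of every OSD
ceph osd df tree
# temporarily lower the reweight (0.0 - 1.0) of an overly full OSD
ceph osd reweight 1 0.9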
 
It shows me that the OSDs are not holding the same amount of data; one is holding more, another less.
Screenshot attached.
The first one is at 69.35%.
 

Attachments

  • OSD% Use.jpg
Can you please explain how you have your ceph cluster set up? It is unconventional to have all SSDs on one host and all HDDs on another host.
 
We have a cluster with 13 nodes.
We created a Ceph cluster mixed with SSDs & HDDs, and our cluster has 5 monitors and 3 managers.
 
Can you post the output of ceph osd df tree? Best inside [code][/code] tags.
As well as ceph balancer status?
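
That is, roughly:
Code:
ceph osd df tree
ceph balancer status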
 
Attached are the OSD tree and the balancer output.
 

Attachments

  • balancer Status.jpg
  • Ceph OSD tree-2.jpg
  • Ceph OSD tree-1.jpg
Well, some of the SSD OSDs are quite full; therefore the pool using SSDs (I assume you have device class specific rules) sees almost no free space, as the fullest OSD is the limit. One thing I noticed is that the OSDs could have more PGs. That might also make it easier for the balancer to actively redistribute some PGs to use the space of the OSDs better.

Do you have any target_size or target_ratios defined? If not, please do so. This helps the autoscaler to know the estimated size usage of the pools and calculate their PG number accordingly to get to a target of about 100 PGs per OSD. The result should be more PGs, and therefore smaller PGs which help to distribute the stored data more evenly.

Ideally, you will set target_ratios, as those are weighted against the other pools (in the same device class). The autoscaler will change the pg_num of a pool automatically if the optimal PG number is off from the current one by a factor of 3 or more. If the difference is smaller, you have to set it manually to the optimal value.
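
As a rough sketch on the CLI (the pool name is a placeholder, adjust it to yours):
Code:
# give the pool a relative target ratio so the autoscaler can plan ahead
ceph osd pool set <pool-name> target_size_ratio 1
# check what the autoscaler currently recommends
ceph osd pool autoscale-status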

If you have device specific rules, make sure that each pool is assigned to one and that no pool is using the default "replicated_rule" anymore. It does not make a distinction regarding device class and will confuse the autoscaler.
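
To check that, something like this should do (it lists the existing rules and which rule each pool uses):
Code:
ceph osd crush rule ls
ceph osd pool ls detail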
 
So you mean the rules in the attached picture?
These rules are what we got when we created the cluster.
Do you maybe have an article on how to set up rules and create new ones?
 

Attachments

  • ceph rule.jpg
Hmm okay. So if you edit the pool and enable the "Advanced" checkbox next to the OK button, do you have a target_ratio set? If not, please do so; a "1" should be plenty fine. The .mgr pool can be ignored as it will not take up any significant amount of space.

If the pool does not have any target_ratio set, then the autoscaler can only take the currently used space of the pool. I assume that it will recommend double the current number of PGs.

Looking through the list of OSDs more closely, I realize that you only have 2 Nodes that contain HDD OSDs and 5 with SSDs (correct me if I am wrong).

The number of OSDs varies quite a bit between the nodes. This means that some nodes get quite a bit more traffic, as they store more data than others. This can be seen in the "weight" column of the ceph osd df tree output.

The idea is to create two rules, one targeting HDDs and one targeting SSDs, and assign each rule to one pool. Then you would have a fast SSD pool and a slow HDD pool and can place the disk images as you need.
Since only two nodes contain HDD OSDs, it doesn't really make sense to create two pools, as you would usually want a size/min_size of 3/2. But that means that you would need at least 3 nodes with OSDs of that device class.
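
If you do go that route later (for example once there are at least 3 nodes with HDD OSDs), the rules and the pool assignment could be created roughly like this (rule and pool names are only examples):
Code:
# one replicated rule per device class, failure domain "host"
ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd crush rule create-replicated replicated_hdd default host hdd
# assign a pool to a device class specific rule
ceph osd pool set <pool-name> crush_rule replicated_ssd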

So for now, you will see a mixed bag regarding the performance of the cluster, depending on where the data you want to access is stored.


Regarding the full problem: Please set a target ratio for the pool "ceph-vm" and if the autoscaler then recommends 2048 PGs, set it to that. It won't do it automatically as it is only a change by a factor of 2.

This might help already to redistribute the data more evenly.
Additionally, keep an eye on the balancer. If it is done, you should see it returning this:
Code:
"optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
 
I started to increase the PG number to 2048 and it did not give me more space.
How can I use the 50 TB that are left, does a solution exist?
The increase used even more space and the Ceph storage is nearly full... (used more than 90%).
 
Can you post the current output of ceph osd df tree?
Please also provide the output of pveceph pool ls --noborder. Please make sure that the window is wide enough, as the command will just cut off anything that doesn't fit.

Please also post the output of pveversion -v, or if you click on a node -> summary -> Package versions (top left)
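
So, on one of the cluster nodes:
Code:
ceph osd df tree
pveceph pool ls --noborder
pveversion -v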
 
I started the increase to 2048 yesterday at 8 pm and it is now at 1740, as you can see.
 

Attachments

  • osd tree.jpg
  • osd tree 1.jpg
  • noborder.jpg
  • pveversion.jpg
  • pveversion1.jpg
From what I can see, osd.6 is the one which is fullest at the moment (~85%), but not as full as yesterday (~87%). I assume that Ceph is currently rebalancing quite some data.
How is the situation in the meantime? Do you see a bit more space, or is it still decreasing?

The whole increasing of the PGs and rebalancing the data can take some time.

Once the cluster has scaled the PGs to 2048 and is done rebalancing, check the usage again and if there are OSDs that are considerably higher than others.
The status of the balancer would be interesting as well, as it should then see where it can improve the data distribution leading to more evenly used OSDs.
 
The % went down because I moved some disks from Ceph to ZFS. But it is still not clear why the 50 TB can't be used.
 
The % went down because I moved some disks from Ceph to ZFS.
Okay, that would have been useful information :)

But it is still not clear why the 50 TB can't be used.
As long as you have a single OSD that is much fuller than the others, it will be the limiting factor for how much space is estimated as free.

That is why an increase in the pg_num should result in a better distribution, and therefore that one full OSD, and the other rather full OSDs, should end up with more free space. But it can take some time until then.

Out of curiosity, can you post the output of ceph -s and are there any Global flags enabled? (OSD panel)
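
The global OSD flags can also be checked on the CLI, for example:
Code:
ceph -s
# the first "flags" line is the cluster-wide one (noout, norebalance, ...)
ceph osd dump | grep flags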
 
So nothing changed, the storage is still the same, any other ideas maybe?
Should we go back to ZFS if this does not work like it should?

BR,
KC IT-Team
 
Hmm, is the recovery done?
Could you post ceph -s, ceph osd df tree and ceph balancer status again?

Having the disks spread out that unevenly among the cluster nodes might be an issue that could prevent Ceph from balancing the data more evenly. But honestly, there are enough nodes in the cluster that it should work out okayish.
 
Yes, the recovery is done, it now shows 2048.

Attached are the command outputs.
 

Attachments

  • ceph balancer status.jpg
  • ceph osd df tree2.jpg
  • ceph osd df tree1.jpg
  • ceph -s.jpg
