Ceph RBD Storage Shrinking Over Time – From 10TB Down to 8.59TB

jorcrox · 2025-03-05T13:43:48+0100

I have a cluster of three Proxmox servers connected via Ceph. Initially, the effective storage was 10TB, but over time it has decreased to 8.59TB, and I don’t know why. The storage type is RBD. Why is my Ceph RBD storage shrinking? How can I reclaim the lost space?

gurubert · 2025-03-05T21:29:20+0100

Are there other pools using the same OSDs?

ness1602 · 2025-03-06T08:11:50+0100

Or maybe you are using CephFS to store something?

smueller · 2025-03-06T08:24:57+0100

jorcrox said:
I have a cluster of three Proxmox servers connected via Ceph. Initially, the effective storage was 10TB, but over time it has decreased to 8.59TB, and I don’t know why. The storage type is RBD. Why is my Ceph RBD storage shrinking? How can I reclaim the lost space?

Could you share your OSD config, CRUSH map, and the pools configured in Ceph?

gurubert · 2025-03-06T08:34:07+0100

smueller said:
the pools configured in Ceph?

Please post the output of ceph df.

jorcrox · 2025-03-06T08:50:30+0100

This is the output of ceph df.

jorcrox · 2025-03-06T08:55:28+0100

smueller said:
Could you share your OSD config, CRUSH map, and the pools configured in Ceph?

This is the OSD config, the CRUSH map, and the pool configuration.

jorcrox · 2025-03-06T08:57:11+0100

gurubert said:
Are there other pools using the same OSDs?

No, there aren't.

jorcrox · 2025-03-06T08:58:08+0100

gurubert said:
Please post the output of ceph df.

smueller · 2025-03-06T09:01:49+0100

jorcrox said:
This is the OSD config, the CRUSH map, and the pool configuration.

Why have you set a max size of 10 and a min size of 1?

I only see 3 nodes, so the max size should be 3 and the min size should be 2.

Code:

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

gurubert · 2025-03-06T09:03:16+0100

Why are there 8TB of data stored in device_health_metrics?

Do you store RBD data in this pool? You should create a separate pool for RBD data.

https://pve.proxmox.com/wiki/Ceph_P...ages_are_stored_on_pool_device_health_metrics

gurubert · 2025-03-06T09:03:46+0100

smueller said:
Why have you set a max size of 10 and a min size of 1?

This is for the rule, it is not the size of the pool.
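To make the distinction concrete, here is a minimal Python sketch. It assumes the legacy CRUSH semantics (pre-Quincy maps), where a rule's min_size and max_size only describe the range of replica counts the rule may be used for, while the pool's own size and min_size control how many replicas are kept and how many must be available for I/O. The class names, helper functions, and the pool values (size 3, min_size 2) are hypothetical; the thread does not show the actual pool settings.

```python
from dataclasses import dataclass


@dataclass
class CrushRule:
    name: str
    min_size: int  # smallest pool replica count this rule may serve
    max_size: int  # largest pool replica count this rule may serve


@dataclass
class Pool:
    name: str
    size: int      # number of replicas the pool keeps
    min_size: int  # replicas that must be available for the pool to accept I/O


def rule_applies(rule: CrushRule, pool: Pool) -> bool:
    """A legacy rule is usable for a pool whose replica count is in its range."""
    return rule.min_size <= pool.size <= rule.max_size


def pool_accepts_io(pool: Pool, available_replicas: int) -> bool:
    """The pool-level min_size, not the rule, decides whether I/O continues."""
    return available_replicas >= pool.min_size


# Rule values taken from the CRUSH map posted above.
rule = CrushRule(name="replicated_rule", min_size=1, max_size=10)
# Hypothetical pool settings, for illustration only.
pool = Pool(name="example_pool", size=3, min_size=2)

print(rule_applies(rule, pool))   # True: 3 replicas is within 1..10
print(pool_accepts_io(pool, 2))   # True: 2 replicas meets the pool min_size of 2
print(pool_accepts_io(pool, 1))   # False: 1 replica is below the pool min_size
```

In other words, a rule range of 1 to 10 does not mean the pool stores 10 copies or keeps running with a single copy; those limits come from the pool settings.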

jorcrox · 2025-03-06T10:20:21+0100

An external company configured our server, and I don’t fully understand the settings they applied. Now I am facing a major issue because the storage is almost full, and I don’t know how to fix it. I need guidance on how to resolve this problem. Is there any way to address it?

smueller · 2025-03-06T10:51:33+0100

jorcrox said:
An external company configured our server, and I don’t fully understand the settings they applied. Now I am facing a major issue because the storage is almost full, and I don’t know how to fix it. I need guidance on how to resolve this problem. Is there any way to address it?

Have you tried contacting the company that installed the servers for you?

jorcrox · 2025-03-06T11:20:42+0100

smueller said:
Have you tried contacting the company that installed the servers for you?

I am now in charge of our server, and I am looking for a solution to this issue. However, I don’t know how to fix it. Any guidance would be greatly appreciated.

aaron · 2025-03-06T11:47:12+0100

Since the "device_health_metrics" pool is present, you must be on older versions of Ceph & Proxmox VE.

Before you upgrade, create a new pool and move all VM disks to it. If you check the configuration of the "CephPool01" storage in Datacenter -> Storage, it will most likely show that the "device_health_metrics" pool is used.

Using it for disk image storage was possible because older versions of Proxmox VE lacked sufficient safety checks.
Only once you have moved all disks to the new pool should you consider upgrading.
The reason is that in newer Ceph versions, this pool gets renamed to ".mgr", which will break the disk images in the process!

The overall space is going down because some OSDs are quite a bit fuller than others. Check whether the balancer is enabled.

Code:

ceph balancer status

The balancer makes sure that OSDs are used more evenly by moving PGs around in cases where the CRUSH algorithm alone cannot distribute the data evenly across OSDs.

You should be on Pacific (v16) at most. The balancer docs for this version can be found at https://docs.ceph.com/en/pacific/rados/operations/balancer/
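To illustrate why uneven OSD fill levels reduce the reported free space, here is a minimal Python sketch of a simplified model. It is not Ceph's actual code. It assumes new data is spread across OSDs in proportion to their CRUSH weight, so the pool is effectively full as soon as the first OSD reaches the full ratio. The function names, the 0.95 full ratio, the replica count of 3, and all capacities (in TB) are hypothetical and not taken from this cluster.

```python
def max_new_raw(osds, full_ratio=0.95):
    """Raw data that can still be written before the first OSD hits full_ratio.

    osds: list of (capacity, used, crush_weight) tuples, capacities in TB.
    """
    total_weight = sum(weight for _, _, weight in osds)
    limits = []
    for capacity, used, weight in osds:
        share = weight / total_weight             # fraction of new data on this OSD
        headroom = max(full_ratio * capacity - used, 0.0)
        limits.append(headroom / share)           # raw data until this OSD is full
    return min(limits)                            # the fullest OSD sets the limit


def usable_capacity(osds, replicas=3, full_ratio=0.95):
    """Usable (client-visible) space left, given `replicas` copies of each object."""
    return max_new_raw(osds, full_ratio) / replicas


# Six hypothetical 4 TB OSDs with equal weight and 12 TB used in total.
balanced = [(4.0, 2.0, 1.0)] * 6
skewed = [(4.0, used, 1.0) for used in (1.5, 1.7, 1.9, 2.1, 2.3, 2.5)]

print(f"{usable_capacity(balanced):.2f}")  # -> 3.60
print(f"{usable_capacity(skewed):.2f}")    # -> 2.60
```

Both layouts hold the same amount of data, but in the skewed case the fullest OSD caps what can still be written, so the available space looks smaller. In this model, evening out the OSDs moves the skewed figure back toward the balanced one.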

jorcrox · 2025-03-06T13:48:08+0100

aaron said:
Since the "device_health_metrics" pool is present, you must be on older versions of Ceph & Proxmox VE.

Before you upgrade, create a new pool and move all VM disks to it. If you check the configuration of the "CephPool01" storage in Datacenter -> Storage, it will most likely show that the "device_health_metrics" pool is used.

Using it for disk image storage was possible because older versions of Proxmox VE lacked sufficient safety checks.
Only once you have moved all disks to the new pool should you consider upgrading.
The reason is that in newer Ceph versions, this pool gets renamed to ".mgr", which will break the disk images in the process!

The overall space is going down because some OSDs are quite a bit fuller than others. Check whether the balancer is enabled.

Code:

ceph balancer status

The balancer makes sure that OSDs are used more evenly by moving PGs around in cases where the CRUSH algorithm alone cannot distribute the data evenly across OSDs.

You should be on Pacific (v16) at most. The balancer docs for this version can be found at https://docs.ceph.com/en/pacific/rados/operations/balancer/

What are the repercussions of activating the balancer? How long will it take? I need to consider that these nodes are in production.
