Ceph OSD Full

Nixon Girard

Sirs,

I'm having my first experience with Ceph.

I have 3 nodes with 2 disks each, totaling 6 OSDs.

My total storage reports about 80% used, but one of my OSDs reports 94% used, which generates a HEALTH_WARN status.


Why does this happen, and what should I do to balance the data across the OSDs?
 
At this point, the quickest way to temporarily free up space is to set the size of your pool(s) from 3 to 2. Keep the min_size at 2! This way you get some room to breathe, but you should not lose any OSD or node while the size is reduced.
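For example, something along these lines should do it on the CLI (replace <poolname> with the actual name of your pool); the same settings can usually also be changed in the pool's edit dialog in the GUI:
Code:
ceph osd pool set <poolname> size 2
ceph osd pool set <poolname> min_size 2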

Then it would be interesting to see some more info.

How are the pools configured?
Code:
pveceph pool ls --noborder
Run it in a terminal window that is wide enough, otherwise the output will be cut off.
Code:
ceph balancer status

And you should rethink your cluster setup. In a 3-node cluster you should have either only 1 or at least 4 OSDs per node. This is because, if one of the OSDs fails, Ceph will try to recover the lost data onto the remaining OSD(s) in the same node, which will also very quickly put you into a situation where there is not enough space.

With more nodes, this is not as problematic, as the other nodes can also take up some of the data. The same goes for more OSDs per node, so that the data of one lost OSD can be split over multiple OSDs in the same node.
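To put some made-up numbers on it: assume each node has two 1 TB OSDs that are each about 60% full (600 GB). If one of them dies, Ceph tries to rebuild its ~600 GB on the other OSD of the same node, which would then need ~1.2 TB on a 1 TB disk and runs full long before recovery finishes. With four OSDs per node at the same fill level, those ~600 GB would be spread over the three remaining OSDs (~200 GB each), leaving them at roughly 80%, which still fits.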
 
Hi Aaron,
root@cephnode1:~# pveceph pool ls --noborder
Name Size Min Size PG Num min. PG Num Optimal PG Num PG Autoscale Mode PG Autoscale Target Size PG Autoscale Target Ratio Crush Rule Name %-Used Used
.mgr 3 2 1 1 1 on replicated_rule 2.00872318600887e-06 3354624
cephStrg 3 2 128 128 on replicated_rule 0.957875311374664 37974788415785




root@cephnode1:~# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.000584",
    "last_optimize_started": "Tue Jul 16 10:30:41 2024",
    "mode": "upmap",
    "no_optimization_needed": true,
    "optimize_result": "Too many objects (0.050333 > 0.050000) are misplaced; try again later",
    "plans": []
}


I just used the initial settings recommended by the GUI

>>>And you should rethink your cluster setup. In a 3-node cluster you should have either only 1 or at least 4 OSDs per node. This is because, if one of the OSDs fails, Ceph will try to recover the lost data onto the remaining OSD(s) in the same node, which will also very quickly put you into a situation where there is not enough space.

I need to delve deeper into this subject.
 
Just so I understand: in each node I have 2 OSDs.

If 1 OSD dies, won't it rebalance to the OSDs of the other nodes? Or will it try to rebalance to the other OSD on the same node?

In this case, would it be correct to create 2 pools, each pool with 1 OSD per node?
 
In the future, please post any CLI output inside [code][/code] tags, or use the formatting options of the editor.

>>>I just used the initial settings recommended by the GUI
Which is fine, but it seems that the cluster got too full and the data was not distributed as evenly as it should have been. It might have filled up faster than the balancer was able to even out the distribution across the OSDs.

Therefore, setting the "size" of the pool "cephStrg" to 2 right now will reduce the used space by about 1/3. This way, Ceph can rebalance the data, and ideally you can remove some unused data before you set it back to 3 later.
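To keep an eye on how full the individual OSDs are while the data is rebalanced, and to raise the size again once there is enough headroom, something like this should work:
Code:
ceph osd df tree                    # shows the utilization per OSD
ceph osd pool set cephStrg size 3   # once there is enough free space again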

>>>In this case, would it be correct to create 2 pools, each pool with 1 OSD per node?
You would have to assign a different device class to each of the 2 disks in a node and then use device-class-specific CRUSH rules to make sure that each pool only uses one of the OSDs per node.

https://docs.ceph.com/en/latest/rados/operations/crush-map/#device-classes

Once you do that, also place the ".mgr" pool into a device class specific rule. No pool should still be using the "replicated_rule" once you use rules that are specific to device classes.
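Very roughly, and only as a sketch (the custom device class names and the OSD-to-node mapping below are just examples, adjust them to your actual OSD IDs), it could look something like this:
Code:
# put one OSD of each node into each of two custom device classes
ceph osd crush rm-device-class osd.0 osd.2 osd.4
ceph osd crush set-device-class groupA osd.0 osd.2 osd.4
ceph osd crush rm-device-class osd.1 osd.3 osd.5
ceph osd crush set-device-class groupB osd.1 osd.3 osd.5

# create one replicated rule per device class (root "default", failure domain "host")
ceph osd crush rule create-replicated rule-groupA default host groupA
ceph osd crush rule create-replicated rule-groupB default host groupB

# assign the rules to the pools, so that no pool uses "replicated_rule" anymore
ceph osd pool set cephStrg crush_rule rule-groupA
ceph osd pool set .mgr crush_rule rule-groupB
Which of the two classes the small ".mgr" pool ends up on does not really matter, as long as it no longer uses the generic "replicated_rule".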
 
