[SOLVED] 1 pools have many more objects per pg than average

jsterr

Code:
root@XYZ:~# ceph health
HEALTH_WARN 1 pools have many more objects per pg than average

Code:
1 pools have many more objects per pg than average
pool vm_nvme objects per pg (423) is more than 12.0857 times cluster average (35)


This happened after trying to recreate the Ceph pool with fewer OSDs than I had before (I did out/down on 4 OSDs per node -> removed 12 OSDs in total). The Proxmox GUI didn't let me create a pool with fewer OSDs up/online, so I started all my OSDs again and 24 in total were up and online. I recreated the pool without error. I didn't recreate the OSDs though.

As soon as I write something to the pool with a rados bench, this warning appears and it does not disappear. I also tried messing around with pg-autoscale, but nothing changed. If I delete the data with `rados cleanup -p vm_nvme` the warning disappears, but after writing to the pool again it reappears.
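For reference, the benchmark and cleanup commands I use are along these lines (the 60-second duration is just an example):

Code:
# write benchmark objects into the pool and keep them afterwards
rados bench -p vm_nvme 60 write --no-cleanup
# delete the benchmark objects again -> the warning clears
rados cleanup -p vm_nvme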

How can I fix this? Why does this occur now? I have never seen this before.
 
Ceph 16.2.6 with `scale-down` autoscale profile?

It seems that the difference between the number of PGs and the optimal number of PGs is too small (a factor of 2) for the autoscaler to change it, so the pool vm_nvme has only 128 PGs even though it should have 256, which leads to more objects per PG than in the other pools.

Did you set the target ratio accordingly? For device_health_metrics you could disable the autoscaler and set the # PGs to 1. Scaling of the device_health_metrics pool seems to be a bug which will be fixed in one of the next versions.
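On the CLI that would be something along these lines (the pool is named device_health_metrics by default):

Code:
# stop the autoscaler from managing the pool
ceph osd pool set device_health_metrics pg_autoscale_mode off
# reduce it to a single PG
ceph osd pool set device_health_metrics pg_num 1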
 
Ceph 16.2.6 with `scale-down` autoscale profile?
Yes, correct, it's the default option (I did not change anything there).
Did you set the target ratio accordingly? For device_health_metrics you could disable the autoscaler and set the # PGs to 1. Scaling of the device_health_metrics pool seems to be a bug which will be fixed in one of the next versions.
Regarding device_health_metrics: I'll do that and report back if it changes anything.

No, I did not set a ratio. Is this needed? (I never had to do that before, even on 16.2.)

Can you give an example of how to set it and which value makes sense? I only have vm_nvme for VMs and a cephfs pool for ISOs and templates. 90% of the data will be stored on vm_nvme. So a ratio of 0.9 for vm_nvme, but what about cephfs and cephfs_metadata?

Edit: I set vm_nvme to 0.9 and the cephfs pools to 0.1 each. After that Ceph complained about a non-power-of-two PG count (cephfs had 125 PGs instead of 128, 64, or 32), so I turned off autoscale and set the PGs to 128, and the error disappeared. Everything is green now! Thanks!!
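For anyone running into the same thing: on the CLI the equivalent of what I clicked in the GUI should be roughly this (adjust the pool names to whatever `ceph osd pool ls` shows on your cluster):

Code:
# weight the autoscaler towards the VM pool
ceph osd pool set vm_nvme target_size_ratio 0.9
ceph osd pool set cephfs_data target_size_ratio 0.1
ceph osd pool set cephfs_metadata target_size_ratio 0.1
# pin the cephfs data pool to a power-of-two PG count
ceph osd pool set cephfs_data pg_autoscale_mode off
ceph osd pool set cephfs_data pg_num 128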
 
You can find the settings under `Advanced` when editing a pool. If ~90% of the data will be stored on vm_nvme, then set the target ratio to 0.9.
Setting a target ratio helps the autoscaler balance the PG count on each OSD. Since it otherwise has no information on how much data each pool will hold, it just gives the first pool more PGs and scales it down afterwards if other pools need more, based on actual usage.
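To see what the autoscaler currently wants to do for each pool (current PG_NUM, target ratio, proposed NEW PG_NUM), you can run:

Code:
ceph osd pool autoscale-status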
 
