Too few PGs per OSD (28 < min 30)

luigi

Active Member
Feb 4, 2019
Hi,

On my Proxmox installation I have the following Ceph warning message:

too few PGs per OSD (28 < min 30)

How can I increase PGs per OSD?

Thanks
Luigi
 
You need to increase the number of PGs in your pools.

Tell us a few details about your Ceph setup (number of OSDs, pools, etc.), then we can tell you whether that makes sense or not.
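For reference, the change itself is normally just the pg_num setting on the pool. A rough sketch of what that looks like (the pool name and the target value are placeholders, so please don't run anything until we know your setup):

# check the current value first
ceph osd pool get <pool-name> pg_num

# then raise pg_num (and, on older releases, pgp_num as well)
ceph osd pool set <pool-name> pg_num 256
ceph osd pool set <pool-name> pgp_num 256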
 
Hi,

in the attached picture you can see that we have two pools: one for SSD disks (for VMs) and another one for HDD disks (for archive). Each pool has 128 PGs. The SSD pool has 15 SSD disks in total (5 SSD disks x 3 nodes).

The warning message is on SSD pool.

Thanks, Luigi
 

Attachments

  • PGs.png (10.7 KB)
Hm, it now looks to me like your Ceph is in a very critical state and will soon go into read-only mode.

Please post the output of ceph osd df tree.
 
Hi,

the required information is attached.

Thanks, Luigi
 

Attachments

  • ceph-df.jpg (330.5 KB)
Okay, that doesn't look as bad as the first screenshot. But there is a big difference in usage between some of the OSDs; you should fix that, because it also has a significant influence on your cluster's fill level.

See here: https://docs.ceph.com/en/latest/rados/operations/balancer/

But you won't get rid of the message even with that. Given the number of SSDs, I think you could go up to 256-512 PGs. With the HDDs, 256 PGs won't be an issue either.

Ceph recommends no more than 100 PGs per OSD. But don't turn everything up at once; it should always stay within limits, as the PG count can also have a significant influence on your performance. Also note that increasing the PGs initially puts a lot of load on the cluster while the data is redistributed. So do this pool by pool and activate the balancer beforehand so that you start from a good position.
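To make the order concrete, a minimal sketch of what I mean (the pool names are placeholders, and upmap mode assumes all clients are at least Luminous):

# enable the balancer first so the OSDs get evened out
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status

# then raise the PGs one pool at a time, waiting for HEALTH_OK in between
ceph osd pool set <ssd-pool> pg_num 256
# ... wait until all PGs are active+clean ...
ceph osd pool set <hdd-pool> pg_num 256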
 
Okay.
I have Proxmox VE 6.2.4 with Ceph version 14.2.9.

Thanks, Luigi
Update! Version-specific knowledge for these old versions is fading away.
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
https://pve.proxmox.com/wiki/Upgrade_from_7_to_8

The guides for the Ceph upgrades are also linked in these guides. Once you run a recent version, set a target ratio for these pools so that the (new) autoscaler can determine the right pg_num for the pools. AFAICT you have one pool per device class. So any target ratio will result in the autoscaler calculating with the assumption that the pool will use the full space in its device class.

Once you update, you will see a new pool, first called "device_health_metrics"; once you are on Ceph Quincy (17), it is renamed to ".mgr". You need to assign a device-class-specific rule to this pool as well for the autoscaler to work.
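Roughly, the commands would look like this (the rule names are just examples, and the pool names are placeholders for your SSD and HDD pools):

# replicated rules bound to a device class (names are examples)
ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd crush rule create-replicated replicated_hdd default host hdd

# let the autoscaler assume each pool may use the full space of its device class
ceph osd pool set <ssd-pool> target_size_ratio 1
ceph osd pool set <hdd-pool> target_size_ratio 1

# give the .mgr (or device_health_metrics) pool a device-class rule as well
ceph osd pool set .mgr crush_rule replicated_ssd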
 
Hi aaron,
I read the first link (upgrade from 6.x to 7.0); there are a lot of steps to upgrade Proxmox and Ceph.

Do I need to shut down all VMs when I perform these steps?
Usually, how long does this upgrade take?
If something goes wrong, can I restart quickly and turn on the VMs?

Thanks, Luigi
 
Maybe a bit of an unpopular opinion, but...

I would recommend just creating a backup of all your VMs/containers and storing the backups on an external storage device.
Then do a wipe and a clean install of Proxmox VE 8.

Given that your system is already two major versions behind, in my eyes it is safer and quicker (since you need time for both the 6 to 7 and the 7 to 8 upgrade) to just do a full VM/container backup and then a wipe and install of Proxmox VE 8.
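For the backup step, something along these lines should do it (the storage name is only an example for an already configured external backup storage):

# back up all VMs and containers to an external storage
vzdump --all --storage external-backup --mode snapshot --compress zstd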

EDIT
I would say it is also safer since you do not carry over any version-specific configurations that might or might not break newer versions.
Even though it is (as far as I know) officially supported to upgrade the OS between major versions, I always try to do a clean install between major versions. (And I always do so for Debian major version changes, which is also relevant in your case.)
 
You can live migrate the VMs away, upgrade one node, migrate them back, and continue with the next node. Make sure to follow each step exactly and don't skip versions.

When doing live migrations, keep in mind that migrating within the same version, or from an older PVE version to a newer one, should always work. Live migrating from a newer to an older version, though, might not always work.
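As a rough example of the per-node workflow (the VM ID and node name are placeholders):

# move a running VM to another node before upgrading this one
qm migrate 100 target-node --online

# after the upgrade, migrate it back and continue with the next node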

@Daniel_Dog has a point though, if you don't mind the downtime and can back up all VMs, starting with a completely fresh installation could be faster than doing 2 PVE major version upgrades and 4 Ceph version upgrades.
 
