PVE7 - Ceph 16.2.5 - Pools and number of PG

zeuxprox

Hi,

I have a cluster of 5 PVE7 nodes with Ceph 16.2.5. The hardware configuration of 4 of the 5 nodes is:
  • CPU: 2 x AMD EPYC Rome 7402
  • RAM: 1 TB ECC
  • 2 x SSD 960 GB ZFS Raid 1 for Proxmox
  • 4 x Micron 9300 MAX 3.2 TB NVMe for Pool 1 named Pool-NVMe
  • 2 x Micron 5300 PRO 3.8 TB SSD for Pool 2 named Pool-SSD
  • NICs: 6 x 100Gb Mellanox ConnectX-5
The configuration of the fifth node is:
  • CPU: 1 x AMD EPYC Rome 7302
  • RAM: 256 GB
  • 2 x SSD 240 GB ZFS Raid 1 for Proxmox
  • 2 x Micron 5300 PRO 3.8 TB SSD for Pool 2 named Pool-SSD
  • NICs: 6 x 100Gb Mellanox ConnectX-5
I have created 2 Pools:
  • Pool-NVMe: composed of NVMe disks (16 x 3.2 TB)
  • Pool-SSD: composed of SSD disks (10 x 3.8 TB)
PVE7 has, by default, created both with 32 PGs. Now I have to migrate about 20 VMs to this cluster. Currently these 20 VMs are on a PVE 6.4 server with 6 NVMe disks configured in ZFS RAIDZ-2, and the total space they occupy is about 7 TB.

Autoscale mode is active in the cluster, but I would like to have an optimal number of PGs per pool before migrating the 20 VMs, so the question is: do you think I have to increase the minimum number of PGs for my 2 Ceph pools? If yes, considering that 4 TB will be stored in Pool-NVMe and 3 TB in Pool-SSD, what number of PGs would you advise per pool?

Thank you
 
In general, you are better off with a bit too many PGs than too few. 32 is definitely too few if you are going to fill the pool to that extent.
The autoscaler only triggers when the PG count is off by a factor of 3 or more, so it should not be relied on while the pool is not actively used yet.

The documentation linked above seems to suggest 1024 PGs for your NVMe pool and 512 or 1024 PGs for your SSD pool.
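For reference, the usual rule of thumb (assuming size=3 replicated pools and a target of roughly 100 PGs per OSD, rounded to a power of two; the replica count is my assumption, not something stated in this thread) works out to about this:

Code:
# pg_num ≈ (number of OSDs * 100) / replica size, rounded to a power of two
echo $((16 * 100 / 3))   # Pool-NVMe: 16 OSDs -> 533, round to 512 or 1024
echo $((10 * 100 / 3))   # Pool-SSD:  10 OSDs -> 333, round to 512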
 
See my previous post regarding the "autoscaler" here: https://forum.proxmox.com/threads/c...-before-enabling-auto-scale.80105/post-354624

I strongly recommend disabling the autoscaler until you have 50+ OSDs in your cluster

Code:
ceph config set global osd_pool_default_pg_autoscale_mode {warn|off}
-or-
ceph osd pool set <pool> pg_autoscale_mode {off|warn}

and manually setting pg_num and pgp_num per the above recommendations.
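For example, with the pool names above (the exact counts are just illustrative, pick the values from your own sizing):

Code:
ceph osd pool set Pool-NVMe pg_num 1024
ceph osd pool set Pool-NVMe pgp_num 1024
ceph osd pool set Pool-SSD pg_num 512
ceph osd pool set Pool-SSD pgp_num 512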

EDIT: add command for existing pools
 
Which driver do you use for the Mellanox ConnectX-5 cards to get them running under Debian Bullseye (11)? I am trying to upgrade from 6.4.x to 7.x, but the ConnectX-6 cards won't work.

Did you run the cards in ETH mode?
 
Hi,

I'm using the driver that ships with PVE7; I only upgraded the firmware, which I found on the Mellanox site. Then I downloaded the Mellanox tools from the following link:
https://www.mellanox.com/products/adapter-software/firmware-tools

You also have to download the firmware for your card...

Follow this mini how-to:
  1. Install pve-headers, gcc, make and dkms: apt install pve-headers gcc make dkms
  2. Extract the Mellanox tools: tar zxvf fileNameHere-deb.tgz
  3. Install the Mellanox tools: ./install.sh
  4. Start the Mellanox tools: mst start
  5. Show the device name: mst status (it should return something like /dev/mst/XXXXXXXX)
  6. Update the firmware (previously downloaded): flint -d /dev/mst/XXXXXXXX -i firmware_name.bin burn
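The same steps as a single copy-pasteable block (the archive and firmware file names are the placeholders from the steps above):

Code:
apt install pve-headers gcc make dkms
tar zxvf fileNameHere-deb.tgz
./install.sh      # run from inside the extracted directory
mst start
mst status        # should return something like /dev/mst/XXXXXXXX
flint -d /dev/mst/XXXXXXXX -i firmware_name.bin burn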

Here are some useful commands:

Code:
Show info about cards:
   mlxfwmanager

Show detailed info:
   mlxconfig -d /dev/mst/XXXXXXXX query

Modify cards from IB to ETH (cards with 2 ports):
   mlxconfig -d /dev/mst/XXXXXXXX set LINK_TYPE_P1=2  (for port 1)

   mlxconfig -d /dev/mst/XXXXXXXX set LINK_TYPE_P2=2   (for port 2)
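If it helps: after changing the link type you can verify the setting with a query (as far as I know, the new value only takes effect after a reboot of the host):

Code:
mlxconfig -d /dev/mst/XXXXXXXX query | grep LINK_TYPE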

Regards
 
So, regarding the autoscaler advice above: for my cluster you advise running the following command:

Code:
ceph config set global osd_pool_default_pg_autoscale_mode off

But how can I set pg_num and pgp_num to 1024? Is it safe to do this in a production environment?

Can I use this guide:
  1. https://forum.proxmox.com/threads/pve-ceph-increasing-pg-count-on-a-production-cluster.74145/
and change pg_num and pgp_num in small increments of 128 until reaching 1024, as sketched below?
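Something like this is what I have in mind (just a sketch with Pool-NVMe as an example; waiting for HEALTH_OK between steps is my assumption, not from the guide):

Code:
# step pg_num/pgp_num up in increments of 128 and let the cluster settle in between
for pgs in $(seq 128 128 1024); do
    ceph osd pool set Pool-NVMe pg_num $pgs
    ceph osd pool set Pool-NVMe pgp_num $pgs
    while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
done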

Thank you
 
