Hi everyone,
I’m configuring a new Ceph cluster managed by Proxmox VE 9.0 (ceph version 19.2.3) with the following setup:
- Nodes: 3 hosts, each with 5x ThinkSystem 3.5" U.3 7500 PRO 1.92TB NVMe drives (15 OSDs total, ~28.8 TB raw capacity).
- Network: ~64 Gbit/s, MTU 9000 (public and cluster network).
- Pools:
  - `.mgr` (1 PG, for Ceph management).
  - `vms0_ceph` (RBD pool for VMs, replication size 3, min_size 2).
- Data: ~373 GiB (~1.1 TiB with replication).
- Benchmarks (`rbd bench`, 4K, 16 threads, 1 GiB total, sequential pattern; full output below):
  - Write IOPS: ~62,000
  - Read IOPS: ~75,000
- Versions: Ceph 19.2.3 (Squid), Proxmox VE 9.0.10.
Issue:
When creating the `vms0_ceph` pool, I set the PG number to 256, as I calculated the optimal PG count to be ~500 (`(15 OSDs × 100) ÷ 3 ≈ 500`) and chose 256 as a compromise. However, the Proxmox GUI and `ceph osd pool ls detail` show 32 (`pg_num 32`, `pgp_num 32`, `autoscale_mode on`), and the latest `ceph -s` output (as of 09:25 AM CEST, Oct 10, 2025) shows **33 PGs total** (1 for `.mgr`, 32 for `vms0_ceph`), all in `active+clean` state.
It seems the autoscale mode is overriding my manual setting of 256 PGs and reducing it to 32.
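If it helps with diagnosis: my understanding is that `ceph osd pool autoscale-status` shows what the autoscaler is targeting for each pool (I can post its output if useful):
Bash:
~# ceph osd pool autoscale-status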
Performance Concerns:
With only 32 PGs, I suspect the cluster is not fully utilizing the NVMe drives’ parallelism: the `rbd bench` runs below top out at roughly 62,000 write IOPS and 75,000 read IOPS, which seems low for 15 NVMe OSDs. I believe increasing the PG count to 256 or 512 would improve performance, especially for read-heavy VM workloads.
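For completeness: the `rbd bench` runs further down used the default sequential pattern; once the PG question is settled I plan to re-test with a random pattern as well (same image and parameters, just adding `--io-pattern rand`):
Bash:
~# rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G --io-pattern rand vms0_ceph/test-image
~# rbd bench --io-type read --io-size 4K --io-threads 16 --io-total 1G --io-pattern rand vms0_ceph/test-image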
Questions:
- Is it normal for the PG autoscaler to override my manual PG setting (256) and reduce it to 32?
- Is 32 PGs too low for a cluster with 15 NVMe OSDs (ThinkSystem 3.5" U.3 7500 PRO 1.92TB) and replication factor 3?
- Have I made a mistake in my configuration, or is this expected behavior?
- Should I disable the autoscaler (`ceph osd pool set vms0_ceph pg_autoscale_mode off`) and set the PG count to 256 or 512 (see the commands I’m considering after this list)? What are the risks or considerations?
- How can I ensure the PG count stays at my desired value (256 or 512)?
- Could the benchmark results (~62,000 write IOPS vs. ~75,000 read IOPS) be related to the low PG count, or are there other factors I should investigate?
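For the autoscaler question above, these are the commands I believe would apply; please correct me if I have misunderstood how `pg_num_min` and `target_size_ratio` interact with the autoscaler:
Bash:
# Option A: disable the autoscaler for this pool and set pg_num manually
~# ceph osd pool set vms0_ceph pg_autoscale_mode off
~# ceph osd pool set vms0_ceph pg_num 256
~# ceph osd pool set vms0_ceph pgp_num 256
# Option B: keep the autoscaler on, but prevent it from going below 256 PGs
~# ceph osd pool set vms0_ceph pg_num_min 256
# Option C: keep the autoscaler on and tell it this pool will use most of the capacity
~# ceph osd pool set vms0_ceph target_size_ratio 1.0
# Check what the autoscaler now targets
~# ceph osd pool autoscale-status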
Bash:
~# rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G vms0_ceph/test-image
bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 78816 78910.1 308 MiB/s
2 136816 68449.5 267 MiB/s
3 187632 62569.5 244 MiB/s
4 250048 62531 244 MiB/s
elapsed: 4 ops: 262144 ops/sec: 61957.3 bytes/sec: 242 MiB/s
root@vm1001:~# ceph version
ceph version 19.2.3 (2f03f1cd83e5d40cdf1393cb64a662a8e8bb07c6) squid (stable)
root@vm1001:~# rbd bench --io-type read --io-size 4K --io-threads 16 --io-total 1G vms0_ceph/test-image
bench type read io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 71344 71430.7 279 MiB/s
2 150704 75396.9 295 MiB/s
3 226608 75565.7 295 MiB/s
elapsed: 3 ops: 262144 ops/sec: 75371.3 bytes/sec: 294 MiB/s
Bash:
~# pveceph pool ls --noborder
Name Size Min Size PG Num min. PG Num Optimal PG Num PG Autoscale Mode PG Autoscale Target Size PG Autoscale Target Ratio Crush Rule Name %
.mgr 3 2 1 1 1 on replicated_rule 2.8115297823205
vms0_ceph 3 2 32 32 on replicated_rule 0.04410051554
root@vm1001:~# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 26.19896 - 26 TiB 1.1 TiB 1.1 TiB 38 KiB 9.7 GiB 25 TiB 4.10 1.00 - root default
-3 8.73299 - 8.7 TiB 368 GiB 363 GiB 9 KiB 4.8 GiB 8.4 TiB 4.11 1.00 - host vm1001
0 ssd 1.74660 1.00000 1.7 TiB 58 GiB 57 GiB 1 KiB 1.1 GiB 1.7 TiB 3.25 0.79 5 up osd.0
1 ssd 1.74660 1.00000 1.7 TiB 92 GiB 91 GiB 5 KiB 624 MiB 1.7 TiB 5.12 1.25 8 up osd.1
2 ssd 1.74660 1.00000 1.7 TiB 69 GiB 68 GiB 1 KiB 1.0 GiB 1.7 TiB 3.88 0.95 6 up osd.2
3 ssd 1.74660 1.00000 1.7 TiB 80 GiB 79 GiB 1 KiB 1.1 GiB 1.7 TiB 4.49 1.10 8 up osd.3
4 ssd 1.74660 1.00000 1.7 TiB 69 GiB 68 GiB 1 KiB 952 MiB 1.7 TiB 3.83 0.94 6 up osd.4
-5 8.73299 - 8.7 TiB 367 GiB 363 GiB 8 KiB 4.2 GiB 8.4 TiB 4.11 1.00 - host vm1002
5 ssd 1.74660 1.00000 1.7 TiB 58 GiB 57 GiB 1 KiB 949 MiB 1.7 TiB 3.23 0.79 5 up osd.5
6 ssd 1.74660 1.00000 1.7 TiB 80 GiB 79 GiB 1 KiB 1.0 GiB 1.7 TiB 4.48 1.09 7 up osd.6
7 ssd 1.74660 1.00000 1.7 TiB 57 GiB 56 GiB 1 KiB 942 MiB 1.7 TiB 3.21 0.78 5 up osd.7
8 ssd 1.74660 1.00000 1.7 TiB 115 GiB 114 GiB 4 KiB 317 MiB 1.6 TiB 6.40 1.56 11 up osd.8
9 ssd 1.74660 1.00000 1.7 TiB 58 GiB 57 GiB 1 KiB 985 MiB 1.7 TiB 3.22 0.79 5 up osd.9
-7 8.73299 - 8.7 TiB 364 GiB 363 GiB 21 KiB 754 MiB 8.4 TiB 4.07 0.99 - host vm1003
10 ssd 1.74660 1.00000 1.7 TiB 80 GiB 80 GiB 5 KiB 172 MiB 1.7 TiB 4.48 1.09 7 up osd.10
11 ssd 1.74660 1.00000 1.7 TiB 34 GiB 34 GiB 4 KiB 62 MiB 1.7 TiB 1.88 0.46 3 up osd.11
12 ssd 1.74660 1.00000 1.7 TiB 80 GiB 80 GiB 4 KiB 136 MiB 1.7 TiB 4.47 1.09 8 up osd.12
13 ssd 1.74660 1.00000 1.7 TiB 125 GiB 125 GiB 4 KiB 282 MiB 1.6 TiB 7.00 1.71 11 up osd.13
14 ssd 1.74660 1.00000 1.7 TiB 45 GiB 45 GiB 4 KiB 102 MiB 1.7 TiB 2.52 0.62 4 up osd.14
TOTAL 26 TiB 1.1 TiB 1.1 TiB 46 KiB 9.7 GiB 25 TiB 4.10
MIN/MAX VAR: 0.46/1.71 STDDEV: 1.31
Bash:
~# fastfetch
.://:` `://:. root@vm1001
`hMMMMMMd/ /dMMMMMMh` ---------------
`sMMMMMMMd: :mMMMMMMMs` OS: Proxmox VE 9.0.10 x86_64
`-/+oo+/:`.yMMMMMMMh- -hMMMMMMMy.`:/+oo+/-` Host: ThinkSystem SR650 V3 (07)
`:oooooooo/`-hMMMMMMMyyMMMMMMMh-`/oooooooo:` Kernel: Linux 6.14.11-3-pve
`/oooooooo:`:mMMMMMMMMMMMMm:`:oooooooo/` Uptime: 2 days, 1 hour, 33 mins
./ooooooo+- +NMMMMMMMMN+ -+ooooooo/. Packages: 773 (dpkg)
.+ooooooo+-`oNMMMMNo`-+ooooooo+. Shell: bash 5.2.37
-+ooooooo/.`sMMs`./ooooooo+- Display (Acer B223W): 1024x768 @ 75 Hz in 22"
:oooooooo/`..`/oooooooo: Terminal: /dev/pts/2
:oooooooo/`..`/oooooooo: CPU: 2 x Intel(R) Xeon(R) Silver 4410T (40) @ 4.00 GHz
-+ooooooo/.`sMMs`./ooooooo+- GPU: ASPEED Technology, Inc. ASPEED Graphics Family
.+ooooooo+-`oNMMMMNo`-+ooooooo+. Memory: 11.01 GiB / 125.61 GiB (9%)
./ooooooo+- +NMMMMMMMMN+ -+ooooooo/. Swap: 0 B / 8.00 GiB (0%)
`/oooooooo:`:mMMMMMMMMMMMMm:`:oooooooo/` Disk (/): 4.39 GiB / 93.93 GiB (5%) - ext4
`:oooooooo/`-hMMMMMMMyyMMMMMMMh-`/oooooooo:` Local IP (vmbr10): 10.91.10.11/24
`-/+oo+/:`.yMMMMMMMh- -hMMMMMMMy.`:/+oo+/-` Locale: C
`sMMMMMMMm: :dMMMMMMMs`
`hMMMMMMd/ /dMMMMMMh`
`://:` `://:`