Proxmox with Ceph performance

Vivien

New Member
Apr 24, 2026
Hi,

I just connected my Proxmox to a new Ceph NVMe cluster (not hyperconverged). We ran some benchmarks, but the problem is that read and write performance is capped for a single VM. The problem doesn't seem to come from the Ceph cluster: we can run benchmarks from multiple VMs in Proxmox at the same time and they are all stuck at the same IOPS, and the Ceph cluster monitoring does show the sum of our VMs' IOPS.

Is there any configuration to do in Proxmox for an NVMe-based Ceph cluster? Or am I wrong, and the Ceph cluster is the bottleneck?

Thanks for any help ! :)

Vivien
 
Some more details would be good to know:

* Disk model of the OSDs
* Network speed for the physical Ceph network(s)
* General specs of the servers, like CPU and RAM
* cat /etc/pve/ceph.conf and cat /etc/network/interfaces. Please paste the output inside [CODE] blocks (or use the code formatting buttons at the top of the editor).
 
Hi,
* Disk model of the OSDs
They are Dell enterprise 15.36 TB NVMe drives; we have two models:

Dell Ent NVMe CM7 E3.S RI 15.36TB and Dell NVMe ISE PS1010 RI E3.S 15.36TB

* Network speed for the physical Ceph network(s)

The Ceph cluster is linked at 2 x 100G (2 x 100G for the internal network and 2 x 100G for the public network).
The Proxmox cluster is at 2 x 25G, I believe.
* General specs of the servers, like CPU and RAM

For the ceph cluster :

AMD EPYC 9555P, 256 GB RAM, 7 OSDs per host, 9 hosts

For the proxmox cluster :

AMD EPYC 9354, 756 GB RAM, 3 hosts
* cat /etc/pve/ceph.conf and cat /etc/network/interfaces. Please paste the output inside blocks (or use the code formatting buttons at the top of the editor).

I'm not using PVE-managed Ceph; my Ceph cluster is deployed with cephadm. Besides the bonds being at MTU 9000 on both sides, the only thing I've changed so far is:

Code:
ceph config get osd osd_mclock_max_capacity_iops_ssd
21500.000000

which I changed to:

Code:
ceph config set osd osd_mclock_max_capacity_iops_ssd 80000


(21,500 is the write IOPS limit we hit per VM, but raising it didn't change anything.)

We ran an fio benchmark on a single NVMe drive, which gave us around 160k write IOPS.
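For reference, a single-drive write-IOPS test like that is often run with an fio invocation along these lines; the device path, job count, and queue depth below are placeholders, and writing to a raw device is destructive:

```shell
# DESTRUCTIVE: only run against an unused disk. /dev/nvme0n1 is a placeholder.
fio --name=randwrite-test \
    --filename=/dev/nvme0n1 \
    --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k \
    --iodepth=32 --numjobs=4 \
    --time_based --runtime=60 \
    --group_reporting
```

Comparing the same job file against a mounted RBD image inside a VM makes it easier to see where the IOPS are lost.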
 
The HW looks good so far.

If I understand it correctly, the PVE hosts connect to the Ceph cluster via 25Gbit/s? While the Ceph nodes themselves use 100Gbit/s?

I would verify that the network performs as expected; that is, run iperf / iperf3 checks between the Ceph nodes and the PVE nodes. In both directions!
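As a sketch (the node address is a placeholder): start iperf3 in server mode on one side, then test both directions from the other:

```shell
# On a Ceph node:
iperf3 -s

# On a PVE node (10.0.0.1 is a placeholder for the Ceph node's IP):
iperf3 -c 10.0.0.1 -P 4 -t 30      # PVE -> Ceph
iperf3 -c 10.0.0.1 -P 4 -t 30 -R   # Ceph -> PVE (reverse mode)
```

The `-P 4` parallel streams help saturate fast links that a single TCP stream cannot fill.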

Disable any power saving / C-state features on the servers. Going to sleep and waking up CPU cores can also introduce latency.
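One way to check and disable deep C-states at runtime, assuming the linux-cpupower tools are installed (BIOS settings are the more permanent fix):

```shell
# List the idle states the CPUs expose
cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name

# Pin the frequency governor and disable all deep idle states
cpupower frequency-set -g performance
cpupower idle-set -D 0
```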

And make sure, if this is a new and still empty cluster, that the pool(s) have enough PGs. If you have only one main pool besides .mgr, set the target_ratio to 1 (or any other value; it is a ratio relative to all other pools) so that the autoscaler knows that this pool is expected to consume all the available space.
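The autoscaler hints can be inspected and set like this ("vm-pool" is a placeholder for the RBD pool name):

```shell
ceph osd pool autoscale-status
ceph osd pool set vm-pool target_size_ratio 1.0
ceph osd pool get vm-pool pg_num
```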
 
A network interface MTU mismatch would decimate perceived performance, but there are other possibilities. While I'm not volunteering to check for you, you might want to run

Code:
ceph config dump
ceph config show osd.x --show-with-defaults

and go over the output with a fine-toothed comb.

Last thing: in a PVE environment, in-guest RBD performance is heavily impacted by the guest type, the CPU and memory assigned, and core pinning (if you can at all help it, make sure the VM runs on the same socket as the NIC).
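A rough way to check NIC locality and pin a VM accordingly (the interface name, VMID, and core list are placeholders; qm's --affinity option requires a reasonably recent PVE):

```shell
# NUMA node the NIC is attached to (-1 means no NUMA info)
cat /sys/class/net/ens1f0/device/numa_node

# Which cores belong to which NUMA node
lscpu | grep -i numa

# Pin VM 100's vCPUs to cores on that node (example core list)
qm set 100 --affinity 0-15
```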
 

Hi,

>If I understand it correctly, the PVE hosts connect to the Ceph cluster via 25Gbit/s? While the Ceph nodes themselves use 100Gbit/s?

Yup

>I would verify that the network performs as expected, as in, do iperf / iperf3 checks between the Ceph nodes and the Ceph-PVE nodes. In both directions!
>Disable any power saving / C-state features on the servers. Going to sleep and waking up CPU cores can also introduce latency.

iperf bandwidth looks good, and C-states are disabled.

> And make sure, if this is a new and still empty cluster, that the pool(s) have enough PGs. If you have only one main pool besides .mgr, set the target_ratio to 1 (or any other value; it is a ratio relative to all other pools) so that the autoscaler knows that this pool is expected to consume all the available space.

I disabled the autoscaler; I'm a bit afraid it will raise and lower the PG number in the pool, and I'd prefer a stable number.
 
> I disabled the autoscaler; I'm a bit afraid it will raise and lower the PG number in the pool, and I'd prefer a stable number.
Set it to warn; then you will see what the ideal would be without it acting by itself. You should have something in the ballpark of 100 PGs/OSD. If you have too few, it can impact performance and also recovery speed/impact in case you lose a node or an OSD.
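As a back-of-the-envelope sketch for the cluster described above (9 hosts x 7 OSDs, assuming the default 3x replication; 100 PGs/OSD is the rule of thumb mentioned here):

```shell
osds=$((9 * 7))        # 63 OSDs
replicas=3
target_per_osd=100

# Raw PG target: each PG lands on `replicas` OSDs, so divide by the pool size
raw=$((osds * target_per_osd / replicas))   # 2100

# Ceph wants pg_num to be a power of two; pick the nearest one
pg=1
while [ $((pg * 2)) -le "$raw" ]; do pg=$((pg * 2)); done
if [ $((raw - pg)) -gt $((pg * 2 - raw)) ]; then pg=$((pg * 2)); fi

echo "suggested pg_num: $pg"   # 2048
```

With several pools, the per-pool numbers shrink so that the total across pools still lands near 100 PGs per OSD.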
 
> A network interface MTU mismatch would decimate perceived performance, but there are other possibilities.

Everything is at MTU 9000.

Code:
mgr                      advanced  mgr/prometheus/rbd_stats_pools    *             *
osd                      advanced  bluestore_elastic_shared_blobs    false         *
osd                      advanced  mon_allow_pool_delete             false
osd                      basic     osd_mclock_max_capacity_iops_ssd  90000.000000
osd     host:r6715-26-1  basic     osd_memory_target                 26489444878
osd     host:r6715-26-2  basic     osd_memory_target                 26622755927
osd     host:r6715-26-3  basic     osd_memory_target                 26489446517
osd     host:r6715-26-4  basic     osd_memory_target                 26642839844
osd     host:r6715-26-5  basic     osd_memory_target                 26489446517
osd     host:r6715-26-6  basic     osd_memory_target                 26642839844
osd     host:r6715-26-7  basic     osd_memory_target                 26489444469
osd     host:r6715-26-8  basic     osd_memory_target                 26642838206
osd     host:r6715-26-9  basic     osd_memory_target                 26489444059
osd                      basic     osd_memory_target                 8000000000
osd                      advanced  osd_memory_target_autotune        true
osd.13                   basic     osd_mclock_max_capacity_iops_ssd  74703.001410
 
> ceph config show osd.x --show-with-defaults
Code:
ceph config show osd.50
NAME                                             VALUE              SOURCE
bluestore_elastic_shared_blobs                   false              mon
cluster_network                                                     override
daemonize                                        false              override
keyring                                          $osd_data/keyring  default
log_stderr_prefix                                debug              default
log_to_file                                      false              default
log_to_stderr                                    true               default
mon_allow_pool_delete                            false              mon
mon_host                                                            override
no_config_file                                   false              override
osd_delete_sleep                                 0.000000           override
osd_delete_sleep_hdd                             0.000000           override
osd_delete_sleep_hybrid                          0.000000           override
osd_delete_sleep_ssd                             0.000000           override
osd_max_backfills                                1                  default
osd_mclock_max_capacity_iops_ssd                 90000.000000       mon
osd_mclock_scheduler_background_best_effort_lim  0.900000           default
osd_mclock_scheduler_background_best_effort_res  0.000000           default
osd_mclock_scheduler_background_best_effort_wgt  1                  default
osd_mclock_scheduler_background_recovery_lim     0.000000           default
osd_mclock_scheduler_background_recovery_res     0.500000           default
osd_mclock_scheduler_background_recovery_wgt     1                  default
osd_mclock_scheduler_client_lim                  0.000000           default
osd_mclock_scheduler_client_res                  0.500000           default
osd_mclock_scheduler_client_wgt                  1                  default
osd_memory_target                                26642838206        mon
osd_memory_target_autotune                       true               mon
osd_objectstore                                  bluestore          cmdline
osd_recovery_max_active                          0                  default
osd_recovery_max_active_hdd                      3                  default
osd_recovery_max_active_ssd                      10                 default
osd_recovery_sleep                               0.000000           override
osd_recovery_sleep_degraded                      0.000000           override
osd_recovery_sleep_degraded_hdd                  0.000000           override
osd_recovery_sleep_degraded_hybrid               0.000000           override
osd_recovery_sleep_degraded_ssd                  0.000000           override
osd_recovery_sleep_hdd                           0.000000           override
osd_recovery_sleep_hybrid                        0.000000           override
osd_recovery_sleep_ssd                           0.000000           override
osd_scrub_sleep                                  0.000000           override
osd_snap_trim_sleep                              0.000000           override
osd_snap_trim_sleep_hdd                          0.000000           override
osd_snap_trim_sleep_hybrid                       0.000000           override
osd_snap_trim_sleep_ssd                          0.000000           override
rbd_default_features                             61                 default
rbd_qos_exclude_ops                              0                  default
setgroup                                         ceph               cmdline
setuser                                          ceph
 
> Last thing: in a PVE environment, in-guest RBD performance is heavily impacted by the guest type, the CPU and memory assigned, and core pinning (if you can at all help it, make sure the VM runs on the same socket as the NIC).

I'll take a look at that. We already managed to double the IOPS by activating KRBD for the pool in the Proxmox UI.
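For anyone following along: KRBD is a per-storage flag; in /etc/pve/storage.cfg the entry looks roughly like this (the storage ID, pool name, and monitor addresses are placeholders):

```
rbd: ceph-rbd
        content images
        krbd 1
        monhost 10.0.0.1 10.0.0.2 10.0.0.3
        pool vm-pool
        username admin
```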

> Set it to warn; then you will see what the ideal would be without it acting by itself. You should have something in the ballpark of 100 PGs/OSD. If you have too few, it can impact performance and also recovery speed/impact in case you lose a node or an OSD.

I've managed to get around 200 PGs, but I'll try warn mode then.

Thank you for your fast responses!