Proxmox with Ceph performance

Vivien

New Member
Apr 24, 2026
Hi,

I just connected my Proxmox cluster to a new Ceph NVMe cluster (not hyperconverged). We ran some benchmarks, and the problem is that reads and writes are capped for a single VM. The problem doesn't seem to come from the Ceph cluster: when we run benchmarks from multiple VMs in Proxmox at the same time, each one is stuck at the same IOPS, and the Ceph cluster monitoring does show the sum of the VMs' IOPS.

Is there any configuration needed in Proxmox for an NVMe-based Ceph cluster? Or am I wrong and the Ceph cluster is the bottleneck?

Thanks for any help! :)

Vivien
 
Some more details would be good to know:

* Disk model of the OSDs
* Network speed for the physical Ceph network(s)
* General specs of the servers, like CPU and RAM
* cat /etc/pve/ceph.conf and cat /etc/network/interfaces; please paste the output inside [CODE] blocks (or use the code formatting buttons at the top of the editor).
 
Hi,
* Disk model of the OSDs
They are Dell enterprise NVMe 15.36 TB drives; we have two models:

Dell Ent NVMe CM7 E3.S RI 15.36TB and Dell NVMe ISE PS1010 RI E3.S 15.36TB

* Network speed for the physical Ceph network(s)

The Ceph cluster is connected at 2 * 100G (2 * 100G for the internal/cluster network and 2 * 100G for the public network).
The Proxmox cluster is at 2 * 25G, I believe.
* General specs of the servers, like CPU and RAM

For the ceph cluster :

AMD EPYC 9555P, 256 GB RAM, 7 OSDs per host, 9 hosts

For the proxmox cluster :

AMD EPYC 9354, 756 GB RAM, 3 hosts
* cat /etc/pve/ceph.conf and cat /etc/network/interfaces; please paste the output inside blocks (or use the code formatting buttons at the top of the editor).

I'm not using the PVE-managed Ceph; my Ceph cluster is deployed with cephadm. The only thing I have changed so far (and our bonds are MTU 9000 on both sides) is:

Code:
ceph config get osd osd_mclock_max_capacity_iops_ssd
21500.000000

to

Code:
ceph config set osd osd_mclock_max_capacity_iops_ssd 80000


(21500 is the write IOPS limit we hit per VM, but changing it didn't make any difference.)

We ran a fio benchmark on a single NVMe, which gives around 160k write IOPS.
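
A per-device random-write benchmark like the one mentioned is typically run along these lines (a sketch only; the device path, queue depth and job count are assumptions, and writing to a raw device destroys its data):

Code:
# WARNING: destroys data on the target device - use a spare/empty NVMe only
fio --name=randwrite --filename=/dev/nvme0n1 --direct=1 --rw=randwrite \
    --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting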
 
The HW looks good so far.

If I understand it correctly, the PVE hosts connect to the Ceph cluster via 25 Gbit/s, while the Ceph nodes themselves use 100 Gbit/s?

I would verify that the network performs as expected, i.e. run iperf/iperf3 checks between the Ceph nodes and the PVE nodes. In both directions!
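
A minimal sketch of such a check (the IP is a placeholder; -R reverses the direction so both ways can be tested from the same side):

Code:
# on a Ceph node:
iperf3 -s
# on a PVE node, 4 parallel streams, then the reverse direction:
iperf3 -c <ceph-node-ip> -P 4
iperf3 -c <ceph-node-ip> -P 4 -R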

Disable any power saving / C-state features on the servers. Putting CPU cores to sleep and waking them up again can also introduce latency.
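
One way to check and limit C-states on Linux (the tuned profile name is a common default; verify it exists on your distribution):

Code:
# show which idle states the CPUs may enter:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
# keep cores out of deep C-states with a low-latency profile:
tuned-adm profile latency-performance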

And make sure, if this is a new and still empty cluster, that the pool(s) have enough PGs. If you have only one main pool besides .mgr, set its target_ratio to 1 (or any other value; it is a ratio between all pools that have one) so that the autoscaler knows this pool is expected to consume all the available space.
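
Setting the ratio and checking what the autoscaler decided could look like this (the pool name is a placeholder):

Code:
ceph osd pool set <pool> target_size_ratio 1
ceph osd pool autoscale-status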
 
A network interface MTU mismatch would decimate perceived performance, but there are other possibilities. While I'm not volunteering to check for you, you might want to run

Code:
ceph config dump
ceph config show osd.x --show-with-defaults

and go over the output with a fine-toothed comb.
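
One quick way to verify jumbo frames end-to-end is a do-not-fragment ping sized for MTU 9000 (8972 = 9000 minus 20 bytes IP header and 8 bytes ICMP header; the IP is a placeholder):

Code:
# fails with "message too long" if any hop on the path has MTU < 9000:
ping -M do -s 8972 -c 3 <ceph-node-ip>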

Last thing: in a PVE environment, in-guest RBD performance will be heavily impacted by the guest type, the CPU and memory assigned, and core pinning (if you can at all help it, make sure the VM is on the same socket as the NIC).
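
For the pinning part, a sketch with Proxmox's qm tool (the VMID, NIC name and core list are placeholders; the --affinity option exists in recent PVE releases, so check your version):

Code:
# which NUMA node is the NIC attached to?
cat /sys/class/net/<nic>/device/numa_node
# which cores belong to that node?
lscpu | grep 'NUMA node'
# pin the VM's vCPUs to cores on that node:
qm set <vmid> --affinity 0-15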
 