Ceph Cluster - Slow performance

stex

Member
Aug 8, 2023
Hi,
I've built a 3-node Proxmox cluster and I'm having serious storage performance issues.
The nodes are three Dell R740s, each with 2x Xeon Gold 6252, 512 GB RAM, 2x M.2 boot devices, and 5x 6.4 TB NVMe disks (original Dell disks). Networking is SFP+, with a 25 Gbps storage network.
The storage network is in LACP mode with 2 VLANs: one for the Ceph public network and one for the cluster network. Jumbo frames (MTU 9000) are enabled on the hosts and on the storage switch.
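For reference, a bond + VLAN layout like the one described might be sketched like this in /etc/network/interfaces (the NIC names, VLAN tags, and addresses below are made-up examples, not taken from this setup):

```text
# LACP bond over the two 25 Gbps ports (example NIC names)
auto bond0
iface bond0 inet manual
    bond-slaves enp94s0f0 enp94s0f1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000

# VLAN for the Ceph public network (example tag/address)
auto bond0.100
iface bond0.100 inet static
    address 10.10.10.1/24
    mtu 9000

# VLAN for the Ceph cluster (replication) network (example tag/address)
auto bond0.200
iface bond0.200 inet static
    address 10.10.20.1/24
    mtu 9000
```

A quick end-to-end jumbo-frame sanity check is `ping -M do -s 8972 <peer-ip>` (9000 bytes minus 28 bytes of IP/ICMP headers); if it fails, the MTU is mismatched somewhere along the path.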

The problem is that Ceph storage is running very slow, while benchmarks on the raw disks perform as expected. If I move a VM to the boot device, performance is much higher.
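To narrow down where the slowdown is, it can help to benchmark each layer separately: the raw NVMe device with fio, and the Ceph pool with rados bench. A sketch follows (the pool name, device path, and durations are examples; note that the raw-device fio write test is destructive, so run it only on an unused disk):

```shell
# 1) Raw NVMe baseline (DESTRUCTIVE: overwrites the device - unused disks only)
fio --name=nvme-randwrite --filename=/dev/nvme1n1 --ioengine=libaio \
    --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting

# 2) Ceph pool throughput (example pool name "testpool")
rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
rados bench -p testpool 60 seq -t 16
rados -p testpool cleanup
```

If the raw disks are fast but rados bench is slow, the bottleneck is in the Ceph/network layer rather than in the drives themselves.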

Any help would be appreciated.
 
If this 3-node cluster is never going to be expanded, create a full-mesh broadcast network per https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Example and https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Broadcast_Setup

This setup removes the switch and puts the Ceph public, private, and Corosync traffic on this network. Make sure to change the datacenter migration setting to this network per https://forum.proxmox.com/threads/how-to-change-migration-network.157108 and set it to insecure.
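For the migration setting, the relevant line lives in /etc/pve/datacenter.cfg; a sketch (the CIDR here is an example for the mesh network):

```text
# /etc/pve/datacenter.cfg (example subnet for the full-mesh network)
migration: network=10.15.15.0/24,type=insecure
```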

I don't bother changing the MTU, so I leave it at the default of 1500. Is this considered best practice? No. Does it work? Yes.

I'm not hurting for IOPS.

I use the following optimizations learned through trial-and-error. YMMV.

Code:
    Set SAS HDD Write Cache Enable (WCE) (sdparm -s WCE=1 -S /dev/sd[x])
    Set VM Disk Cache to None if clustered, Writeback if standalone
    Set VM Disk controller to VirtIO SCSI single and enable the IO Thread & Discard options
    Set VM CPU Type to 'host' for Linux; for Windows, 'x86-64-v2-AES' on older CPUs or 'x86-64-v3' on newer CPUs
    Enable VM CPU NUMA
    Set VM Networking VirtIO Multiqueue to 1
    Install the QEMU Guest Agent in the VM (plus VirtIO drivers on Windows)
    Set the VM IO Scheduler to none/noop inside Linux guests
    Set the Ceph RBD pool to use the 'krbd' option
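For the krbd item above, a sketch of what the storage entry might look like in /etc/pve/storage.cfg (the storage ID and pool name are examples):

```text
# /etc/pve/storage.cfg - RBD storage with the krbd option (names are examples)
rbd: ceph-vm
    content images
    pool vm-pool
    krbd 1
```

On the VM side, the controller/cache/CPU items map to something like `qm set 100 --scsihw virtio-scsi-single --scsi0 ceph-vm:vm-100-disk-0,iothread=1,discard=on,cache=none --cpu host --numa 1` (the VMID and disk name are examples).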
 