Ceph performance issue

igorkuz

New Member
Nov 16, 2022
Hi all,
I'm new here. I have a 3-node cluster running Ceph with 8x 1TB SSDs per host, but the performance I'm getting out of it is very poor. Testing from a Windows guest with CrystalDiskMark, I get 340 MB/s read and 47 MB/s write.

Any help would be appreciated as I really want it to work.

Here is my setup info:
3x Dell PE R620
8x Samsung 850 Pro SSDs per host, connected through a PERC H310 in non-RAID mode
2x Intel Xeon CPU E5-2689 per host
196GB RAM per host
The network is 10Gb without a switch (this is the guide I used), and an iperf test shows I get the full 10Gb
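For reference, a basic bandwidth check between two nodes looks like this (assuming iperf3 is installed on both ends; 10.15.15.52 stands in for the peer node's address):

Code:
# on the receiving node (e.g. pmh2)
iperf3 -s
# on the sending node (e.g. pmh1), run a 10-second test towards the peer
iperf3 -c 10.15.15.52 -t 10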

Below is more info:

Code:
rados -p CephVM bench 10 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pmh1_86946
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        79        63    251.99       252    0.091274    0.132514
    2      16       101        85   169.981        88   0.0554055    0.140731
    3      16       133       117    155.98       128   0.0486606    0.380463
    4      16       177       161   160.977       176   0.0681243    0.360015
    5      16       220       204   163.176       172    0.772509    0.360257
    6      16       246       230   153.309       104     0.44223    0.346303
    7      16       288       272   155.404       168    0.215485    0.388716
    8      16       331       315   157.474       172   0.0470773    0.373749
    9      16       364       348   154.641       132    0.701008    0.392809
   10      15       387       372   148.775        96   0.0982538     0.38876
   11      13       387       374   135.977         8     0.14039    0.387209
Total time run:         11.4729
Total writes made:      387
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     134.926
Stddev Bandwidth:       63.2961
Max bandwidth (MB/sec): 252
Min bandwidth (MB/sec): 8
Average IOPS:           33
Stddev IOPS:            15.824
Max IOPS:               63
Min IOPS:               2
Average Latency(s):     0.462407
Stddev Latency(s):      0.663262
Max latency(s):         3.14395
Min latency(s):         0.0423993
Cleaning up (deleting benchmark objects)
Removed 387 objects
Clean up completed and total clean up time :0.533057

Config:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.15.15.51/24
     fsid = b1326e1a-73a8-4418-b5d0-XXXXXXXXXXXXXXXXXXXXX
     mon_allow_pool_delete = true
     mon_host = 10.15.15.51 10.15.15.52 10.15.15.53
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.15.15.51/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pmh1]
     host = pmh1
     mds_standby_for_name = pve

[mds.pmh2]
     host = pmh2
     mds_standby_for_name = pve

[mds.pmh3]
     host = pmh3
     mds_standby_for_name = pve

[mon.pmh1]
     public_addr = 10.15.15.51

[mon.pmh2]
     public_addr = 10.15.15.52

[mon.pmh3]
     public_addr = 10.15.15.53
 

Attachment: Screenshot from 2022-11-16 10-47-40.png
Can you try setting your target ratio so the autoscaler scales up the number of PGs accordingly? 32 PGs seems a bit on the low end. After the autoscaler has done its work, benchmark again to see if it yields any improvement.
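As a rough sketch (assuming the pool is named CephVM, as in the benchmark above), the target ratio can be set and the autoscaler's decision checked like this:

Code:
# tell the autoscaler this pool is expected to hold essentially all the data
ceph osd pool set CephVM target_size_ratio 1.0
# check the autoscaler's view; NEW PG_NUM should grow above 32
ceph osd pool autoscale-status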
 
I use the following optimizations in a 5-node 12th-gen Dell cluster with SAS drives (a few example commands follow the list):

Set write cache enable (WCE) to 1 on SAS drives
Set VM cache to none
Set VM to use VirtIO-single SCSI controller and enable IO thread and discard option
Set VM CPU type to 'host'
Set VM CPU NUMA if server has 2 or more physical CPU sockets
Set VM VirtIO Multiqueue to number of cores/vCPUs
Install the qemu-guest-agent inside the VM
Set the Linux VMs' IO scheduler to none/noop
Set RBD pool to use the 'krbd' option
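A few of these can be illustrated with commands (a rough sketch only; /dev/sdb, /dev/sda and the storage name CephVM are placeholders for your drives and your Proxmox RBD storage):

Code:
# enable the write cache (WCE) on a SAS drive, per drive, on the host
sdparm --set WCE=1 --save /dev/sdb
# inside a Linux VM: switch the IO scheduler for the virtual disk to 'none'
echo none > /sys/block/sda/queue/scheduler
# on the Proxmox host: map the RBD storage through the kernel client (krbd)
pvesm set CephVM --krbd 1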

With these settings I get write IOPS in the hundreds, and reads are usually double or triple the write IOPS.

Since you aren't using a switch, I recommend a full-mesh broadcast network (sketch below): https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Broadcast_Setup
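Roughly, the broadcast setup bonds the two direct links on each node in /etc/network/interfaces, something like this (a sketch based on that wiki page; ens19/ens20 and the address are placeholders for your NICs and that node's Ceph IP):

Code:
auto bond0
iface bond0 inet static
        address 10.15.15.51/24
        bond-slaves ens19 ens20
        bond-miimon 100
        bond-mode broadcast
# point Ceph's public_network/cluster_network at this subnet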