Ceph OSD Performance is Slow?

hanturaya

Hi guys,
I'm currently testing Ceph in Proxmox. I've followed the documentation and configured Ceph.

I have 3 identical nodes, configured as follows:
CPU: 16 x Intel Xeon Bronze @ 1.90GHz (2 Sockets)
RAM: 32 GB DDR4 2133 MHz
Boot/Proxmox Disk: Patriot Burst SSD 240GB
Disk: 3x HGST 10TB HDD SAS
NIC1: 1 GbE used for Corosync
NIC2: 2x 10GbE bonded with LACP for Ceph traffic

Before that, I tested each disk one by one using fio with this command:
Code:
fio --ioengine=libaio --filename=/dev/sdx --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio

These are the results (they are similar for every disk on each server).
For 4K block size:
[screenshot: fio 4K result]
For 4M block size:
[screenshot: fio 4M result]
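For the 4M run, presumably the same command was used with only the block size changed, i.e. something like:
Code:
fio --ioengine=libaio --filename=/dev/sdx --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio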


After that I set up Ceph and created one OSD per server (1 disk each), but the speed dropped:
[screenshot: Ceph benchmark result]

rados -p test bench 30 write
Code:
Total time run:         30.3825
Total writes made:      939
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     123.624
Stddev Bandwidth:       11.7765
Max bandwidth (MB/sec): 148
Min bandwidth (MB/sec): 100
Average IOPS:           30
Stddev IOPS:            2.94412
Max IOPS:               37
Min IOPS:               25
Average Latency(s):     0.514558
Stddev Latency(s):      0.255017
Max latency(s):         1.72565
Min latency(s):         0.124276
rados -p test bench 30 write -b 4K -t 1
Code:
Total time run:         30.0195
Total writes made:      3146
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     0.40937
Stddev Bandwidth:       0.0648556
Max bandwidth (MB/sec): 0.53125
Min bandwidth (MB/sec): 0.238281
Average IOPS:           104
Stddev IOPS:            16.603
Max IOPS:               136
Min IOPS:               61
Average Latency(s):     0.0095316
Stddev Latency(s):      0.00568832
Max latency(s):         0.047987
Min latency(s):         0.00263956
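For anyone reproducing these numbers, the test pool is assumed to be a plain replicated pool, created and cleaned up roughly like this (the PG count of 128 is only an example):
Code:
# create a replicated pool for benchmarking (PG count is an example)
ceph osd pool create test 128 128
# remove the benchmark objects afterwards
rados -p test cleanup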


My question is: is this really the best OSD speed I can get with my current configuration?
 
I forgot to mention that this topology is connected to a Cisco Nexus 3064 with the following configuration.

For each port:
Code:
interface Ethernet1/1
  description str1-enp129s0f0
  lacp rate fast
  switchport access vlan 459
  channel-group 1 mode active
interface Ethernet1/2
  description str1-enp129s0f1
  lacp rate fast
  switchport access vlan 459
  channel-group 1 mode active
interface Ethernet1/3
  description str2-enp129s0f0
  lacp rate fast
  switchport access vlan 459
  channel-group 2 mode active
interface Ethernet1/4
  description str2-enp129s0f1
  lacp rate fast
  switchport access vlan 459
  channel-group 2 mode active
interface Ethernet1/5
  description str3-enp129s0f0
  lacp rate fast
  switchport access vlan 459
  channel-group 3 mode active
interface Ethernet1/6
  description str3-enp129s0f1
  lacp rate fast
  switchport access vlan 459
  channel-group 3 mode active

For each bond:
Code:
interface port-channel1
  description bonding-str1
  switchport access vlan 459
  spanning-tree bpduguard enable
  spanning-tree bpdufilter enable
  no negotiate auto
interface port-channel2
  description bonding-str2
  switchport access vlan 459
  spanning-tree bpduguard enable
  spanning-tree bpdufilter enable
  no negotiate auto

interface port-channel3
  description bonding-str3
  switchport access vlan 459
  spanning-tree bpduguard enable
  spanning-tree bpdufilter enable
  no negotiate auto

The MTU is set to 9000.
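For completeness, the matching LACP bond on the Proxmox side would look roughly like this in /etc/network/interfaces (the NIC names follow the switch port descriptions; the IP address is only a placeholder, not the real one):
Code:
# Ceph network: LACP bond to the Nexus port-channel (access VLAN 459)
auto bond0
iface bond0 inet static
        address 192.168.100.11/24
        bond-slaves enp129s0f0 enp129s0f1
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-lacp-rate fast
        bond-miimon 100
        mtu 9000
Jumbo frames can be verified end to end with something like ping -M do -s 8972 <peer IP> (8972 = 9000 minus the 28 bytes of IP/ICMP headers).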
 
Hello spirit, thank you for your answer.

"do you use some cache with a raid controller ?"

I've tried my best to avoid any cache being used in my test. I ran this command before the test and changed some settings on my RAID controller (I'm using a Dell R740xd with an H730P RAID controller).

Drop cache:
Code:
sync; echo 3 > /proc/sys/vm/drop_caches

Disable cache for Non-RAID disks (I'm using HBA/pass-through mode):
[screenshot: controller cache setting]

Disable write cache:
[screenshot: write cache setting]
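The drive-level write cache state can also be double-checked from the OS; for SAS disks something like the following should work (the device name is just an example):
Code:
# SAS/SCSI disks: WCE=0 means the volatile write cache is disabled
sdparm --get=WCE /dev/sdb
# SATA disks: reports whether write-caching is currently on or off
hdparm -W /dev/sdb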

But the test still shows that I get around 3500 IOPS.

"because you shouldn't have more than 150-200 iops for 1 hdd disk"
So the rados test was right, and my disks are only capable of reaching that level of performance?

Thank you
 
In my opinion this low performance is expected.
Ceph needs more OSDs to improve performance.

Your cluster has only 3 nodes, which is the default minimum requirement.
By the way, bonding both NICs for Ceph traffic is not recommended; you can use one for the Public Network and the other for the Cluster Network, roughly as sketched below.
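As an example, the split would look roughly like this in /etc/pve/ceph.conf (the subnets are placeholders, not your real ones):
Code:
[global]
        # client and monitor traffic
        public_network = 192.168.10.0/24
        # OSD replication and heartbeat traffic
        cluster_network = 192.168.20.0/24
The OSDs need to be restarted to pick up a changed cluster_network.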



I manage a cluster with:

4 nodes
CPU: 2x E5-2690 v2
RAM: 192 GB
Cache tier: 1x Samsung PM1725b 1.6TB
HDD tier: 10x Toshiba MG06 8TB
 
Hello kenneth, thank you for the answer.

"you can use one for the Public Network and the other for the Cluster Network"
Will separating the cluster network and the public network improve Ceph OSD performance?

"Your cluster has only 3 nodes, which is the default minimum requirement."
Actually, right now I have 6 servers with identical hardware. This is the test with 6 servers, each running 1 OSD:
[screenshot]
This is the result I get:
rados -p test bench 30 write -b 4K -t 1
Code:
Total time run:         30.0309
Total writes made:      2412
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     0.31374
Stddev Bandwidth:       0.0318584
Max bandwidth (MB/sec): 0.398438
Min bandwidth (MB/sec): 0.261719
Average IOPS:           80
Stddev IOPS:            8.15574
Max IOPS:               102
Min IOPS:               67
Average Latency(s):     0.0124404
Stddev Latency(s):      0.00842863
Max latency(s):         0.0689223
Min latency(s):         0.00282277

rados -p test bench 30 write
Code:
Total time run:         30.4837
Total writes made:      1427
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     187.247
Stddev Bandwidth:       17.7798
Max bandwidth (MB/sec): 212
Min bandwidth (MB/sec): 148
Average IOPS:           46
Stddev IOPS:            4.44494
Max IOPS:               53
Min IOPS:               37
Average Latency(s):     0.338962
Stddev Latency(s):      0.245017
Max latency(s):         1.54223
Min latency(s):         0.050242
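One thing that can help narrow this down is checking whether a single slow OSD is dragging the averages down; Ceph has a built-in per-OSD write test (osd.0 below is only an example):
Code:
# writes 1 GiB in 4 MiB objects to one OSD and reports the throughput
ceph tell osd.0 bench
# shows per-OSD commit/apply latencies
ceph osd perf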

"In my opinion this low performance is expected."
What do you think the problem is here?

Thank you
 
  • In the SUSE Enterprise Storage 7 documentation, there are 2 different views:

https://documentation.suse.com/ses/7/html/ses-all/storage-bp-hwreq.html#storage-bp-net-private
Code:
If you do not specify a cluster network during Ceph deployment, it assumes a single public network environment. While Ceph operates fine with a public network, its performance and security improves when you set a second private cluster network. To support two networks, each Ceph node needs to have at least two network cards.

https://documentation.suse.com/ses/7/html/ses-all/storage-bp-hwreq.html#ses-bp-minimum-cluster
Code:
A minimal product cluster configuration consists of:
At least four physical nodes (OSD nodes) with co-location of services
Dual-10 Gb Ethernet as a bonded network
A separate Admin Node (can be virtualized on an external node)



  • In the PVE documentation:

https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster
Code:
Public Network: You can set up a dedicated network for Ceph. This setting is required. Separating your Ceph traffic is highly recommended. Otherwise, it could cause trouble with other latency dependent services, for example, cluster communication may decrease Ceph’s performance.

Cluster Network: As an optional step, you can go even further and separate the OSD replication & heartbeat traffic as well. This will relieve the public network and could lead to significant performance improvements, especially in large clusters.



Considering you have only one 2-port 10GbE card, you can test the performance of both configurations.

"In my opinion this low performance is expected" is based on the various references in these documents; you need more OSDs. The SUSE 7 minimum-cluster recommendation is 4 nodes, each with 8 OSDs.

In simple clusters or small production clusters, low 4K I/O performance is normal. Improving it needs powerful CPUs, a good switch, Optane SSDs for the WAL/RocksDB, and a lot of clients to push the stress test, as in the sketch below.
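If you later add a fast device for the RocksDB/WAL, Proxmox can place the DB there when the OSD is created, roughly like this (the device names are only examples):
Code:
# HDD as the data device, NVMe device/partition for RocksDB + WAL
pveceph osd create /dev/sdb -db_dev /dev/nvme0n1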

English is not my first language, so my grammar may be wrong.
 
Hello kenneth, sorry for the late question. Just for my reference, could you show me your cluster's performance? Thank you.
 
I have very similar fio results to yours, and I would like to understand if there is anything I can change to improve performance. I am running a 3-node cluster with a very similar configuration.
 
