I'm having trouble figuring out where the performance problem lies here.
I currently have 5 nodes, each containing 5 Samsung EVO SSDs (I know consumer drives are not the best, but I still wouldn't expect performance to be this low).
The Ceph public and cluster networks use Mellanox ConnectX-3 cards with the latest firmware, connected to a Mellanox SX1012 switch at 40 Gb Ethernet.
Each node has the following:
24 x Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2 Sockets)
128 GB RAM
Proxmox v7.0-10
With rados bench it appears that writes just... stop? The current MB/s reads 0 a lot of the time. I also created a CT on a Ceph pool and tested the disk with dd there; the result is not much better.
Code:
root@testct:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 14.3277 s, 74.9 MB/s
Code:
:~# iperf -c 10.3.32.185 -P 2
------------------------------------------------------------
Client connecting to 10.3.32.185, TCP port 5001
TCP window size: 165 KByte (default)
------------------------------------------------------------
[ 3] local 10.3.32.186 port 35052 connected with 10.3.32.185 port 5001
[ 4] local 10.3.32.186 port 35054 connected with 10.3.32.185 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0002 sec 16.2 GBytes 13.9 Gbits/sec
[ 4] 0.0000-10.0001 sec 17.9 GBytes 15.4 Gbits/sec
[SUM] 0.0000-10.0001 sec 34.1 GBytes 29.3 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.126/0.190/0.253/0.126 ms (tot/err) = 2/0
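To put the dd and iperf numbers above in the same unit, here is a rough sketch. The helper names gbit_to_mbyte and percent_of are made up for this post, and it assumes decimal units (1 Gbit/s = 1000 Mbit/s, 8 bits per byte).

```shell
# Convert an iperf rate in Gbit/s to MB/s (decimal units, 8 bits per byte).
# gbit_to_mbyte is a hypothetical helper name, not part of iperf.
gbit_to_mbyte() {
  awk -v g="$1" 'BEGIN { printf "%.1f\n", g * 1000 / 8 }'
}

# Express throughput $1 as a percentage of throughput $2 (same unit).
# percent_of is likewise a hypothetical helper name.
percent_of() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b * 100 }'
}

link=$(gbit_to_mbyte 29.3)   # iperf [SUM] figure from above
echo "network: ${link} MB/s"
echo "dd reached $(percent_of 74.9 "$link")% of that"   # 74.9 MB/s from the dd run
```

If the arithmetic holds, the dd result is a small fraction of what the link carried in iperf, so raw network bandwidth does not look like the limiting factor here.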
Code:
:~# rados bench -p test 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_0051_23356
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 71 55 219.947 220 0.0346255 0.0483225
2 16 82 66 131.971 44 0.0364492 0.0927038
3 16 82 66 87.9818 0 - 0.0927038
4 16 82 66 65.9868 0 - 0.0927038
5 16 82 66 52.7898 0 - 0.0927038
6 16 82 66 43.9915 0 - 0.0927038
7 16 82 66 37.7072 0 - 0.0927038
8 16 82 66 32.9938 0 - 0.0927038
9 16 82 66 29.3279 0 - 0.0927038
10 16 82 66 26.3951 0 - 0.0927038
11 16 82 66 23.9956 0 - 0.0927038
12 16 82 66 21.9959 0 - 0.0927038
Total time run: 12.6548
Total writes made: 82
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 25.9189
Stddev Bandwidth: 63.6239
Max bandwidth (MB/sec): 220
Min bandwidth (MB/sec): 0
Average IOPS: 6
Stddev IOPS: 15.906
Max IOPS: 55
Min IOPS: 0
Average Latency(s): 2.46885
Stddev Latency(s): 4.86423
Max latency(s): 12.6533
Min latency(s): 0.0248786
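To quantify the stall instead of eyeballing it, the per-second rows can be counted with a small filter. This is only a sketch: count_stalled is a helper name I made up, and it assumes the 8-column per-second layout shown in the output above (sec, cur ops, started, finished, avg MB/s, cur MB/s, last lat, avg lat).

```shell
# Count per-second rows of `rados bench` output where cur MB/s (column 6) is 0.
# count_stalled is a hypothetical helper; it reads the bench output on stdin.
count_stalled() {
  awk 'NF == 8 && $1 ~ /^[0-9]+$/ && $1 > 0 {   # data rows only, skip the sec-0 row
         total++
         if ($6 == 0) stalled++
       }
       END { printf "%d of %d seconds had 0 MB/s\n", stalled, total }'
}

# Usage (assumes the same pool name as above):
#   rados bench -p test 10 write --no-cleanup | count_stalled
```

The header and summary lines are skipped because they do not have a numeric first field and exactly eight columns.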
Code:
:~# cat /etc/ceph/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.3.32.0/24
fsid = 6e26c2db-7adf-401b-a944-26610d0c77e7
mon_allow_pool_delete = true
mon_host = 10.3.32.185 10.3.32.186 10.3.32.187 10.3.32.188 10.3.32.189
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.3.32.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.phy-hv-sl-0049]
host = 0049-phy-hv-sl
mds standby for name = pve
[mds.phy-hv-sl-0050]
host = 0050-phy-hv-sl
mds_standby_for_name = pve
[mds.phy-hv-sl-0051]
host = 0051-phy-hv-sl
mds_standby_for_name = pve
[mon.0047-phy-hv-sl]
public_addr = 10.3.32.185
[mon.0048-phy-hv-sl]
public_addr = 10.3.32.186
[mon.0049-phy-hv-sl]
public_addr = 10.3.32.187
[mon.0050-phy-hv-sl]
public_addr = 10.3.32.188
[mon.0051-phy-hv-sl]
public_addr = 10.3.32.189