Hi, I am building a Ceph cluster with Proxmox 6.1 and I am experiencing low performance. I hope you can help me identify where my bottleneck is.
At this moment I am using 3 nodes, with 5 OSDs per node (all SSD).
Specs per node:
Supermicro FatTwin SYS-F618R2-RT+
128 GB DDR4
1x Intel Xeon E5-1630 v4
5x Intel S3510 800 GB for Ceph, connected to the motherboard SATA ports (single-drive fio sketch below)
1x 80 GB SSD for Proxmox
2x Gigabit NIC (only one used, for WAN)
1x Mellanox MT27500 [ConnectX-3] InfiniBand QDR 40 Gb/s (for Ceph) - MTU 65520
No separate journal device
Ceph network switch: Voltaire 4036
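To check whether the drives themselves could be the limit, I can also benchmark a single S3510 with fio. This is just a sketch: /dev/sdX is a placeholder for a spare, unused disk (the test is destructive on a raw device), and single-threaded O_DSYNC 4K writes are the pattern Ceph write latency depends on:
Code:
# DESTRUCTIVE on /dev/sdX - use a spare disk, never a live OSD
fio --name=ssd-sync-write --filename=/dev/sdX \
    --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based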
# ceph osd pool create scbench 100 100
# rados bench -p scbench 10 write --no-cleanup
Code:
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ceph-test3_1776125
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 54 38 151.997 152 0.0386031 0.264344
2 16 99 83 165.99 180 0.0197462 0.317243
3 16 142 126 167.989 172 0.0383428 0.335559
4 16 186 170 169.987 176 1.44781 0.341263
5 16 244 228 182.386 232 0.269771 0.321895
6 16 281 265 176.653 148 1.2315 0.339699
7 16 318 302 172.557 148 0.314595 0.34656
8 16 353 337 168.486 140 0.0184961 0.357753
9 16 392 376 167.097 156 0.622394 0.359017
10 16 435 419 167.585 172 0.365768 0.358431
Total time run: 10.504
Total writes made: 436
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 166.032
Stddev Bandwidth: 26.4961
Max bandwidth (MB/sec): 232
Min bandwidth (MB/sec): 140
Average IOPS: 41
Stddev IOPS: 6.62403
Max IOPS: 58
Min IOPS: 35
Average Latency(s): 0.38319
Stddev Latency(s): 0.438058
Max latency(s): 2.14432
Min latency(s): 0.0179365
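For context, my rough math on what this implies per drive, assuming the scbench pool picked up the default size 3 from ceph.conf:
Code:
166 MB/s client writes x 3 replicas ≈ 498 MB/s written cluster-wide
498 MB/s / 15 OSDs                  ≈ 33 MB/s per SSD on average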
# rados bench -p scbench 10 seq
Code:
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 58 42 167.976 168 0.626339 0.265368
2 16 102 86 171.981 176 0.0130061 0.319466
3 16 141 125 166.649 156 0.0164162 0.340917
4 16 186 170 169.983 180 1.46721 0.344456
5 16 244 228 182.383 232 0.529887 0.322279
6 16 280 264 175.983 144 0.248866 0.339829
7 16 320 304 173.698 160 0.0182624 0.3492
8 16 353 337 168.485 132 0.789411 0.362284
9 16 392 376 167.096 156 0.280278 0.363076
10 16 436 420 167.984 176 0.0186207 0.364722
Total time run: 10.5163
Total reads made: 436
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 165.837
Average IOPS: 41
Stddev IOPS: 6.76593
Max IOPS: 58
Min IOPS: 33
Average Latency(s): 0.385245
Max latency(s): 1.81629
Min latency(s): 0.0124946
# rados bench -p scbench 10 rand
Code:
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 66 50 199.97 200 0.441923 0.198354
2 16 101 85 169.98 140 0.207021 0.306616
3 16 142 126 167.981 164 0.562973 0.329745
4 16 178 162 161.983 144 0.00239542 0.365981
5 16 220 204 163.183 168 0.509352 0.353979
6 16 273 257 171.316 212 0.355821 0.34913
7 16 315 299 170.84 168 0.00242466 0.350936
8 16 354 338 168.983 156 0.394233 0.363754
9 16 394 378 167.983 160 0.482949 0.361794
10 16 435 419 167.584 164 0.00235319 0.361498
Total time run: 10.492
Total reads made: 436
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 166.222
Average IOPS: 41
Stddev IOPS: 5.62633
Max IOPS: 53
Min IOPS: 35
Average Latency(s): 0.382806
Max latency(s): 1.91117
Min latency(s): 0.0022631
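If small-block numbers would help, I can also rerun the write benchmark with 4K objects instead of the default 4M (sketch, same pool):
Code:
rados bench -p scbench 10 write -b 4096 --no-cleanup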
# pveversion -v
Code:
proxmox-ve: 6.1-2 (running kernel: 5.3.18-1-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-5.3: 6.1-4
pve-kernel-helper: 6.1-4
pve-kernel-5.3.18-1-pve: 5.3.18-1
pve-kernel-5.3.13-2-pve: 5.3.13-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 14.2.6-pve1
ceph-fuse: 14.2.6-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.14-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-12
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-19
pve-docs: 6.1-4
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-10
pve-firmware: 3.0-5
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
# cat /etc/pve/ceph.conf
Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.12.12.0/24
        fsid = 459c3e1d-06bc-4525-9f95-3e8fb62e2d77
        mon_allow_pool_delete = true
        mon_host = 185.47.xxx.23 185.47.xxx.25 185.47.xxx.27
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 185.47.xxx.0/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring
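Not sure whether it matters, but public_network sits on the 1 GbE subnet while only cluster_network uses InfiniBand. To confirm which addresses each OSD actually binds to, I believe this lists both per OSD:
Code:
# each osd.N line shows its public and cluster address
ceph osd dump | grep "^osd\."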
# cat /etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual

auto eno1.200
iface eno1.200 inet manual
        vlan-raw-device eno1

auto vmbr0
iface vmbr0 inet static
        address 185.47.xxx.23
        netmask 255.255.255.0
        gateway 185.47.xxx.1
        bridge_ports eno1.200
        bridge_stp off
        bridge_fd 0

iface eno2 inet manual

auto ibs7
iface ibs7 inet static
        address 10.12.12.13
        netmask 255.255.255.0
        pre-up modprobe ib_ipoib
        pre-up modprobe mlx4_ib
        pre-up modprobe ib_umad
        pre-up echo connected > /sys/class/net/ibs7/mode
        mtu 65520
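To make sure connected mode and the large MTU really apply after boot, I verify with:
Code:
cat /sys/class/net/ibs7/mode   # should print "connected"
ip link show ibs7              # should report mtu 65520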
Switch Voltaire 4036:
4036-46EC# module-firmware show
Code:
Module No. Type Node GUID LID FW Version SW Version
---------- ---- --------- --- ---------- ----------
4036/2036 3.9.1-985
---------
CPLD 1 0xa
IS4 1 0x0008f105002046ec 0 7.4.2200 VLT1210032201
Infiniband card:
# ibstat
Code:
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.33.5100
        Hardware version: 1
        Node GUID: 0x002590ffff907508
        System image GUID: 0x002590ffff90750b
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 2
                LMC: 0
                SM lid: 2
                Capability mask: 0x0251486a
                Port GUID: 0x002590ffff907509
                Link layer: InfiniBand
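To rule out the fabric itself, raw RDMA bandwidth between two nodes can be measured with ib_write_bw from the perftest package (a sketch; 10.12.12.13 is the target node's IPoIB address, as above):
Code:
# on the target node (server side)
ib_write_bw
# on another node (client side), pointing at the server's IPoIB address
ib_write_bw 10.12.12.13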
# iperf -c 10.12.12.13
Code:
------------------------------------------------------------
Client connecting to 10.12.12.13, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[ 3] local 10.12.12.15 port 58698 connected with 10.12.12.13 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 31.6 GBytes 27.2 Gbits/sec
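Since a single IPoIB stream can be limited by one CPU core, I can also retest with parallel streams (same iperf as above):
Code:
iperf -c 10.12.12.13 -P 4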
Any ideas? Any help would be appreciated. Thank you in advance.