Hi,
We are running a three-node hyperconverged cluster and are experiencing high latency on only one of the nodes.
On two nodes, latency varies between 1 and 5 ms; on the third node it can climb to 300 ms.
PG autoscale is on, and the CRUSH rule is replicated_rule.
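If per-OSD numbers would help, I can also post the output of the following (standard Ceph commands; ceph osd perf shows commit/apply latency per OSD, and ceph osd tree maps the OSDs to hosts):
Code:
root@asi-prd-01:~# ceph osd perf
root@asi-prd-01:~# ceph osd tree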
The nodes are strictly identical:
Dell PowerEdge R640
CPU: Intel(R) Xeon(R) Gold 6130
RAM: 6x 32 GB DDR4-2666
Controller: Dell HBA330 (firmware 16.17.01.00)
Storage: 3x 1.92 TB SATA SSD (VK001920GWSXK)
Network: Intel(R) Ethernet 10G 4P X540
PVE 7.4-17 & Ceph 17.2.6
I haven't identified any hardware problems, and the network seems OK.
Any idea where the latency is coming from? Are there any known issues with these versions?
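If SMART data for the SSDs would help, I can pull it as well, e.g.:
Code:
root@asi-prd-01:~# smartctl -a /dev/sdb | grep -Ei 'realloc|wear|error|media'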
Technical information:
root@asi-prd-01:~# ceph -s
Code:
  cluster:
    id:     7aacf6bf-19ba-4abc-868b-68cbdc9a0bb8
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum asi-prd-01,asi-prd-02,asi-prd-03 (age 2d)
    mgr: asi-prd-03(active, since 10d), standbys: asi-prd-02, asi-prd-01
    osd: 9 osds: 9 up (since 2d), 9 in (since 2d)

  data:
    pools:   3 pools, 97 pgs
    objects: 287.00k objects, 1.0 TiB
    usage:   3.2 TiB used, 13 TiB / 16 TiB avail
    pgs:     97 active+clean

  io:
    client:   16 MiB/s rd, 315 KiB/s wr, 310 op/s rd, 35 op/s wr
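If the per-OSD utilization and PG distribution are relevant, I can add this too:
Code:
root@asi-prd-01:~# ceph osd df tree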
root@asi-prd-01:~# hdparm -tT /dev/sdb
Code:
/dev/sdb:
Timing cached reads: 18478 MB in 1.99 seconds = 9305.55 MB/sec
Timing buffered disk reads: 1324 MB in 3.01 seconds = 440.27 MB/sec
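hdparm only covers sequential reads, and Ceph is mostly sensitive to sync write latency, so I can run a single-threaded 4k sync-write test as well. A sketch of what I have in mind (writing to a scratch file rather than the raw device, since the disks hold OSDs; the file name is just an example):
Code:
root@asi-prd-01:~# fio --name=synctest --filename=/root/fio-scratch --size=1G \
    --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --group_reporting
root@asi-prd-01:~# rm /root/fio-scratch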
root@asi-prd-01:~# pveversion -v
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.131-2-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-9
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.126-1-pve: 5.15.126-1
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.4-1
proxmox-backup-file-restore: 2.4.4-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.14-pve1
root@asi-prd-01:~# rados -p test bench 30 write
Code:
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_asi-prd-01_989622
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 16 0 0 0 - 0
2 16 21 5 9.95536 10 1.97451 1.60941
3 16 24 8 10.612 12 2.767 2.02831
4 16 29 13 12.9299 20 1.77919 2.27624
5 15 33 18 14.3197 20 3.77611 2.757
6 16 43 27 17.8922 36 1.58408 2.88553
7 16 50 34 19.2919 28 1.8946 2.79278
8 16 54 38 18.867 16 2.14249 2.7758
9 16 59 43 18.9795 20 2.09039 2.76092
10 16 65 49 19.4603 24 2.68145 2.76326
11 16 70 54 19.4949 20 1.46891 2.74555
12 16 76 60 19.855 24 3.21977 2.80772
13 16 81 65 19.8572 20 4.12265 2.82984
14 16 87 71 20.1416 24 2.35672 2.84354
15 16 93 77 20.388 24 2.44081 2.81861
16 15 95 80 19.854 12 1.99903 2.83293
17 16 101 85 19.8517 20 2.03777 2.84295
18 16 110 94 20.7329 36 2.33814 2.88426
19 15 113 98 20.4677 16 1.90445 2.84914
2024-01-15T14:22:34.919623+0100 min lat: 1.24598 max lat: 5.41299 avg lat: 2.85163
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
20 16 120 104 20.6389 24 2.8987 2.85163
21 16 123 107 20.2227 12 1.70059 2.86109
22 16 132 116 20.9259 36 2.46603 2.89194
23 16 135 119 20.5351 12 2.41966 2.89056
24 16 139 123 20.3403 16 3.17779 2.89162
25 16 148 132 20.9553 36 3.34379 2.90209
26 16 150 134 20.4556 8 2.6792 2.89
27 16 159 143 21.0183 36 2.18752 2.9074
28 15 164 149 21.1184 24 2.39634 2.91213
29 16 169 153 20.9379 16 1.84626 2.88639
30 14 175 161 21.2989 32 3.36069 2.86418
31 8 175 167 21.38 24 2.8363 2.85464
Total time run: 31.8613
Total writes made: 175
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 21.9702
Stddev Bandwidth: 9.16046
Max bandwidth (MB/sec): 36
Min bandwidth (MB/sec): 0
Average IOPS: 5
Stddev IOPS: 2.31219
Max IOPS: 9
Min IOPS: 0
Average Latency(s): 2.84716
Stddev Latency(s): 0.879404
Max latency(s): 5.41299
Min latency(s): 1.24598
Cleaning up (deleting benchmark objects)
Removed 175 objects
Clean up completed and total clean up time :0.10798
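To check whether one OSD in particular is dragging the pool down, I can also bench each OSD individually (if I recall the defaults correctly, this writes 1 GiB in 4 MiB blocks per OSD):
Code:
root@asi-prd-01:~# ceph tell osd.* bench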
root@asi-prd-01:~# iperf3 -c 10.0.50.2
Code:
Connecting to host 10.0.50.2, port 5201
[ 5] local 10.0.50.1 port 35980 connected to 10.0.50.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.06 GBytes 9.15 Gbits/sec 1687 987 KBytes
[ 5] 1.00-2.00 sec 1.05 GBytes 9.02 Gbits/sec 1467 1.01 MBytes
[ 5] 2.00-3.00 sec 1.06 GBytes 9.09 Gbits/sec 17 1.47 MBytes
[ 5] 3.00-4.00 sec 1.08 GBytes 9.24 Gbits/sec 605 1.24 MBytes
[ 5] 4.00-5.00 sec 840 MBytes 7.05 Gbits/sec 1475 990 KBytes
[ 5] 5.00-6.00 sec 1.07 GBytes 9.20 Gbits/sec 512 1.17 MBytes
[ 5] 6.00-7.00 sec 1.07 GBytes 9.17 Gbits/sec 324 1.36 MBytes
[ 5] 7.00-8.00 sec 1.07 GBytes 9.23 Gbits/sec 1087 1.05 MBytes
[ 5] 8.00-9.00 sec 1.06 GBytes 9.07 Gbits/sec 0 1.48 MBytes
[ 5] 9.00-10.00 sec 860 MBytes 7.22 Gbits/sec 450 1.37 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.2 GBytes 8.74 Gbits/sec 7624 sender
[ 5] 0.00-10.04 sec 10.2 GBytes 8.71 Gbits/sec receiver
iperf Done.
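The retransmit count (7624 over 10 s) looks high to me. If it's relevant, I can also test the reverse direction and the raw round-trip time on the Ceph network:
Code:
root@asi-prd-01:~# iperf3 -c 10.0.50.2 -R
root@asi-prd-01:~# ping -c 1000 -i 0.01 -q 10.0.50.2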