Performance issue with LACP (layer 2+3): poor Ceph performance (with powerful hardware)

atec666

Here is my setup (newly bought):

3 nodes, each with:

- dual Xeon 3.2 GHz (2 x 16 cores)
- 90 GB RAM
- 6 x 1 TB 7200 rpm HDD (Ceph OSDs) + 2 x 500 GB HDD (ZFS RAID1, Proxmox install)
- 5 x 1 Gbit NICs: Ceph public network (1 NIC) and Ceph private network (3 Gbit NICs in LACP layer 2+3, dedicated to Ceph storage traffic on a dedicated, isolated switch)

Test :

1 -
iperf between two CTs (on two different nodes!): [ 4] 0.0-10.0 sec 1.10 GBytes 941 Mbits/sec (stuck at 1 Gbit, not 3 Gbit...)

2 -
root@test1:~# dd if=/dev/zero of=/tmp/512Mo bs=512M count=1 oflag=direct
1+0 records in
1+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 6.7804 s, 79.2 MB/s
 
+
log (ceph -w)

2020-04-13 20:19:15.593086 mon.host1 [WRN] Health check update: 19 slow ops, oldest one blocked for 635 sec, mon.tankster1 has slow ops (SLOW_OPS)
 
For comparison, my old setup:

3 nodes, each with:

- Core i5 3470
- 16 GB RAM
- 2 x 2 TB 7200 rpm HDD (Ceph OSDs) + 2 x 120 GB SSD (RAID1, Proxmox install)
- 3 x 1 Gbit NICs: Ceph public network (1 NIC) and Ceph private network (1 Gbit NIC in LACP layer 2+3, dedicated to Ceph storage traffic on a dedicated, isolated switch)

root@test1:~# dd if=/dev/zero of=/tmp/512Mo bs=512M count=1 oflag=direct
1+0 records in
1+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 6.7804 s, 63 MB/s [Test with 1 CT only]
 
... I did a test with 6 CTs simultaneously running the same dd command:
it seems that Ceph does not use all 3 Gigabit links (only 2...). Adding up all the bandwidth: 174 MB/s (177 write IOPS)
 

You can't use more than 1 link per TCP connection. You need to test iperf with multiple streams (the -P option).
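For example, a multi-stream iperf run looks like this (the server address 10.10.10.2 is hypothetical; substitute the Ceph-network IP of the other node):

```shell
# On the receiving node: start the iperf server.
iperf -s

# On the sending node: 4 parallel TCP streams for 10 seconds.
# Each stream uses its own source port, so with a layer3+4 hash
# the streams can be spread over different bond members.
iperf -c 10.10.10.2 -P 4 -t 10
```

With layer2+3 hashing you will still see all streams land on one link, since the IP pair is the same for every stream.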

Also, use LACP layer3+4, not layer2+3. With layer2+3, the hash algorithm always picks the same link for a given src-IP/dst-IP pair; with layer3+4 the hash is src-IP/dst-IP/src-port/dst-port, so traffic spreads across links with multiple connections (like multiple OSDs).
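On Proxmox (Debian ifupdown) the hash policy is set on the bond itself. A minimal sketch of the relevant /etc/network/interfaces fragment, assuming hypothetical NIC names (eno2-eno4) and a hypothetical 10.10.10.0/24 Ceph network:

```
# Bond used for the Ceph private network (fragment)
auto bond1
iface bond1 inet static
        # hypothetical Ceph-network address
        address 10.10.10.1/24
        # hypothetical NIC names
        bond-slaves eno2 eno3 eno4
        bond-miimon 100
        # 802.3ad = LACP
        bond-mode 802.3ad
        # hash on IPs + ports instead of the default layer2
        bond-xmit-hash-policy layer3+4
```

Note that the switch side of the LACP group has its own hash setting; for inbound traffic to spread as well, it should also be set to a layer3+4 (IP + port) policy.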
 
Ouch! I will give it a try, that seems to be the answer!
Thanks for those answers.
 
Thanks Alwin, I already read this doc.
BUT:

When Ceph and PVE run on the same hardware (hyper-converged), with 3 nodes: what is the meaning of the public network and the cluster network?
Where does the data actually flow?

The cluster network is used by one OSD to send its data to the other replica OSDs for your 3-way replication; the public network is where the primary OSD receives the data from the client.

As you have multiple 1 Gbps NICs, putting 3 in the public network and 2 in the cluster network will let you split the traffic a bit more than LACP alone can do across 5 x 1 Gbps NICs.
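Concretely, that split is configured in ceph.conf (on PVE, /etc/pve/ceph.conf). A sketch, assuming hypothetical subnets:

```
[global]
# hypothetical subnets -- substitute your own
# client <-> primary OSD (and MON) traffic
public_network = 192.168.10.0/24
# OSD <-> OSD replication and heartbeat traffic
cluster_network = 192.168.20.0/24
```

The OSD daemons need to be restarted for a network change to take effect.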