Hi Folks,
We have a 3-node cluster with one independent Ceph ring that is directly connected between the 3 nodes (N1->N2, N2->N3) using QSFP+ AOC cables¹. On this ring we see very bad latency in ping tests.
The directly connected setup works flawlessly on our other clusters. The only difference is that we switched from SFP+/10G to QSFP+/40G direct links. The link/bond setup is simple and identical on all clusters:
iface bond0 inet static
    address 172.16.0.5
    netmask 255.255.255.0
    bond-slaves enp132s0f2 enp132s0f3
    bond-mode broadcast
    bond-miimon 100
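To double-check that the bond really runs in the configured mode and that both slaves negotiated the expected link speed, something like the following minimal Python sketch could be used. It assumes the standard Linux bonding sysfs files under /sys/class/net; the function name bond_summary and its parameters are made up for illustration and are not part of any tool.

```python
from pathlib import Path


def bond_summary(bond="bond0", sys_net="/sys/class/net"):
    """Return mode, miimon and per-slave link speed (Mb/s) of a bonding device.

    Assumption: the kernel bonding driver exposes its state under
    <sys_net>/<bond>/bonding/, which is the usual sysfs layout.
    """
    sys_net = Path(sys_net)
    bond_dir = sys_net / bond / "bonding"
    summary = {
        # the mode file holds name and number, e.g. "broadcast 3"
        "mode": (bond_dir / "mode").read_text().split()[0],
        "miimon": int((bond_dir / "miimon").read_text()),
        "slaves": {},
    }
    for slave in (bond_dir / "slaves").read_text().split():
        try:
            # speed is reported in Mb/s; the read can fail while the link is down
            speed = int((sys_net / slave / "speed").read_text())
        except (OSError, ValueError):
            speed = None
        summary["slaves"][slave] = speed
    return summary
```

Calling bond_summary("bond0") on one of the nodes should list enp132s0f2 and enp132s0f3; a slave reporting anything other than 40000 would point at a link negotiation problem rather than at the bond itself.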
10G SFP+:
84:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
84:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
PING 172.16.2.103 (172.16.2.103) 56(84) bytes of data.
64 bytes from 172.16.2.103: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 172.16.2.103: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 172.16.2.103: icmp_seq=3 ttl=64 time=0.056 ms
64 bytes from 172.16.2.103: icmp_seq=4 ttl=64 time=0.054 ms
64 bytes from 172.16.2.103: icmp_seq=5 ttl=64 time=0.063 ms
64 bytes from 172.16.2.103: icmp_seq=6 ttl=64 time=0.053 ms
64 bytes from 172.16.2.103: icmp_seq=7 ttl=64 time=0.062 ms
64 bytes from 172.16.2.103: icmp_seq=8 ttl=64 time=0.071 ms
64 bytes from 172.16.2.103: icmp_seq=9 ttl=64 time=0.104 ms
64 bytes from 172.16.2.103: icmp_seq=10 ttl=64 time=0.063 ms
40G QSFP+ AOC:
21:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
21:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
PING 172.16.0.6 (172.16.0.6) 56(84) bytes of data.
64 bytes from 172.16.0.6: icmp_seq=1 ttl=64 time=1.08 ms
64 bytes from 172.16.0.6: icmp_seq=2 ttl=64 time=0.638 ms
64 bytes from 172.16.0.6: icmp_seq=3 ttl=64 time=0.628 ms
64 bytes from 172.16.0.6: icmp_seq=4 ttl=64 time=0.609 ms
64 bytes from 172.16.0.6: icmp_seq=5 ttl=64 time=1.31 ms
64 bytes from 172.16.0.6: icmp_seq=6 ttl=64 time=1.30 ms
64 bytes from 172.16.0.6: icmp_seq=7 ttl=64 time=1.31 ms
64 bytes from 172.16.0.6: icmp_seq=8 ttl=64 time=1.06 ms
64 bytes from 172.16.0.6: icmp_seq=9 ttl=64 time=1.32 ms
64 bytes from 172.16.0.6: icmp_seq=10 ttl=64 time=1.33 ms
As one can see, the latency on the 40G links is horrible: roughly 0.6 to 1.3 ms, compared with about 0.05 to 0.1 ms on the 10G links. Any ideas? Any help is greatly appreciated.
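To put numbers on the difference, here is a minimal Python sketch that summarizes the round-trip times from the two ping runs posted above. The helper names parse_rtts and summarize are purely illustrative; the RTT values are copied from the output in this post.

```python
import re
from statistics import mean


def parse_rtts(ping_output):
    """Extract the 'time=... ms' values from raw ping output as floats."""
    return [float(t) for t in re.findall(r"time=([\d.]+) ms", ping_output)]


def summarize(rtts):
    """Return min, average and max RTT in milliseconds."""
    return {"min": min(rtts), "avg": mean(rtts), "max": max(rtts)}


# RTTs in ms, copied from the two ping runs in this post
rtt_10g = [0.049, 0.056, 0.056, 0.054, 0.063, 0.053, 0.062, 0.071, 0.104, 0.063]
rtt_40g = [1.08, 0.638, 0.628, 0.609, 1.31, 1.30, 1.31, 1.06, 1.32, 1.33]

s10, s40 = summarize(rtt_10g), summarize(rtt_40g)
print(f"10G: min={s10['min']:.3f} avg={s10['avg']:.3f} max={s10['max']:.3f} ms")
print(f"40G: min={s40['min']:.3f} avg={s40['avg']:.3f} max={s40['max']:.3f} ms")
print(f"40G average is {s40['avg'] / s10['avg']:.1f}x the 10G average")
```

parse_rtts can be fed the raw text of a longer ping run, so the same comparison can be repeated after changing a setting.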
¹ https://www.fs.com/de/products/120520.html?attribute=1691&id=196850