Intel XL710 40G + QSFP+ AOC/DAC cables - very high latency

sigmarb

Hi Folks,

we have a 3-node cluster with an independent Ceph ring that is directly connected between the three nodes (N1->N2, N2->N3) using QSFP+ AOC cables¹. On this ring we see very high latency in ping tests.

The directly connected setup works flawlessly on our other clusters. The only difference is that we switched from SFP+/10G to QSFP+/40G direct links. The link/bond setup is the same simple configuration on all clusters:

iface bond0 inet static
    address 172.16.0.5
    netmask 255.255.255.0
    bond-slaves enp132s0f2 enp132s0f3
    bond-mode broadcast
    bond-miimon 100
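
For comparison, the bond state and the negotiated link speed can be checked on both slave ports; this uses only standard Linux tooling, nothing beyond the interface names from our config is assumed:

cat /proc/net/bonding/bond0                  # bond mode and MII status of each slave
ethtool enp132s0f2 | grep -E 'Speed|Link'    # negotiated speed / link state per slave

Both slaves should report 40000Mb/s and an MII status of "up" if the links themselves are healthy.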

10G SFP+:
84:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
84:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)

PING 172.16.2.103 (172.16.2.103) 56(84) bytes of data.
64 bytes from 172.16.2.103: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 172.16.2.103: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 172.16.2.103: icmp_seq=3 ttl=64 time=0.056 ms
64 bytes from 172.16.2.103: icmp_seq=4 ttl=64 time=0.054 ms
64 bytes from 172.16.2.103: icmp_seq=5 ttl=64 time=0.063 ms
64 bytes from 172.16.2.103: icmp_seq=6 ttl=64 time=0.053 ms
64 bytes from 172.16.2.103: icmp_seq=7 ttl=64 time=0.062 ms
64 bytes from 172.16.2.103: icmp_seq=8 ttl=64 time=0.071 ms
64 bytes from 172.16.2.103: icmp_seq=9 ttl=64 time=0.104 ms
64 bytes from 172.16.2.103: icmp_seq=10 ttl=64 time=0.063 ms


40G QSFP+ AOC:
21:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
21:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)

PING 172.16.0.6 (172.16.0.6) 56(84) bytes of data.
64 bytes from 172.16.0.6: icmp_seq=1 ttl=64 time=1.08 ms
64 bytes from 172.16.0.6: icmp_seq=2 ttl=64 time=0.638 ms
64 bytes from 172.16.0.6: icmp_seq=3 ttl=64 time=0.628 ms
64 bytes from 172.16.0.6: icmp_seq=4 ttl=64 time=0.609 ms
64 bytes from 172.16.0.6: icmp_seq=5 ttl=64 time=1.31 ms
64 bytes from 172.16.0.6: icmp_seq=6 ttl=64 time=1.30 ms
64 bytes from 172.16.0.6: icmp_seq=7 ttl=64 time=1.31 ms
64 bytes from 172.16.0.6: icmp_seq=8 ttl=64 time=1.06 ms
64 bytes from 172.16.0.6: icmp_seq=9 ttl=64 time=1.32 ms
64 bytes from 172.16.0.6: icmp_seq=10 ttl=64 time=1.33 ms

As one can see, the latency is horrible: roughly 0.6-1.3 ms on the 40G links versus 0.05-0.1 ms on the 10G links, a factor of ten or more. Any ideas? Any help is greatly appreciated.
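
One guess on our side, not yet verified: the i40e driver used by both the X710 and the XL710 supports adaptive interrupt moderation, and coalescing delays in that range could account for roughly a millisecond on an otherwise idle link. A sketch of how to inspect it and, as a test, minimize it with ethtool (interface names taken from our config; that this is the actual cause is an assumption):

ethtool -c enp132s0f2                         # show current interrupt coalescing settings
ethtool -C enp132s0f2 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
                                              # test with moderation effectively disabled

If the ping times then drop back towards the ~0.05 ms we see on the 10G links, the extra latency would be coming from interrupt coalescing rather than from the AOC cables themselves.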

¹ https://www.fs.com/de/products/120520.html?attribute=1691&id=196850
 
