Hi,
Long shot... Trying to repair the questionable decision to reinstall in order to upgrade from 7.4.x to 8.1.4.
Question: Why is iperf3 showing these issues? Not necessarily even a problem, but still very strange results.
Background: 4 host machines with 256 cores and 2 TB RAM in total, all enterprise SSDs (either SATA 6 Gb/s or SAS 12 Gb/s), network 2x10 Gbit for VMs & 2x10 Gbit for Ceph. One 2x10G NIC is built onto the mainboard, the other is a PCIe card in a very modern high-end server that did 20 Gbit/s before the upgrade.
Before the upgrade from Proxmox 7 there were no issues whatsoever on the networking side. After the reinstall, vmbr0 was tied to bond0 (LACP), and bond1 carries the isolated Ceph network on a different network segment.
On the switch side, a deep-buffer Huawei CE6870 DCN switch is configured with eth-trunks in LACP dynamic mode. The eth-trunks are all fine from the switch side, and forwarding works fine.
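For reference, this is roughly how I cross-check the LACP state from the Proxmox side (bond names as in the config further down; the partner details should match the switch's eth-trunk):
Code:
# Kernel bonding status: 802.3ad mode, LACP partner MAC, per-slave aggregator IDs
cat /proc/net/bonding/bond0
# Bond parameters as iproute2 sees them (mode, xmit hash policy, MTU)
ip -d link show bond0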
Reason for these tests: after moving to PMX 8.1.4, creating the cluster has issues, and when adding Ceph tons of connection issues seem to happen. I have posted about it on the forums, but there is still no solution at all so far. Seems hopeless. Reinstalling all of it and trying to solve it along the way.
Since the cluster failed 8 weeks ago, only 1 server is running VMs; the other 3 servers are in troubleshooting, and what is below might be part of it. As external traffic maxes out the switch's 1 Gbit uplink, we know the interfaces are working.
Hosts are in network segments, one L3 with 172.16.X.1/24. Jumbo frames are enabled and fully working.
The Ceph segment is fully isolated in 10.X.X.1/24 on its own VLAN.
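Just to show how the two segments sit on the host, these are the quick checks I use (nothing exotic):
Code:
# Brief address overview: VM segment on vmbr0, Ceph segment on bond1
ip -br addr show
# Routing table, confirming the Ceph /24 stays on its own segment
ip route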
When testing with iperf3 the results are strange. I tested:
vmbr0 bound to bond0, from a server in the same segment, with LACP fully working from the switch side,
iperf bound to bond1 (the Ceph network),
LACP config removed and the interface shut on the switch, vmbr0 removed from bond0 and added directly to ens2f0 (the invocations used are sketched right after this list).
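The invocations were essentially the stock ones; a minimal sketch, with 10.X.X.13 standing in for the other node's Ceph address (the real value is masked here):
Code:
# Server side on pmx3
iperf3 -s
# TCP test over the VM segment (vmbr0, 172.16.X.0/24)
iperf3 -c 172.16.X.103
# TCP test over the Ceph segment, bound to bond1's address
iperf3 -B 10.X.X.12 -c 10.X.X.13
The TCP run over the VM segment gives this on the server side: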
Code:
root@pmx3:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 172.16.X.102, port 48978
[  5] local 172.16.X.103 port 5201 connected to 172.16.X.102 port 48992
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  receiver
-----------------------------------------------------------
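For what it's worth, these are the follow-up checks I plan to run on both ends after a zero-byte TCP run like the one above (standard tools, plus the PVE firewall status):
Code:
# TCP retransmit / congestion info for the test connection
ss -ti dst 172.16.X.102
# NIC-level drop and error counters
ip -s link show ens2f0
ethtool -S ens2f0 | grep -iE 'drop|err|disc'
# Make sure the Proxmox firewall is not in the path
pve-firewall status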
Proof that the MTU size is fine:
Code:
root@pmx2:~# ping -s 9000 172.16.X.103
PING 172.16.X.103 (172.16.X.103) 9000(9028) bytes of data.
9008 bytes from 172.16.X.103: icmp_seq=1 ttl=64 time=0.214 ms
9008 bytes from 172.16.X.103: icmp_seq=2 ttl=64 time=0.162 ms
9008 bytes from 172.16.X.103: icmp_seq=3 ttl=64 time=0.175 ms
9008 bytes from 172.16.X.103: icmp_seq=4 ttl=64 time=0.163 ms
9008 bytes from 172.16.X.103: icmp_seq=5 ttl=64 time=0.185 ms
^C
--- 172.16.X.103 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4094ms
rtt min/avg/max/mdev = 0.162/0.179/0.214/0.019 ms
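Worth noting: the ping above does not set the DF bit, so it would still succeed if a hop fragmented the 9000-byte payload. A stricter check that forbids fragmentation (9172 = 9200 MTU minus 28 bytes of IP/ICMP headers) would be:
Code:
# Fails with "message too long" if any hop cannot carry the full 9200-byte MTU
ping -M do -s 9172 172.16.X.103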
If testing for UDP and with -V:
Code:
iperf3 -c 172.16.X.103 -u -w 9000
The result is very strange, and the same if using -R or -w 1000:
Code:
Time: Wed, 03 Apr 2024 18:49:59 GMT
Accepted connection from 172.16.X.102, port 60310
      Cookie: r6lpsg7me4mxikbs33sbur4v7fpqmojdnghh
      Target Bitrate: 1048576
[  5] local 172.16.X.103 port 5201 connected to 172.16.X.102 port 43918
Starting Test: protocol: UDP, 1 streams, 9148 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   134 KBytes  1.10 Mbits/sec  0.005 ms  0/15 (0%)
[  5]   1.00-2.00   sec   125 KBytes  1.02 Mbits/sec  0.005 ms  0/14 (0%)
[  5]   2.00-3.00   sec   125 KBytes  1.02 Mbits/sec  0.006 ms  0/14 (0%)
[  5]   3.00-4.00   sec   134 KBytes  1.10 Mbits/sec  0.007 ms  0/15 (0%)
[  5]   4.00-5.00   sec   125 KBytes  1.02 Mbits/sec  0.009 ms  0/14 (0%)
[  5]   5.00-6.00   sec   125 KBytes  1.02 Mbits/sec  0.009 ms  0/14 (0%)
[  5]   6.00-7.00   sec   134 KBytes  1.10 Mbits/sec  0.007 ms  0/15 (0%)
[  5]   7.00-8.00   sec   125 KBytes  1.02 Mbits/sec  0.007 ms  0/14 (0%)
[  5]   8.00-9.00   sec   125 KBytes  1.02 Mbits/sec  0.006 ms  0/14 (0%)
[  5]   9.00-10.00  sec   134 KBytes  1.10 Mbits/sec  0.006 ms  0/15 (0%)
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] (sender statistics not available)
[  5]   0.00-10.00  sec  1.26 MBytes  1.05 Mbits/sec  0.006 ms  0/144 (0%)  receiver
iperf 3.12
Linux pmx3 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64
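One thing about the UDP numbers: the "Target Bitrate: 1048576" line in the output matches iperf3's default UDP target of 1 Mbit/s when no -b is given, and -w only sizes the socket buffer (the datagram size is -l). So a UDP run that actually tries to push the link would look roughly like this (the 10G target and ~9000-byte datagrams are just example values):
Code:
# UDP with an explicit 10 Gbit/s target and jumbo-sized datagrams
iperf3 -c 172.16.X.103 -u -b 10G -l 8972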
And of course, if using plain wget:
Code:
Connecting to gemmei.ftp.acc.umu.se (gemmei.ftp.acc.umu.se)|194.71.11.137|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4390459392 (4.1G) [application/x-iso9660-image]
Saving to: ‘ubuntu-23.10-desktop-legacy-amd64.iso’
ubuntu-23.10-desktop-legacy- 100%[===========================================>]   4.09G  [B]110MB/s[/B]  in 39s
2024-04-03 20:57:48 (109 MB/s) - ‘ubuntu-23.10-desktop-legacy-amd64.iso’ saved [4390459392/4390459392]
Current interfaces config (without LACP, for troubleshooting):
Code:
auto lo
iface lo inet loopback

auto ens2f0
iface ens2f0 inet manual
        mtu 9200

auto ens2f1
iface ens2f1 inet manual
        mtu 9200

auto eno1
iface eno1 inet manual
        mtu 9200

auto eno2
iface eno2 inet manual
        mtu 9200

iface eno3 inet manual

iface eno4 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves ens2f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9200
#VM-Traffic

auto bond1
iface bond1 inet static
        address 10.X.X.12/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9200
#CEPH

auto vmbr0
iface vmbr0 inet static
        address 172.16.X.102/24
        gateway 172.16.102.1
        bridge-ports ens2f0
        bridge-stp off
        bridge-fd 0
        mtu 9200

source /etc/network/interfaces.d/*
root@pmx2:~#
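For comparison, the pre-troubleshooting layout (as described above: vmbr0 on top of an LACP bond0 using both 10G ports) would look roughly like this; just a sketch, with slave names taken from the config above:
Code:
auto bond0
iface bond0 inet manual
        bond-slaves ens2f0 ens2f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9200
#VM-Traffic

auto vmbr0
iface vmbr0 inet static
        address 172.16.X.102/24
        gateway 172.16.102.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        mtu 9200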
Switch side while downloading the Ubuntu ISO:
Code:
<CE6870>dis int 10GE1/0/11
10GE1/0/11 current state : UP (ifindex: 15)
Line protocol current state : UP
Description: PMX2.VMTRAFFIC
Switch Port, TPID : 8100(Hex), The Maximum Frame Length is 9216
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is c4b8-b4b3-2011
Port Mode: COMMON COPPER, Port Split/Aggregate: -
Speed: 10000, Loopback: NONE
Duplex: FULL, Negotiation: DISABLE
Input Flow-control: DISABLE, Output Flow-control: DISABLE
Mdi: AUTO, Fec: NONE
Last physical up time   : 2024-04-03 13:11:12
Last physical down time : 2024-04-03 13:11:00
Current system time: 2024-04-03 19:10:24
Statistics last cleared: 2024-04-03 17:06:52
    Last 10 seconds input rate: 25968961 bits/sec, 36049 packets/sec
    [B]Last 10 seconds output rate: 986884778 bits/sec, 80220 packets/sec[/B]
    Input peak rate 25968961 bits/sec, Record time: 2024-04-03 19:10:24
    Output peak rate 1210337783 bits/sec, Record time: 2024-04-03 17:32:56
    Input : 187516756 bytes, 1907980 packets
    Output: 9042269308 bytes, 4872101 packets
    Input:
      Unicast: 1906014, Multicast: 645
      Broadcast: 107, Jumbo: 1049
      Discard: 0, Frames: 0
      Pause: 0
      Total Error: 165
      CRC: 0, Giants: 165
      Jabbers: 0, Fragments: 0
      Runts: 0, DropEvents: 0
      Alignments: 0, Symbols: 0
      Ignoreds: 0
    Output:
      Unicast: 4662681, Multicast: 3623
      Broadcast: 1244, Jumbo: 204553
      Discard: 0, Buffers Purged: 0
      Pause: 0
    Input bandwidth utilization threshold : 90.00%
    Output bandwidth utilization threshold: 90.00%
    Last 10 seconds input utility rate: [B]0.25%[/B]
    Last 10 seconds output utility rate: [B]9.86%[/B]
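One detail I noticed while pasting this: the port reports "The Maximum Frame Length is 9216" together with "Giants: 165". If the switch counts the whole Ethernet frame against that limit (I am not sure exactly how the CE6870 counts it), a host MTU of 9200 can produce frames just over it, while the 9000-byte ping stays well below:
Code:
# Rough frame-size arithmetic, assuming the 9216 limit covers the full L2 frame:
#   9200 (IP MTU) + 14 (Ethernet header) + 4 (FCS)              = 9218 bytes
#   9200 (IP MTU) + 18 (header + FCS)    + 4 (802.1Q VLAN tag)  = 9222 bytes
#   9000 (ping payload) + 28 (ICMP/IP) + 18 (header + FCS)      = 9046 bytes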


