Hi,
Long shot... Trying to repair the questionable decision to reinstall for the upgrade from 7.4.x to 8.1.4.
Question: Why is iperf3 showing these results? Not necessarily even a problem, but still very strange.
Background: 4 host machines, 256 cores in total, 2 TB RAM, all enterprise SSDs, either SATA 6 Gb/s or SAS 12 Gb/s. Network: 2x10 Gbit for VMs and 2x10 Gbit for Ceph. One NIC is the built-in 2x10G on the board, the other is a PCIe card in a very modern, high-end server that did 20 Gbit/s before the upgrade.
Before the upgrade from Proxmox 7 there were no issues whatsoever on the networking side. After the reinstall, vmbr0 was tied to bond0 (LACP), and bond1 carries the isolated Ceph network on a different network segment.
On the switch side, a deep-buffer Huawei CE6870 DCN switch is configured with eth-trunk in LACP dynamic mode. The eth-trunks are all fine from the switch side, and forwarding works.
Reason for these tests: Since PMX 8.1.4, creating the cluster has issues, and when adding Ceph tons of connection issues seem to happen. I have posted about it on the forums but still have no solution at all. Seems hopeless. So I am reinstalling all of it and trying to solve it along the way.
Since the cluster failed 8 weeks ago, only 1 server is running VMs; the other 3 servers are in troubleshooting, and what follows below might be part of it. Since external traffic maxes out the switch's 1 Gbit uplink, we know the interfaces are working.
The hosts are in network segments, one L3 with 172.16.X.1/24. Jumbo frames are enabled and fully working.
The Ceph segment is fully isolated in 10.X.X.1/24, in its own VLAN.
When testing with iperf3 the results are strange. I tested:
vmbr0 bound to bond0, from a server in the same segment, with fully working LACP from the switch side;
iperf3 bound to bond1 (the Ceph network);
the LACP config removed: interface shut on the switch, vmbr0 removed from bond0 and attached directly to ens2f0.
root@pmx3:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 172.16.X.102, port 48978
[ 5] local 172.16.X.103 port 5201 connected to 172.16.X.102 port 48992
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 0.00 Bytes 0.00 bits/sec receiver
-----------------------------------------------------------
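A note on that output: the connection is accepted but zero bytes arrive. The TCP handshake uses small packets while the data segments use the full MSS, so this pattern looks like a path that drops only full-size frames. A sketch of how I would pin the test down (addresses masked as in the rest of this post, so adjust before running):

```shell
# With MTU 9200, the TCP MSS is 9200 - 20 (IP) - 20 (TCP) = 9160 bytes.
# The handshake succeeds with small packets while full-size data segments
# can still be dropped somewhere on the path.
MTU=9200
MSS=$((MTU - 20 - 20))
echo "TCP MSS at MTU ${MTU}: ${MSS} bytes"
# Pin both ends to known source addresses so the traffic cannot take an
# unexpected route, and add parallel streams (run on the respective hosts):
#   iperf3 -s -B 172.16.X.103                     # server, bound to vmbr0's address
#   iperf3 -c 172.16.X.103 -B 172.16.X.102 -P 4   # client, 4 parallel TCP streams
```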
Proving the MTU size is fine:
root@pmx2:~# ping -s 9000 172.16.X.103
PING 172.16.X.103 (172.16.X.103) 9000(9028) bytes of data.
9008 bytes from 172.16.X.103: icmp_seq=1 ttl=64 time=0.214 ms
9008 bytes from 172.16.X.103: icmp_seq=2 ttl=64 time=0.162 ms
9008 bytes from 172.16.X.103: icmp_seq=3 ttl=64 time=0.175 ms
9008 bytes from 172.16.X.103: icmp_seq=4 ttl=64 time=0.163 ms
9008 bytes from 172.16.X.103: icmp_seq=5 ttl=64 time=0.185 ms
^C
--- 172.16.X.103 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4094ms
rtt min/avg/max/mdev = 0.162/0.179/0.214/0.019 ms
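One caveat on the ping above: without -M do the kernel is allowed to fragment, so a 9000-byte ping proves reachability but not that jumbo frames pass unfragmented. A stricter check, assuming the 9200 MTU from the config below:

```shell
# Max unfragmented ICMP payload = MTU - 20 (IP header) - 8 (ICMP header).
MTU=9200
PAYLOAD=$((MTU - 20 - 8))
echo "strict jumbo check: ping -M do -s ${PAYLOAD} -c 3 172.16.X.103"
# i.e. on pmx2, with fragmentation forbidden:
#   ping -M do -s 9172 -c 3 172.16.X.103
```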
If testing UDP, with -V:
Code:
iperf3 -c 172.16.X.103 -u -w 9000
The result is very strange, and the same if using -R or -w 1000:
Time: Wed, 03 Apr 2024 18:49:59 GMT
Accepted connection from 172.16.X.102, port 60310
Cookie: r6lpsg7me4mxikbs33sbur4v7fpqmojdnghh
Target Bitrate: 1048576
[ 5] local 172.16.X.103 port 5201 connected to 172.16.X.102 port 43918
Starting Test: protocol: UDP, 1 streams, 9148 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 134 KBytes 1.10 Mbits/sec 0.005 ms 0/15 (0%)
[ 5] 1.00-2.00 sec 125 KBytes 1.02 Mbits/sec 0.005 ms 0/14 (0%)
[ 5] 2.00-3.00 sec 125 KBytes 1.02 Mbits/sec 0.006 ms 0/14 (0%)
[ 5] 3.00-4.00 sec 134 KBytes 1.10 Mbits/sec 0.007 ms 0/15 (0%)
[ 5] 4.00-5.00 sec 125 KBytes 1.02 Mbits/sec 0.009 ms 0/14 (0%)
[ 5] 5.00-6.00 sec 125 KBytes 1.02 Mbits/sec 0.009 ms 0/14 (0%)
[ 5] 6.00-7.00 sec 134 KBytes 1.10 Mbits/sec 0.007 ms 0/15 (0%)
[ 5] 7.00-8.00 sec 125 KBytes 1.02 Mbits/sec 0.007 ms 0/14 (0%)
[ 5] 8.00-9.00 sec 125 KBytes 1.02 Mbits/sec 0.006 ms 0/14 (0%)
[ 5] 9.00-10.00 sec 134 KBytes 1.10 Mbits/sec 0.006 ms 0/15 (0%)
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] (sender statistics not available)
[ 5] 0.00-10.00 sec 1.26 MBytes 1.05 Mbits/sec 0.006 ms 0/144 (0%) receiver
iperf 3.12
Linux pmx3 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64
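Looking closer, the steady ~1 Mbit/s seems to come from iperf3 itself rather than the network: UDP mode defaults to a 1 Mbit/s target bitrate (the "Target Bitrate: 1048576" line above), and -w sets the socket buffer size, not the send rate. A sketch of what to run instead, against the same server:

```shell
# iperf3's UDP default target bitrate is 1 Mbit/s; -w does not change it.
DEFAULT_UDP_BPS=1048576     # matches the "Target Bitrate" line in the -V output
echo "UDP default target: ${DEFAULT_UDP_BPS} bits/sec"
# To actually load the link, set the bitrate explicitly (-b 0 = unlimited),
# with a datagram size that fits the jumbo MTU (9200 - 28 = 9172):
#   iperf3 -c 172.16.X.103 -u -b 0 -l 9172
```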
And of course, using plain wget:
Connecting to gemmei.ftp.acc.umu.se (gemmei.ftp.acc.umu.se)|194.71.11.137|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4390459392 (4.1G) [application/x-iso9660-image]
Saving to: ‘ubuntu-23.10-desktop-legacy-amd64.iso’
ubuntu-23.10-desktop-legacy- 100%[===========================================>] 4.09G [B]110MB/s[/B] in 39s
2024-04-03 20:57:48 (109 MB/s) - ‘ubuntu-23.10-desktop-legacy-amd64.iso’ saved [4390459392/4390459392]
Current interfaces config (without LACP, for troubleshooting):
auto lo
iface lo inet loopback
auto ens2f0
iface ens2f0 inet manual
mtu 9200
auto ens2f1
iface ens2f1 inet manual
mtu 9200
auto eno1
iface eno1 inet manual
mtu 9200
auto eno2
iface eno2 inet manual
mtu 9200
iface eno3 inet manual
iface eno4 inet manual
auto bond0
iface bond0 inet manual
bond-slaves ens2f1
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
mtu 9200
#VM-Traffic
auto bond1
iface bond1 inet static
address 10.X.X.12/24
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
mtu 9200
#CEPH
auto vmbr0
iface vmbr0 inet static
address 172.16.X.102/24
gateway 172.16.102.1
bridge-ports ens2f0
bridge-stp off
bridge-fd 0
mtu 9200
source /etc/network/interfaces.d/*
root@pmx2:~#
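For reference, the host-side checks used against this config (standard Linux tools; interface names as above):

```shell
# Confirm every interface from the config really carries MTU 9200:
for ifn in ens2f0 ens2f1 eno1 eno2 bond0 bond1 vmbr0; do
    mtu=$(cat /sys/class/net/"$ifn"/mtu 2>/dev/null || echo "missing")
    echo "$ifn mtu=$mtu"
done
# LACP partner MAC and per-slave state (exists only while the bond is up):
#   cat /proc/net/bonding/bond1
# Negotiated speed/duplex on the test NIC:
#   ethtool ens2f0 | grep -E 'Speed|Duplex'
```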
Switch side while downloading the Ubuntu ISO:
<CE6870>dis int 10GE1/0/11
10GE1/0/11 current state : UP (ifindex: 15)
Line protocol current state : UP
Description: PMX2.VMTRAFFIC
Switch Port, TPID : 8100(Hex), The Maximum Frame Length is 9216
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is c4b8-b4b3-2011
Port Mode: COMMON COPPER, Port Split/Aggregate: -
Speed: 10000, Loopback: NONE
Duplex: FULL, Negotiation: DISABLE
Input Flow-control: DISABLE, Output Flow-control: DISABLE
Mdi: AUTO, Fec: NONE
Last physical up time : 2024-04-03 13:11:12
Last physical down time : 2024-04-03 13:11:00
Current system time: 2024-04-03 19:10:24
Statistics last cleared:2024-04-03 17:06:52
Last 10 seconds input rate: 25968961 bits/sec, 36049 packets/sec
[B] Last 10 seconds output rate: 986884778 bits/sec, 80220 packets/sec[/B]
Input peak rate 25968961 bits/sec, Record time: 2024-04-03 19:10:24
Output peak rate 1210337783 bits/sec, Record time: 2024-04-03 17:32:56
Input : 187516756 bytes, 1907980 packets
Output: 9042269308 bytes, 4872101 packets
Input:
Unicast: 1906014, Multicast: 645
Broadcast: 107, Jumbo: 1049
Discard: 0, Frames: 0
Pause: 0
Total Error: 165
CRC: 0, Giants: 165
Jabbers: 0, Fragments: 0
Runts: 0, DropEvents: 0
Alignments: 0, Symbols: 0
Ignoreds: 0
Output:
Unicast: 4662681, Multicast: 3623
Broadcast: 1244, Jumbo: 204553
Discard: 0, Buffers Purged: 0
Pause: 0
Input bandwidth utilization threshold : 90.00%
Output bandwidth utilization threshold: 90.00%
Last 10 seconds input utility rate: [B]0.25%[/B]
Last 10 seconds output utility rate:[B] 9.86%[/B]
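One theory that formed while writing this up (unverified): the 165 Giants together with the switch's 9216-byte maximum frame length could matter, because a 9200-byte IP packet becomes a larger Ethernet frame once headers are added:

```shell
# Frame sizes on the wire for host MTU 9200:
MTU=9200
ETH_HDR=14      # dst MAC + src MAC + EtherType
VLAN_TAG=4      # 802.1Q tag, if the port is tagged
FCS=4
UNTAGGED=$((MTU + ETH_HDR + FCS))
TAGGED=$((MTU + ETH_HDR + VLAN_TAG + FCS))
echo "untagged frame: ${UNTAGGED} bytes"   # 9218, over the 9216 limit
echo "tagged frame:   ${TAGGED} bytes"     # 9222, over the 9216 limit
```

If the CE6870 counts header, tag, and FCS against the 9216 limit, exactly the full-size frames would be dropped and counted as Giants, while small packets (the TCP handshake, the 1 Mbit/s UDP test, the 9008-byte pings at a 9046-byte frame, the 1500-MTU wget traffic) all pass, which would match everything above. Worth testing by lowering the host MTU to 9000 or raising the switch's jumboframe limit.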