PMX 8.1.4 - Post upgrade - Network throughput problem - Iperf3 0.00 bits/s

jorel83

Hi,

Long shot... I'm trying to recover from the questionable decision to reinstall in order to upgrade from 7.4.x to 8.1.4.

Question: Why is iperf3 showing these results? They may not even indicate a real problem, but they are still very strange.

Background: 4 host machines with 256 cores and 2 TB RAM in total, all enterprise SSDs, either SATA 6 Gb/s or SAS 12 Gb/s. Network: 2x10 Gbit for VMs and 2x10 Gbit for CEPH. One NIC is the built-in 2x10G on the motherboard, the other is a PCIe card, in a very modern high-end server that did 20 Gbit/s before the upgrade.

Before the upgrade from Proxmox 7 there were no issues whatsoever on the networking side. After the reinstall, vmbr0 was tied to Bond0 (LACP), and Bond1 carries the isolated CEPH network on a different network segment.

On the switch side, a deep-buffer Huawei CE6870 DCN switch is configured with eth-trunks in mode lacp dynamic. The eth-trunks all look fine from the switch side, and forwarding is working.

Reason for these tests: After moving to PMX 8.1.4, creating the cluster has issues, and when adding CEPH there seem to be tons of connection issues. I have posted about it on the forums, but there is still no solution so far. It seems hopeless. I am reinstalling all of it and trying to solve it along the way.

Since the cluster failed 8 weeks ago, only 1 server is running VMs; the other 3 servers are in troubleshooting, and the issue below might be part of it. As the external traffic saturates the 1 Gbit uplink the switch has, we know the interfaces are working.

The hosts are in separate network segments: one L3 segment with 172.16.X.1/24, where jumbo frames are enabled and fully working.
The CEPH segment is fully isolated in 10.X.X.1/24 in its own VLAN.

When testing with iperf3 the results are strange. I tested:
vmbr0 bound to Bond0, from a server in the same segment, with LACP fully working on the switch side,
iperf3 bound to Bond1 (CEPH network),
with the LACP config removed and the interface shut on the switch, vmbr0 removed from Bond0 and attached to ens2f0.

root@pmx3:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 172.16.X.102, port 48978
[ 5] local 172.16.X.103 port 5201 connected to 172.16.X.102 port 48992
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 0.00 Bytes 0.00 bits/sec receiver
-----------------------------------------------------------

Proof that the MTU size is fine:
root@pmx2:~# ping -s 9000 172.16.X.103
PING 172.16.X.103 (172.16.X.103) 9000(9028) bytes of data.
9008 bytes from 172.16.X.103: icmp_seq=1 ttl=64 time=0.214 ms
9008 bytes from 172.16.X.103: icmp_seq=2 ttl=64 time=0.162 ms
9008 bytes from 172.16.X.103: icmp_seq=3 ttl=64 time=0.175 ms
9008 bytes from 172.16.X.103: icmp_seq=4 ttl=64 time=0.163 ms
9008 bytes from 172.16.X.103: icmp_seq=5 ttl=64 time=0.185 ms
^C
--- 172.16.X.103 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4094ms
rtt min/avg/max/mdev = 0.162/0.179/0.214/0.019 ms
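
A caveat on the ping above: without the don't-fragment flag the kernel is allowed to fragment the ICMP payload, so a reply on its own does not show that a full-size frame crossed the path in one piece. A minimal sketch of a stricter check, assuming Linux iputils ping (`-M do` prohibits fragmentation) and the 9200-byte interface MTU from the config in this post. The helper names are hypothetical, and the sketch only prints the command rather than running it.

```shell
# Largest ICMP echo payload that fits in a single IPv4 packet of a given MTU:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: print the don't-fragment ping command for an MTU/target pair.
df_ping_cmd() {
    # $1 = interface MTU, $2 = target address
    echo "ping -M do -c 3 -s $(icmp_payload "$1") $2"
}

df_ping_cmd 9200 172.16.X.103
# prints: ping -M do -c 3 -s 9172 172.16.X.103
```

If the printed command gets replies, a 9200-byte IP packet passes end to end; an error such as "message too long" would point at a smaller MTU somewhere on the sending side.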


Testing with UDP, and with -V (verbose):
Code:
iperf3 -c 172.16.X.103 -u -w 9000

The result is very strange, and it is the same with -R or -W 1000:
Time: Wed, 03 Apr 2024 18:49:59 GMT
Accepted connection from 172.16.X.102, port 60310
Cookie: r6lpsg7me4mxikbs33sbur4v7fpqmojdnghh
Target Bitrate: 1048576
[ 5] local 172.16.X.103 port 5201 connected to 172.16.X.102 port 43918
Starting Test: protocol: UDP, 1 streams, 9148 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 134 KBytes 1.10 Mbits/sec 0.005 ms 0/15 (0%)
[ 5] 1.00-2.00 sec 125 KBytes 1.02 Mbits/sec 0.005 ms 0/14 (0%)
[ 5] 2.00-3.00 sec 125 KBytes 1.02 Mbits/sec 0.006 ms 0/14 (0%)
[ 5] 3.00-4.00 sec 134 KBytes 1.10 Mbits/sec 0.007 ms 0/15 (0%)
[ 5] 4.00-5.00 sec 125 KBytes 1.02 Mbits/sec 0.009 ms 0/14 (0%)
[ 5] 5.00-6.00 sec 125 KBytes 1.02 Mbits/sec 0.009 ms 0/14 (0%)
[ 5] 6.00-7.00 sec 134 KBytes 1.10 Mbits/sec 0.007 ms 0/15 (0%)
[ 5] 7.00-8.00 sec 125 KBytes 1.02 Mbits/sec 0.007 ms 0/14 (0%)
[ 5] 8.00-9.00 sec 125 KBytes 1.02 Mbits/sec 0.006 ms 0/14 (0%)
[ 5] 9.00-10.00 sec 134 KBytes 1.10 Mbits/sec 0.006 ms 0/15 (0%)
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] (sender statistics not available)
[ 5] 0.00-10.00 sec 1.26 MBytes 1.05 Mbits/sec 0.006 ms 0/144 (0%) receiver
iperf 3.12
Linux pmx3 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64
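
One note on reading the UDP numbers: iperf3 limits UDP tests to a target bitrate of 1 Mbit/s unless `-b` is given, and the verbose output above reports "Target Bitrate: 1048576". The roughly 1.05 Mbits/sec is therefore what the client was asked to send, not necessarily a limit of the network. A small sketch of the arithmetic, using only values from the log above; the function name is hypothetical.

```shell
# Datagrams per second needed to reach a target bitrate with a given block size.
datagrams_per_sec() {
    # $1 = target bitrate in bits/s, $2 = UDP block size in bytes
    awk -v bps="$1" -v len="$2" 'BEGIN { printf "%.1f\n", bps / (len * 8) }'
}

datagrams_per_sec 1048576 9148
# prints: 14.3

# To load the link with UDP, raise or remove the target bitrate (not run here):
#   iperf3 -c 172.16.X.103 -u -b 5G
#   iperf3 -c 172.16.X.103 -u -b 0    # 0 disables the bitrate limit
```

About 14.3 datagrams per second matches the 14 to 15 datagrams per interval in the log, so the UDP run looks like it behaved as configured.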

And of course, with plain wget:
Connecting to gemmei.ftp.acc.umu.se (gemmei.ftp.acc.umu.se)|194.71.11.137|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4390459392 (4.1G) [application/x-iso9660-image]
Saving to: ‘ubuntu-23.10-desktop-legacy-amd64.iso’


ubuntu-23.10-desktop-legacy- 100%[===========================================>] 4.09G 110MB/s in 39s


2024-04-03 20:57:48 (109 MB/s) - ‘ubuntu-23.10-desktop-legacy-amd64.iso’ saved [4390459392/4390459392]
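
For comparison with the 1 Gbit uplink, the wget figures can be converted to bits per second. A quick sketch using the byte count and the (rounded) 39 seconds from the log above; the function name is hypothetical.

```shell
# Average throughput in Mbit/s from a byte count and a duration in seconds.
mbit_per_sec() {
    # $1 = bytes transferred, $2 = seconds
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b * 8 / s / 1000000 }'
}

mbit_per_sec 4390459392 39
# prints: 900.6
```

Roughly 900 Mbit/s of payload is about what a saturated 1 Gbit link delivers once protocol overhead is taken into account.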


Current interfaces config (without LACP, for troubleshooting):
auto lo
iface lo inet loopback

auto ens2f0
iface ens2f0 inet manual
        mtu 9200

auto ens2f1
iface ens2f1 inet manual
        mtu 9200

auto eno1
iface eno1 inet manual
        mtu 9200

auto eno2
iface eno2 inet manual
        mtu 9200

iface eno3 inet manual
iface eno4 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves ens2f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9200
#VM-Traffic

auto bond1
iface bond1 inet static
        address 10.X.X.12/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9200
#CEPH

auto vmbr0
iface vmbr0 inet static
        address 172.16.X.102/24
        gateway 172.16.102.1
        bridge-ports ens2f0
        bridge-stp off
        bridge-fd 0
        mtu 9200


source /etc/network/interfaces.d/*
root@pmx2:~#

Switch side when downloading the Ubuntu ISO:
<CE6870>dis int 10GE1/0/11
10GE1/0/11 current state : UP (ifindex: 15)
Line protocol current state : UP
Description: PMX2.VMTRAFFIC
Switch Port, TPID : 8100(Hex), The Maximum Frame Length is 9216
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is c4b8-b4b3-2011
Port Mode: COMMON COPPER, Port Split/Aggregate: -
Speed: 10000, Loopback: NONE
Duplex: FULL, Negotiation: DISABLE
Input Flow-control: DISABLE, Output Flow-control: DISABLE
Mdi: AUTO, Fec: NONE
Last physical up time : 2024-04-03 13:11:12
Last physical down time : 2024-04-03 13:11:00
Current system time: 2024-04-03 19:10:24
Statistics last cleared:2024-04-03 17:06:52
Last 10 seconds input rate: 25968961 bits/sec, 36049 packets/sec
Last 10 seconds output rate: 986884778 bits/sec, 80220 packets/sec
Input peak rate 25968961 bits/sec, Record time: 2024-04-03 19:10:24
Output peak rate 1210337783 bits/sec, Record time: 2024-04-03 17:32:56
Input : 187516756 bytes, 1907980 packets
Output: 9042269308 bytes, 4872101 packets
Input:
Unicast: 1906014, Multicast: 645
Broadcast: 107, Jumbo: 1049
Discard: 0, Frames: 0
Pause: 0

Total Error: 165
CRC: 0, Giants: 165
Jabbers: 0, Fragments: 0
Runts: 0, DropEvents: 0
Alignments: 0, Symbols: 0
Ignoreds: 0

Output:
Unicast: 4662681, Multicast: 3623
Broadcast: 1244, Jumbo: 204553
Discard: 0, Buffers Purged: 0
Pause: 0

Input bandwidth utilization threshold : 90.00%
Output bandwidth utilization threshold: 90.00%
Last 10 seconds input utility rate: 0.25%
Last 10 seconds output utility rate: 9.86%
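
Two numbers in the switch output above may be worth putting side by side: the port reports a Maximum Frame Length of 9216 and an input "Giants" count of 165, while the host interfaces are set to MTU 9200. The sketch below only does the frame-size arithmetic. Whether this switch counts the Ethernet header, the FCS, or a VLAN tag towards its maximum frame length is an assumption that has not been checked here, so treat the comparison as a possible lead rather than a diagnosis. The function name is hypothetical.

```shell
# On-wire Ethernet frame length for a given IP MTU:
# 14-byte Ethernet header + 4-byte FCS, plus 4 bytes for an 802.1Q tag.
frame_len() {
    # $1 = IP MTU, $2 = "tagged" for an 802.1Q frame (optional)
    overhead=18
    if [ "${2:-}" = "tagged" ]; then
        overhead=22
    fi
    echo $(( $1 + overhead ))
}

frame_len 9200          # prints 9218
frame_len 9200 tagged   # prints 9222
frame_len 9000 tagged   # prints 9022
```

Under that assumption, a full-size frame from an MTU 9200 host would be slightly larger than 9216, while MTU 9000 would fit with room to spare.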
 
So, long shot, but have you considered moving from Linux bond to OVS (Open vSwitch) to see if the behavior changes? Something in the back of my brain is tickling RE: 5.x to 6.x kernel changes...

I'm traveling so I can't copy/paste a config, but IMHO it's worth trying. Overall, OVS has been much more consistent for me (and I'm running several CEPH N x 10GE setups like you) when dealing with LACP bundles, both in "fast" bond uptimes and in multi-vendor interop.

Just remember to "apt install openvswitch-switch" before doing this, or you'll end up frustrated when interfaces don't come up.


* https://pve.proxmox.com/wiki/Open_vSwitch
 
Ok, here's an OVS config from one of my lab nodes:

VLAN5/500/501 : management
VLAN800/850 : CEPH CLIENT
VLAN900/950: CEPH BACKEND

Code:
auto lo
iface lo inet loopback

##enp7s0 - management interfaces
auto enp7s0f0
iface enp7s0f0 inet manual
        mtu 9000
        ovs_mtu 8896

auto enp7s0f1
iface enp7s0f1 inet manual
        mtu 9000
        ovs_mtu 8896

auto bond0
iface bond0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSBond
        ovs_bonds enp7s0f0 enp7s0f1
        ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
        ovs_options tag=1 vlan_mode=native-untagged trunks=5,500,501,502
        ovs_mtu 8896
        pre-up ifconfig enp7s0f0 mtu 9000
        pre-up ifconfig enp7s0f1 mtu 9000

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0
        ovs_mtu 8896

auto vlan5
iface vlan5 inet static
        address 10.10.10.101/24
        gateway 10.10.10.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=5
        ovs_mtu 1500


auto vlan500
iface vlan500 inet static
        address 198.18.50.101/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=500
        ovs_mtu 1500

auto vlan501
iface vlan501 inet static
        address 198.18.51.101/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=501
        ovs_mtu 1500



##enp10s0 - ceph client interfaces
auto enp10s0f0
iface enp10s0f0 inet manual
        mtu 9000
        ovs_mtu 8896

auto enp10s0f1
iface enp10s0f1 inet manual
        mtu 9000
        ovs_mtu 8896

auto bond10
iface bond10 inet manual
        ovs_bridge vmbr10
        ovs_type OVSBond
        ovs_bonds enp10s0f0 enp10s0f1
        ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
        ovs_mtu 8896
        pre-up ifconfig enp10s0f0 mtu 9000
        pre-up ifconfig enp10s0f1 mtu 9000

auto vmbr10
iface vmbr10 inet manual
        ovs_type OVSBridge
        ovs_ports bond10
        ovs_mtu 8896

auto vlan800
iface vlan800 inet static
        address 198.18.80.101/24
        ovs_type OVSIntPort
        ovs_bridge vmbr10
        ovs_options tag=800
        ovs_mtu 8800

auto vlan850
iface vlan850 inet static
        address 198.18.85.101/24
        ovs_type OVSIntPort
        ovs_bridge vmbr10
        ovs_options tag=850
        ovs_mtu 1500

##enp11s0 - ceph backend interfaces
auto enp11s0f0
iface enp11s0f0 inet manual
        mtu 9000
        ovs_mtu 8896

auto enp11s0f1
iface enp11s0f1 inet manual
        mtu 9000
        ovs_mtu 8896

auto bond11
iface bond11 inet manual
        ovs_bridge vmbr11
        ovs_type OVSBond
        ovs_bonds enp11s0f0 enp11s0f1
        ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
        ovs_mtu 8896
        pre-up ifconfig enp11s0f0 mtu 9000
        pre-up ifconfig enp11s0f1 mtu 9000

auto vmbr11
iface vmbr11 inet manual
        ovs_type OVSBridge
        ovs_ports bond11
        ovs_mtu 8896

auto vlan900
iface vlan900 inet static
        address 198.18.90.101/24
        ovs_type OVSIntPort
        ovs_bridge vmbr10
        ovs_options tag=900
        ovs_mtu 8800

auto vlan950
iface vlan950 inet static
        address 198.18.95.101/24
        ovs_type OVSIntPort
        ovs_bridge vmbr10
        ovs_options tag=950
        ovs_mtu 1500
 
Thanks for the help.

I tested it, but it made no difference. So instead I created 2 Ubuntu desktop VMs on 2 of the servers, ran an iperf3 -s instance on the Proxmox servers, and had PC1 on PMX2 measure against the iperf3 server on PMX3.

1712323830819.png

The LAG interface, and utilization at 50%

1712324149600.png

This is the first test, from the VM inside that server, followed by iperf3 from the PMX terminal as root, to the other server. A very abnormal issue.

1712324397993.png


I have also tried to add the server into the cluster again, but this has failed:

'/etc/pve/nodes/pmx3/pve-ssl.pem' does not exist! (500)

Apr 05 17:58:24 pmx2 pmxcfs[47885]: [status] notice: update cluster info (cluster name SC-PMX, version = 2)
Apr 05 17:58:24 pmx2 pmxcfs[47885]: [status] notice: node lost quorum
Apr 05 17:58:24 pmx2 pmxcfs[47885]: [dcdb] crit: received write while not quorate - trigger resync
Apr 05 17:58:24 pmx2 pmxcfs[47885]: [dcdb] crit: leaving CPG group
Apr 05 17:58:25 pmx2 pmxcfs[47885]: [dcdb] notice: start cluster connection
Apr 05 17:58:25 pmx2 pmxcfs[47885]: [dcdb] crit: cpg_join failed: 14
Apr 05 17:58:25 pmx2 pmxcfs[47885]: [dcdb] crit: can't initialize service
Apr 05 17:58:27 pmx2 pmxcfs[47885]: [status] notice: node has quorum
Apr 05 17:58:29 pmx2 pmxcfs[47885]: [dcdb] crit: cpg_send_message failed: 9
Apr 05 17:58:29 pmx2 pmxcfs[47885]: [dcdb] crit: cpg_send_message failed: 9
Apr 05 17:58:29 pmx2 pmxcfs[47885]: [dcdb] crit: cpg_send_message failed: 9
Apr 05 17:58:29 pmx2 pmxcfs[47885]: [dcdb] crit: cpg_send_message failed: 9

Maybe I need to start a new thread about the cluster problem.

 
Miserable, sorry you're having so much trouble!

I've been following the threads over on Facebook as well, and while I can't nail down a specific spot (not knowing your network), in the past when I've had these types of issues, it's been one of three things:
1. MTU issue (as the others in the FB thread have said: keep dropping the MTU back from the jumbo size, try 8k, then 4k, then 1500)
2. IP conflict -- this one is embarrassing to run into, but several times I've had hosts end up on the same IP address, either due to my faulty memory, or a misconfigured DHCP server, etc., and that can cause things to be strange. The cluster issue feels a bit like an IP conflict.
3. MAC conflict - because PMX generates MACs, I've actually ended up with duplicate MACs, because I cloned/moved/restored. Just delete the MAC in the UI, let it regenerate, problem solved.

Good luck, hope you find the solution soon, sorry the OVS idea wasn't more help
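
For the MAC conflict case in point 3, a quick way to look for duplicates is to pull every MAC out of the VM configs and list those that occur more than once. A minimal sketch: the function name is hypothetical, and the sample lines are made up in the style of Proxmox `netX:` config entries.

```shell
# Print every MAC address that appears more than once on stdin (case-insensitive).
dup_macs() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | tr 'a-f' 'A-F' | sort | uniq -d
}

# Made-up sample input, not taken from a real cluster
printf '%s\n' \
    'net0: virtio=BC:24:11:00:00:01,bridge=vmbr0' \
    'net0: virtio=BC:24:11:00:00:02,bridge=vmbr0' \
    'net1: virtio=bc:24:11:00:00:01,bridge=vmbr0' | dup_macs
# prints: BC:24:11:00:00:01
```

On a cluster node the input could come from something like `cat /etc/pve/nodes/*/qemu-server/*.conf | dup_macs`, assuming the standard config layout; an empty result means no duplicates were found in those files.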
 
I'll be testing OVS for a while, so it was a good suggestion, thanks for that.

As the test with 2 VMs gets the expected results, MTU is for sure proven not to be the problem (and it was never a problem before). I am a network engineer, so I'm quite sure on this part. As for IP conflicts, everything is static and manually assigned and all OK, and the MAC addresses have been checked as well.

Very strange issues indeed.

But the cluster seemed to auto-heal this time, so maybe the 8.1.10 upgrade was the solution to that part.