Network with bonded interfaces freezes under high traffic

rohitp

New Member
Jan 31, 2023
I have a 5-node Proxmox cluster and noticed that when I run network-heavy workloads on a node, it loses network connectivity, resulting in all VMs encountering I/O errors.

I have multiple NICs on each node. The PVE server remains accessible via its 1G management NIC, but for the datapath (VM network) I use a separate bridge on top of a bond of 2 x 100G interfaces from the same NIC, and it is this bond that goes unresponsive: I am unable to ping the default gateway.

* Rebooting the server fixes the issue for the time being, but I run into the same issue again after resuming workloads (sometimes within a few hours, sometimes after a couple of days).

I found a way to consistently reproduce this with just 5 x Windows 10 VMs. When the issue is encountered, the LAG stays up:

Bash:
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.15.104-1-pve

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535

Bash:
Slave Interface: ens9f0
MII Status: up
Speed: 100000 Mbps
Duplex: full

Slave Interface: ens9f1
MII Status: up
Speed: 100000 Mbps
Duplex: full
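
In case it helps anyone reading, the per-slave aggregator IDs live in the same bonding status file and can be compared with a quick grep, to confirm both ports actually negotiated into the same aggregator:

Bash:
# grep -E 'Slave Interface|Aggregator ID' /proc/net/bonding/bond0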


Multipath is down, so the VM disks on the iSCSI LVM storage are not reachable:

Bash:
# multipath -ll
3624a93705c7d2fceb0c2448d0001146a dm-6 PURE,FlashArray
size=10T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 18:0:0:251 sdd 8:48  failed faulty running
  `- 19:0:0:251 sdc 8:32  failed faulty running
3624a93705c7d2fceb0c2448d0001146b dm-228 PURE,FlashArray
size=30T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 19:0:0:252 sde 8:64  failed faulty running
  `- 18:0:0:252 sdf 8:80  failed faulty running
3624a93705c7d2fceb0c2448d0001146c dm-5 PURE,FlashArray
size=20T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=enabled
  |- 19:0:0:253 sdg 8:96  failed faulty running
  `- 18:0:0:253 sdh 8:112 failed faulty running
3624a93705c7d2fceb0c2448d00011662 dm-229 PURE,FlashArray
size=20T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 18:0:0:254 sdj 8:144 failed faulty running
  `- 19:0:0:254 sdi 8:128 failed faulty running
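
The iSCSI session state behind those paths can be checked with the standard open-iscsi and multipath tooling (shown here only as the commands, not as a fix):

Bash:
# list current iSCSI sessions and their connection state
iscsiadm -m session -P 1

# reload the multipath maps once the paths come back
multipath -r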


Interfaces in question:

Bash:
vmbr1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet6 fe80::107c:caff:fe0d:7752  prefixlen 64  scopeid 0x20<link>
        ether 12:7c:ca:0d:77:52  txqueuelen 1000  (Ethernet)
        RX packets 1378379  bytes 21620140175 (20.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1196251  bytes 3196685338 (2.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
       
ens9f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
        ether 12:7c:ca:0d:77:52  txqueuelen 1000  (Ethernet)
        RX packets 128403809  bytes 1071670603571 (998.0 GiB)
        RX errors 1  dropped 0  overruns 0  frame 1
        TX packets 91009854  bytes 750048282313 (698.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens9f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
        ether 12:7c:ca:0d:77:52  txqueuelen 1000  (Ethernet)
        RX packets 58086495  bytes 491816629393 (458.0 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 96836586  bytes 801823835624 (746.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
       
       
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 9000
        ether 12:7c:ca:0d:77:52  txqueuelen 1000  (Ethernet)
        RX packets 186490304  bytes 1563487232964 (1.4 TiB)
        RX errors 1  dropped 63  overruns 0  frame 1
        TX packets 187846440  bytes 1551872117937 (1.4 TiB)
        TX errors 0  dropped 20 overruns 0  carrier 0  collisions 0
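
To see whether the drop counters on bond0 (63 RX, 20 TX above) keep climbing while the workload runs, the standard iproute2 counters can be watched over time, e.g.:

Bash:
# detailed per-interface counters, refreshed every 5 seconds
watch -n 5 'ip -s -s link show bond0'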


Network configurations:

Bash:
auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9000

auto ens9f0
iface ens9f0 inet manual

auto ens9f1
iface ens9f1 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves ens9f0 ens9f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3
        mtu 9000
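
For comparison, an alternative bond0 stanza with a faster LACP rate and a minimum-links guard would look like the following. I have not applied this; it is shown only as an illustration of the knobs available in the interfaces file:

Bash:
auto bond0
iface bond0 inet manual
        bond-slaves ens9f0 ens9f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3
        # 1 = fast: request LACPDUs from the partner every second instead of every 30s
        bond-lacp-rate 1
        # keep the bond carrier up only while at least one slave is active
        bond-min-links 1
        mtu 9000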

I have also upgraded PVE to the latest version:

Bash:
# pveversion
pve-manager/7.4-3/9002ab8a (running kernel: 5.15.104-1-pve)
 
I swapped the NIC for a Mellanox ConnectX-6 and am still seeing the same issue:

Bash:
[  567.319189] mlx5_core 0000:4b:00.0: mlx5_wait_for_pages:736:(pid 1312): Skipping wait for vf pages stage
[  570.421106]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295030076, last ping 4295031360, now 4295032640
[  570.421113]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295030076, last ping 4295031360, now 4295032640
[  570.422160]  connection1:0: detected conn error (1022)
[  570.422713]  connection2:0: detected conn error (1022)
[  580.660914]  session2: session recovery timed out after 10 secs
[  580.660948] sd 19:0:0:253: rejecting I/O to offline device
[  580.661427] blk_update_request: I/O error, dev sdi, sector 8561987464 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[  580.661803] blk_update_request: I/O error, dev sdi, sector 8670983288 op 0x1:(WRITE) flags 0xca00 phys_seg 14 prio class 0
[  580.661808] device-mapper: multipath: 253:5: Failing path 8:128.
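
The 5-second ping timeout and the 10-second session recovery window in these messages map to the open-iscsi timers in /etc/iscsi/iscsid.conf; the snippet below only shows which parameters those are (values are placeholders, check the actual file on the node):

Bash:
# /etc/iscsi/iscsid.conf -- the timers the messages above refer to
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.timeo.replacement_timeout = 15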
 
I have each port connected to a different Arista switch; the two switches are configured in MLAG, and the channel-group on those ports runs LACP (802.3ad).
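
For context, the switch-side layout is roughly the following (a sketch only; interface, port-channel, and MLAG IDs are placeholders, not my actual Arista config):

Code:
! on each Arista MLAG peer, per 100G port facing the node
interface Ethernet1/1
   channel-group 100 mode active
!
interface Port-Channel100
   switchport mode trunk
   mlag 100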