Active-Backup interface keeps switching between the two links

casalicomputers

Renowned Member
Mar 14, 2015
Hello everyone,
Currently I've got my networking set up like this:
The machine running PVE has two NICs:
one is the embedded NIC that came with the motherboard, a Broadcom NetXtreme Gigabit Ethernet (BCM5720), which is a dual-port RJ-45 1 Gigabit Ethernet controller;
the other is an add-in Intel(R) Ethernet Converged Network Adapter X710, which is a dual-port 10 Gigabit Ethernet SFP+ controller.
One port of the SFP+ adapter is attached directly to a NAS, which I'm using for storage.
The other port is attached to a switch to give Proxmox network access.
So as not to let the embedded NIC go to waste, I've also created an LACP bond across its two ports, on both the PVE host and the switch, with the intention of using it as a backup link in case there's an issue with the fiber cable or with the card itself, or to perform maintenance; basically just to have a little bit of redundancy.

I've done this kind of setup before, but so far I've always built LACP bonds first and then put the active-backup bond on top of those LAG bonds.
This time the active-backup bond is between a single interface and another bond, and I thought it would just work the same way,
and it kind of does; however, it's not as smooth as the other times I've done it.

Basically the issue is that the PVE host keeps switching between the two links over and over. Every time it does, the vmbr0 interface goes down and back up, which causes a noticeable stutter, for example when connected via SSH or RDP to a VM inside the PVE or to the PVE itself.
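
In case it helps anyone trying to confirm this kind of flapping: the kernel bonding driver exposes each bond's state under /proc/net/bonding/. Below is a minimal sketch that prints the currently active slave and the link failure count of every slave. The function name bond_summary is just something I made up for illustration, and the default path assumes the bond is called bond1 as in my setup.

```shell
# Hypothetical helper: summarise a bonding status file.
# Argument: path to the status file (defaults to /proc/net/bonding/bond1).
bond_summary() {
    awk -F': ' '
        # Active-backup bonds report which slave currently carries traffic.
        /^Currently Active Slave/ { print "active=" $2 }
        # Remember the slave whose section we are in.
        /^Slave Interface/        { slave = $2 }
        # Each slave section carries its own failure counter.
        /^Link Failure Count/     { print slave " failures=" $2 }
    ' "${1:-/proc/net/bonding/bond1}"
}
```

Running it every second (for example with `watch -n 1 cat /proc/net/bonding/bond1`, or by calling the function in a loop) should show whether the active slave keeps alternating.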

Disabling the ethernet ports on the switch fixed the issue, but that goes against what I'm trying to achieve, which is automatic failover between the 10GbE link and the 1GbE LACP.

Here is my network config, is there any issue with it?
Code:
root@pve01:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno8303
iface eno8303 inet manual
#ETH-P1

auto eno8403
iface eno8403 inet manual
#ETH-P2

auto ens1f0
iface ens1f0 inet manual
#LINK-SW01

iface ens1f1 inet manual
#LINK-NAS

auto bond0
iface bond0 inet manual
        bond-slaves eno8303 eno8403
        bond-miimon 100
        bond-mode 802.3ad
#LACP-ETH

auto bond1
iface bond1 inet manual
        bond-slaves bond0 ens1f0
        bond-miimon 100
        bond-mode active-backup
        bond-primary ens1f0
#SFP-ETH-FAILOVER

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.30/24
        gateway 192.168.1.1
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#INTF-MAIN

auto vmbr1
iface vmbr1 inet static
        address 10.0.98.30/24
        bridge-ports ens1f1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#INTF-NAS

auto vmbr0.99
iface vmbr0.99 inet static
        address 10.0.99.30/24
#INTF-MGMT

This is a snippet of the output of dmesg. It kept throwing these warnings, and that's what made me turn off the ports on the switch; however, I can't really make sense of them.
Do I need to configure something on the switch as well? Usually I just create the LACP and it's Proxmox that controls which interface is the active one...
Code:
[   17.302604] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[   17.302811] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   17.303696] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[   17.303904] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   17.304801] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[   17.305008] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   17.305617] vmbr1: port 1(ens1f1) entered blocking state
[   17.305621] vmbr1: port 1(ens1f1) entered forwarding state
[   17.305908] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[   17.306116] i40e 0000:01:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   17.528735] vmbr0: port 1(bond1) entered blocking state
[   17.528739] vmbr0: port 1(bond1) entered forwarding state
 
Update:
After a bit of research, I think that the error in the dmesg logs has little if anything to do with the issue I described.
That's related to a bug (?) in the i40e driver. As I understand it, at first the interface tries to filter the VLAN tags of the packets itself instead of just passing them on to the bridge, then it runs out of room for the filters (hence the ENOSPC), sort of gives up and enters promiscuous mode.

I tried the ethtool configurations suggested in this thread:
https://forum.proxmox.com/threads/error-i40e_aq_rc_enospc-forcing-overflow-promiscuous-on-pf.62875/
However, nothing worked: the error is still there, but it now happens only once, when the interface goes up.

It was appearing so much in the logs because the interface kept going up and down.
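
A rough way to see whether the error really recurs over time, rather than firing once at boot, is to count the matching dmesg lines per second of kernel uptime. This is only a sketch: the function name enospc_per_second is made up, and it assumes the default dmesg timestamp format shown in the snippet above.

```shell
# Hypothetical helper: count I40E_AQ_RC_ENOSPC lines per whole second of uptime.
# Reads a log file given as argument, or stdin when no argument is passed.
enospc_per_second() {
    grep 'I40E_AQ_RC_ENOSPC' "${1:-/dev/stdin}" |
        # Keep only the integer part of the "[   17.302604]" timestamp.
        sed -E 's/^\[ *([0-9]+)\.[0-9]+\].*/\1/' |
        sort -n | uniq -c |
        awk '{ print "t=" $2 "s count=" $1 }'
}
```

Usage would be `dmesg | enospc_per_second`. Many different timestamps in the output would suggest the interface is repeatedly going up and down, while a single one points to a one-off at startup.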
When you make a bond in the PVE GUI and select the interfaces on which to build it, PVE sorts them alphabetically in the network configuration. In my case the main interface ended up second,
and whenever the networking service did its thing it saw bond0 first instead of ens1f0 and tried to bring that one up,
only to see later that the bond-primary was ens1f0 and bring that one up instead.

At least that's how I explain it to myself, because what solved my problem was just manually changing this bit of the configuration from:

Code:
auto bond1
iface bond1 inet manual
        bond-slaves bond0 ens1f0
        bond-miimon 100
        bond-mode active-backup
        bond-primary ens1f0
#SFP-ETH-FAILOVER

to:

Code:
auto bond1
iface bond1 inet manual
        bond-slaves ens1f0 bond0
        bond-miimon 100
        bond-mode active-backup
        bond-primary ens1f0
#SFP-ETH-FAILOVER
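
Since the GUI may re-sort the slaves the next time the bond is edited there, here is a small sketch that checks whether each bond's bond-primary is also the first entry in its bond-slaves line. The function name check_primary_first is made up for illustration, and it only checks the ordering that helped in my case; whether the ordering is really the root cause is my own guess, as said above.

```shell
# Hypothetical helper: warn when a bond's primary is not listed first in bond-slaves.
# Argument: path to an interfaces file (defaults to /etc/network/interfaces).
# Returns non-zero if at least one bond has its primary out of first position.
check_primary_first() {
    awk '
        $1 == "iface"        { name = $2 }           # current stanza
        $1 == "bond-slaves"  { first[name] = $2 }    # first listed slave
        $1 == "bond-primary" { primary[name] = $2 }  # configured primary
        END {
            for (b in primary) {
                if (first[b] != primary[b]) {
                    printf "%s: primary %s is not listed first (first is %s)\n",
                           b, primary[b], first[b]
                    bad = 1
                }
            }
            exit bad + 0
        }
    ' "${1:-/etc/network/interfaces}"
}
```

After fixing the order by hand, the change still has to be applied; assuming ifupdown2 is in use (which I believe is the default on recent PVE releases), `ifreload -a` should do that without a reboot.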
 
