Bond & Bridge Interfaces - Undesired Behavior

silverstone

I started migrating from pure Bridge Network Setup towards Bond based Network Setup on several of my Hosts now, after I accidentally created a Network Loop by plugging in multiple NICs (physical Interface connected to vmbr0 in Proxmox Network Configuration) into a Physical Switch, all while trying to reduce Risk of Network Shutdown should a single NIC fail.

By default STP is set to off for Bridge Interfaces in Proxmox VE, so of course that created a massive Network Loop which almost took down my entire Homelab o_O .

However, I am getting a very bad Behavior with Bonds, in case a physical Interface doesn't exist, particularly related to the bond-primary Setting (which NEEDS to be set via the Proxmox VE GUI, i.e. leaving it unset/empty is not acceptable to Proxmox VE).

In fact I will be locked out of the Server if the bond-primary Setting is set to an Interface which doesn't exist ! This completely negates the Point of creating a (supposedly) redundant Link in my Opinion. I know that Bonds can also be used for LAGG and increasing Bandwidth, but for my quite limited Needs, I think that active-backup is plenty.

Sure, you can blame it on a Sysadmin Configuration Error, but I can totally see how this could play out in other Situations as well:
  • Kernel Upgrade leading to Network Interfaces changing Names, as it already happened in the past, particularly when upgrading to a new Debian Release (e.g. Bullseye -> Bookworm) and NOT using something like /etc/udev/rules.d/30-net_persistent_names.rules to avoid the "new" Predictable Network Interface Names
  • NIC Failed (broken Hardware)
  • NIC Firmware compatibility Issue with newer Kernel
  • ...
The old and new Configurations are reported here below.

Since the Mellanox ConnectX-2/ConnectX-3/ConnectX-4 LX as well as the Intel X710/XXV710 NICs only support a (relatively) small Number of VLANs, some additional Configuration is also required. In order to avoid breaking the Proxmox VE GUI, I found it better to put these manual Configurations in a separate File under /etc/network/interfaces.d/vmbr0 and (for the Linux Bond Setup) /etc/network/interfaces.d/bond0.

Previous Configuration (using only Linux Bridge)

/etc/network/interfaces
Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto enp1s0f0np0
iface enp1s0f0np0 inet manual

auto enp1s0f1np1
iface enp1s0f1np1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.2.9/20
    gateway 192.168.1.1
    bridge-ports eno1 eno2 enp1s0f0np0 enp1s0f1np1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#MAIN_BRIDGE

iface vmbr0 inet6 static
    address 2XXX:XXXX:XXXX:0001:0000:0000:0002:0009/64
    gateway 2XXX:XXXX:XXXX:0001:0000:0000:0001:0001

source /etc/network/interfaces.d/*

/etc/network/interfaces.d/vmbr0
Code:
auto vmbr0
iface vmbr0 inet static
  # Enable VLANs
  bridge-vlan-aware yes

  # Mellanox ConnectX2/3/... are limited to only 128 VLAN IDs (effectively the Error occurs above 125 VLAN IDs)
  bridge-vids 1 100-110 1000

Current/future Configuration (using Linux Bond + Linux Bridge)

/etc/network/interfaces

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto enp1s0f0np0
iface enp1s0f0np0 inet manual

auto enp1s0f1np1
iface enp1s0f1np1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2 enp1s0f0np0 enp1s0f1np1
    bond-miimon 100
    bond-mode active-backup

    # If enp1s0f0np0 doesn't exist, I will be locked out of the Server !
    bond-primary enp1s0f0np0
#MAIN_BOND

auto vmbr0
iface vmbr0 inet static
    address 192.168.2.9/20
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#MAIN_BRIDGE

iface vmbr0 inet6 static
    address 2XXX:XXXX:XXXX:0001:0000:0000:0002:0009/64
    gateway 2XXX:XXXX:XXXX:0001:0000:0000:0001:0001

source /etc/network/interfaces.d/*

/etc/network/interfaces.d/vmbr0

Code:
auto vmbr0
iface vmbr0 inet static
    # Enable VLANs
    bridge-vlan-aware yes

    # Mellanox ConnectX-2/ConnectX-3/ConnectX-4 LX & Intel X710/XXV710 are limited to only 128 VLAN IDs (effectively the error occurs above 125 VLAN IDs)
    bridge-vids 1 100-110 1000

/etc/network/interfaces.d/bond0

Code:
auto bond0
iface bond0 inet manual
    # Configure Bond Priorities
    # Higher priority is better. Values are up to 32 Bit Signed Integer.
    post-up ip link set dev eno1 type bond_slave prio 1000
    post-up ip link set dev eno2 type bond_slave prio 500
    post-up ip link set dev enp1s0f0np0 type bond_slave prio 10000
    post-up ip link set dev enp1s0f1np1 type bond_slave prio 10000

    # Enable VLANs
    bridge-vlan-aware yes

    # Mellanox ConnectX-2/ConnectX-3/ConnectX-4 LX & Intel X710/XXV710 are limited to only 128 VLAN IDs (effectively the error occurs above 125 VLAN IDs)
    bridge-vids 1 100-110 500-510 1000-1010

Note that I'm not 100% sure about setting the VLAN IDs for the Bond Interface.

If I did NOT do that, then all 4096 VLANs would show up in the output of bridge vlan show (under bond0 Interface), which I believed could cause Issues with the VLAN ID Limit mentioned above for several NICs.

With bond0 VLANs configured:
Code:
port              vlan-id 
vmbr0             1 PVID Egress Untagged
bond0             1 PVID Egress Untagged
                  100
                  101
                  102
                  103
                  104
                  105
                  106
                  107
                  108
                  109
                  110
                  1000

Without bond0 VLANs configured:
Code:
port              vlan-id 
vmbr0             1 PVID Egress Untagged
bond0             1 PVID Egress Untagged
                  2
                  3
                  4
                  5
                  6
                  7
                  8
                  9
                  10
                  11
                  12
                  13
                  14
                  15
                  16
                  17
                  18
                  19
                  20
                  ...
                  ...
                  ...

I don't see any Error in dmesg at the Moment but that's also because I'm using eno1 Interface (1gbps) which I think has no Issue with 4096 VLAN IDs.

Pretty sure it would give a lot of Errors with the 10gbps NICs mentioned above.
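
A rough way to check whether a given bridge-vids Value stays under the roughly 125 VLAN IDs mentioned above is to expand the Ranges and count them. The following is only a minimal sketch: count_vids is a hypothetical helper name (not part of Proxmox VE or ifupdown2), and it assumes the Value consists of space-separated single IDs and inclusive Ranges.

```shell
#!/bin/sh
# count_vids (hypothetical helper): print how many VLAN IDs a bridge-vids
# value expands to. Accepts single IDs and inclusive ranges,
# e.g. "1 100-110 1000".
count_vids() (
    total=0
    for item in $1; do
        case $item in
            *-*)
                # Range "first-last": both ends are included.
                first=${item%-*}
                last=${item#*-}
                total=$((total + last - first + 1))
                ;;
            *)
                # Single VLAN ID.
                total=$((total + 1))
                ;;
        esac
    done
    echo "$total"
)
```

For example, count_vids "1 100-110 1000" prints 13, whereas count_vids "2-4094" prints 4093, far above the Limit described for the NICs in question.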
 
Using an Interface which does NOT exist for bond-primary Setting (enp1s0f0np0)

Use ifdown bond0 to actually trigger the Issue: if the Bond already exists when the Configuration is reloaded, then the Bond will still exist afterwards.

Test reloading Configuration using ifreload -avs
1767273044309.png

Reload Configuration using ifreload -av
1767273054066.png

Using an Interface which exists for bond-primary Setting (eno1)

Test reloading Configuration using ifreload -avs
1767273063026.png

Reload Configuration using ifreload -av
1767273074595.png
 
Unsure if they bring much more Information, but see the full Text Logs attached using Redirection of stdout + stderr to the respective File.
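
One possible Safeguard against the Lockout described above is to check, before reloading, that every Interface named by bond-primary is actually present. This is a minimal sketch under a few assumptions: the function names are hypothetical, the Check only looks at the single File it is given (not at sourced Files), and it relies on each existing Interface having an Entry under /sys/class/net.

```shell
#!/bin/sh
# List every interface named by a "bond-primary" line in an ifupdown
# configuration file (commented-out lines are ignored).
bond_primaries() {
    awk '$1 == "bond-primary" { print $2 }' "$1"
}

# Print the bond-primary interfaces that are missing from the given sysfs
# directory (default: /sys/class/net). Exit status is 1 if any is missing.
missing_bond_primaries() (
    config=$1
    sysfs=${2:-/sys/class/net}
    status=0
    for ifname in $(bond_primaries "$config"); do
        if [ ! -e "$sysfs/$ifname" ]; then
            echo "$ifname"
            status=1
        fi
    done
    exit "$status"
)
```

It could then be used as a Guard, e.g. missing_bond_primaries /etc/network/interfaces && ifreload -a, so that the Reload is skipped whenever a configured Primary does not exist.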
 


Is there one / several Configuration Errors on my Side ?

If not, what is the best Way to solve this ? Create a dummy Network Interface that is always present and set that as Default (even though it makes no sense) ?

EDIT 1: the following is a bit of a Hack but it works ;)

Basically create a dummy Virtual Network Interface dummy0, setting it as bond-primary, then bringing the Dummy Interface dummy0 down, letting one of the real Interfaces take over :)

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto enp1s0f0np0
iface enp1s0f0np0 inet manual

auto enp1s0f1np1
iface enp1s0f1np1 inet manual

auto dummy0
iface dummy0 inet manual
    pre-up ip link add dummy0 type dummy

auto bond0
iface bond0 inet manual
    pre-up ifup dummy0
    bond-slaves eno1 eno2 enp1s0f0np0 enp1s0f1np1 dummy0
    bond-miimon 100
    bond-mode active-backup
    bond-primary dummy0
    post-up ip link set down dev dummy0
#MAIN_BOND

auto vmbr0
iface vmbr0 inet static
    address 192.168.2.9/20
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#MAIN_BRIDGE

iface vmbr0 inet6 static
    address 2XXX:XXXX:XXXX:0001:0000:0000:0002:0009/64
    gateway 2XXX:XXXX:XXXX:0001:0000:0000:0001:0001

source /etc/network/interfaces.d/*

I wonder if there is a better Way though ?

Like a Tunable to disable the Enforcement of bond-primary checking that the Physical Interface exists ...
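
To confirm that one of the real Interfaces actually took over after dummy0 went down, the Kernel's Bonding Status File can be read. A minimal sketch, assuming the Bond is called bond0 and the usual "Currently Active Slave:" Line is present in /proc/net/bonding/bond0; active_slave is a hypothetical helper name.

```shell
#!/bin/sh
# active_slave (hypothetical helper): print the currently active slave
# from a bonding status file. Defaults to /proc/net/bonding/bond0;
# pass another path as the first argument to override.
active_slave() {
    sed -n 's/^Currently Active Slave: //p' "${1:-/proc/net/bonding/bond0}"
}
```

If this prints dummy0 (or None) instead of one of the physical Interfaces, the Failover did not happen as intended.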

EDIT 2: with the dummy0 Interface, I cannot seem to do any Change in the GUI anyways :(

Message:
Code:
bond 'bond0' - wrong interface type on slave 'dummy0' ('unknown' != 'eth or bond') (500)

1767276475236.png
 
let's back way up.

1. you have 4 physical interfaces. what are they physically connected to?
2. describe your vlan plan, and which physical interfaces you want to have those vlans travel over
3. describe what traffic you want to use the vlans for.

I can help you create an interfaces file to properly serve the above requirements.
 
let's back way up.
Is it really necessary :p ? I feel most of your Comments are about VLAN, whereas the Problem I described is about the Bond (and potentially Bridge) Interface.


1. you have 4 physical interfaces. what are they physically connected to?
1. Right now pretty much only one Physical Interface is connected to a Managed L2 Switch. What I provided is the Configuration of one Proxmox VE Host, but it's fairly representative across the Board with my other Servers.

Depending on the specific Configuration, the Physical Interface(s) is/are connected to one of the following managed Switches:
  • Mikrotik CRS309 - Mikrotik RouterOS
  • Mikrotik CRS317 - Mikrotik RouterOS
  • Mikrotik CRS326 - Mikrotik RouterOS
  • ONTI ONT-S508CL-8S - OpenWRT
  • Zyxel GS1900-24(E) - OpenWRT
  • HP 1920-24G - OpenWRT
  • Mellanox SX6036 (still to be deployed)
All of these Switches are managed L2 Switches. Some might have some/full L3 Capabilities, but for now let's just stay on L2 :).

2. describe your vlan plan, and which physical interfaces you want to have those vlans travel over

2. For now (Homelab has been a Work in Progress for a long Time) there is typically just a need for (at least) one Physical Interface for Production. Once I get into STP using 2 Paths with 2 separate Switches etc (in order to prevent single Points of Failure), the other Interfaces will likely come into play.

Real World isn't perfect so sometimes these Servers need to get moved around or a NIC misbehaves, then I try to fix it by plugging it into another Switch just to realize that I created a Network Loop in the Process. That's the main Objective of setting up Bond Interfaces, at least for now.

Right now VLAN Deployment is going very slowly (most of my Homelab is on the Default VLAN / untagged), but basically I want Proxmox VE to just assign some Virtual Interfaces to the Guest VM / CT using a Syntax like vmbr0.100 to assign VLAN 100 to that Guest Network Interface.

I'm pretty sure I tried this already and it worked, but you know, Priorities ?
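
On a VLAN-aware Bridge, the Guest's VLAN is usually selected with the tag Option of the Guest Network Device rather than through a Bridge Name like vmbr0.100. The sketch below only builds the Option String; guest_net_option is a hypothetical helper, and the virtio Model is an assumption.

```shell
#!/bin/sh
# guest_net_option (hypothetical helper): print a Proxmox VE network
# device option string for a VM NIC on the given bridge and VLAN tag.
# $1 = bridge name (e.g. vmbr0), $2 = VLAN ID (1-4094)
guest_net_option() {
    case $2 in
        ''|*[!0-9]*)
            echo "VLAN ID must be a number" >&2
            return 1
            ;;
    esac
    if [ "$2" -lt 1 ] || [ "$2" -gt 4094 ]; then
        echo "VLAN ID must be between 1 and 4094" >&2
        return 1
    fi
    printf 'virtio,bridge=%s,tag=%s\n' "$1" "$2"
}
```

With a made-up VM ID of 101, this could be applied as qm set 101 --net0 "$(guest_net_option vmbr0 100)". Note that this rewrites the whole net0 Definition, so any existing Settings on that Device (such as a fixed MAC Address) would have to be added back.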

Right now pretty much:
  • Everything is on VLAN 1 (default / untagged VLAN): 192.168.0.0/20
  • I started setting up VLAN 150 (tagged) for Switches Management: 192.168.148.0/22
  • I set up VLAN 3900 (tagged) for OPNSense PFSYNC High-Availability Synchronization Interface
  • I wrote a somewhat defined Spreadsheet Table of what I want(ed) to do but didn't manage to do most of it yet. There are planned VLANs for Internal/Trusted WiFi, IOT WiFi, NOT WiFi, Guest WiFi, CCTV System(s), Zigbee2MQTT Bridge Networks, etc
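
The Address Ranges above can be checked with a little Integer Arithmetic, e.g. to confirm that the Host Address 192.168.2.9 and the Gateway 192.168.1.1 both sit inside 192.168.0.0/20, and that the Management Range 192.168.148.0/22 does not overlap with it. A minimal sketch for IPv4 only; ip_to_int and in_cidr are hypothetical helper names.

```shell
#!/bin/sh
# ip_to_int (hypothetical helper): convert a dotted-quad IPv4 address
# to its 32-bit integer value.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_cidr (hypothetical helper): succeed if address $1 lies inside the
# network $2, given in CIDR notation (e.g. 192.168.0.0/20).
in_cidr() (
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # Build the netmask: the top "bits" bits set, the rest cleared.
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
)
```

For instance, in_cidr 192.168.2.9 192.168.0.0/20 succeeds, while in_cidr 192.168.148.10 192.168.0.0/20 fails, since the third Octet 148 falls outside the 0-15 Range covered by the /20.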

3. describe what traffic you want to use the vlans for.
Unsure about the Question. What do you mean exactly ? It depends on the Application.


I can help you create an interfaces file to properly serve the above requirements.
Ideally I'd like to keep it quite general and do the required customization on a CT/VM basis, since I have quite a few Servers to manage and jumping between 300 different Configurations is going to turn into a Disaster for sure :).

Once again I feel that the Parameter bond-primary is completely messing up everything.

How is it possible that a Redundant System is brought to its knees by requiring an Interface to be set up as Primary and, if that Interface disappears/misbehaves/breaks down at the next Reboot, you end up getting locked out of your Server ?
 
Is it really necessary
Not at all. my (and everyone else's) participation in this forum is voluntary. Nothing you provide (or not) is necessary as long as you don't expect anything in return.

  • Mikrotik CRS309 - Mikrotik RouterOS
  • Mikrotik CRS317 - Mikrotik RouterOS
  • Mikrotik CRS326 - Mikrotik RouterOS
  • ONTI ONT-S508CL-8S - OpenWRT
  • Zyxel GS1900-24(E) - OpenWRT
  • HP 1920-24G - OpenWRT
  • Mellanox SX6036 (still to be deployed)
you have 4 ports. how are you attaching them to 7 different devices? More importantly, are they all connected to each other, and if so- are the uplinks all trunk ports?

next- why are you trying to use all your interfaces in the same lagg? if they are not all the same speed you shouldn't do that in the first place- but without LACP you will only have one link active anyway. Here is how I would design the network in your situation:

Code:
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto enp1s0f0np0
iface enp1s0f0np0 inet manual

auto enp1s0f1np1
iface enp1s0f1np1 inet manual

#10g bond
auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0np0 enp1s0f1np1
    bond-miimon 100
    # LACP is far preferred and is supported by your mikrotik switches.
    bond-mode active-backup

# 1g bond
auto bond1
iface bond1 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode active-backup

# management network
auto bond1.150
iface bond1.150 inet static
    address 192.168.148.xx/22

# vmbr0 using bond0
# vmbr1 using bond1
How is it possible that a Redundant System is brought to its knees by requiring an Interface to be set up as Primary and, if that Interface disappears/misbehaves/breaks down at the next Reboot, you end up getting locked out of your Server ?
typo, most likely. double check the names of your interfaces.