bridge on bond doesn't come up after reboot (using SR-IOV VFs)

kayson

I have PVE 8.3 installed on a machine with a dual-port SFP+ NIC (X710-DA2). I've set up SR-IOV VFs using these udev rules:
Code:
ACTION=="add", SUBSYSTEM=="net", ENV{INTERFACE}=="enp1s0f0np0", ATTR{device/sriov_numvfs}="3"
ACTION=="add", SUBSYSTEM=="net", ENV{INTERFACE}=="enp1s0f1np1", ATTR{device/sriov_numvfs}="3"

Using the PVE GUI, I've set up a bridge on a bond on two of the VFs, resulting in the following interfaces file:
Code:
auto lo
iface lo inet loopback

auto enp1s0f0v0
iface enp1s0f0v0 inet manual

auto enp1s0f1v0
iface enp1s0f1v0 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp1s0f0v0 enp1s0f1v0
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp1s0f0v0

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        bridge-stp off
        bridge-fd 0

auto vmbr0.10
iface vmbr0.10 inet static
        address 10.7.0.20/24
        gateway 10.7.0.1

source /etc/network/interfaces.d/*

After applying the changes, the network works as expected. After a reboot, though, none of the interfaces come up:
Code:
> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:84 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
3: enp1s0f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:85 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
4: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8c:16:45:92:88:9b brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
5: enp1s0f0v0: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
6: enp1s0f1v0: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
7: enp1s0f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 86:62:db:16:bf:ef brd ff:ff:ff:ff:ff:ff
8: enp1s0f1v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e6:93:a7:a5:17:b9 brd ff:ff:ff:ff:ff:ff
9: enp1s0f1v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether de:7d:9a:5c:a5:a3 brd ff:ff:ff:ff:ff:ff
10: enp1s0f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 96:cd:ab:44:2c:57 brd ff:ff:ff:ff:ff:ff
11: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue master vmbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
12: vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
13: vmbr0.10@vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff

and I see the following in dmesg:
Code:
[    6.891996] bond0: (slave enp1s0f0v0): Enslaving as a backup interface with a down link
[    6.895778] bond0: (slave enp1s0f1v0): Enslaving as a backup interface with a down link
[    6.911862] vmbr0: port 1(bond0) entered blocking state
[    6.911866] vmbr0: port 1(bond0) entered disabled state
[    6.911877] bond0: entered allmulticast mode

Strangely, the ifupdown2 debug logs show no errors and even show that it's running the commands to bring the VFs up:
Code:
2025-01-10 12:32:50,576: MainThread: ifupdown: scheduler.py:105:run_iface_op(): debug: bond0: pre-up : running module bond
2025-01-10 12:32:50,576: MainThread: ifupdown.bond: bond.py:697:get_ifla_bond_attr_from_user_config(): info: bond0: set bond-mode active-backup
2025-01-10 12:32:50,576: MainThread: ifupdown.bond: bond.py:697:get_ifla_bond_attr_from_user_config(): info: bond0: set bond-miimon 100
2025-01-10 12:32:50,577: MainThread: ifupdown.bond: bond.py:697:get_ifla_bond_attr_from_user_config(): info: bond0: set bond-primary enp1s0f0v0
2025-01-10 12:32:50,577: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:3130:link_add_bond_with_info_data(): info: bond0: netlink: ip link add dev bond0 type bond (with attributes)
2025-01-10 12:32:50,577: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:3134:link_add_bond_with_info_data(): debug: attributes: OrderedDict([(1, 1), (3, 100), (11, 5)])
2025-01-10 12:32:50,577: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2744:link_set_master(): info: enp1s0f0v0: netlink: ip link set dev enp1s0f0v0 master bond0
2025-01-10 12:32:50,579: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2610:link_up_force(): info: enp1s0f0v0: netlink: ip link set dev enp1s0f0v0 up
2025-01-10 12:32:50,581: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2744:link_set_master(): info: enp1s0f1v0: netlink: ip link set dev enp1s0f1v0 master bond0
2025-01-10 12:32:50,585: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2610:link_up_force(): info: enp1s0f1v0: netlink: ip link set dev enp1s0f1v0 up
(full logs are attached)

Running ifreload -a or ifup enp1s0f0v0 enp1s0f1v0 succeeds, but doesn't actually bring the interfaces up. However, if I first do an ifdown enp1s0f0v0 enp1s0f1v0, then either command brings the interfaces up, and once everything is up the network behaves as expected.
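
In other words, the only thing that recovers the network after a reboot is forcing the slaves down first:
Code:
# ifreload/ifup alone report success but the slaves stay down;
# cycling them down first makes either command work
ifdown enp1s0f0v0 enp1s0f1v0
ifreload -a    # or: ifup enp1s0f0v0 enp1s0f1v0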

As an experiment, I tried some other network configurations. For some reason, removing the bridge and statically assigning an IP to the bond fixes the problem, and all interfaces come up after reboot:
Code:
auto lo
iface lo inet loopback

auto enp1s0f0v0
iface enp1s0f0v0 inet manual

auto enp1s0f1v0
iface enp1s0f1v0 inet manual

auto bond0
iface bond0 inet static
        bond-slaves enp1s0f0v0 enp1s0f1v0
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp1s0f0v0
        address 10.1.0.20/24
        gateway 10.1.0.1

source /etc/network/interfaces.d/*
(the change in subnet is needed because of the switch VLAN config)

Code:
> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:84 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
3: enp1s0f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:85 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
4: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8c:16:45:92:88:9b brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
5: enp1s0f1v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff
6: enp1s0f0v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff
7: enp1s0f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:12:4a:95:c6:54 brd ff:ff:ff:ff:ff:ff
8: enp1s0f1v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e6:8a:d1:a7:58:de brd ff:ff:ff:ff:ff:ff
9: enp1s0f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:ef:ad:81:8a:28 brd ff:ff:ff:ff:ff:ff
10: enp1s0f1v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:f5:80:f1:b6:35 brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff

dmesg:
Code:
[    6.720277] bond0: (slave enp1s0f0v0): Enslaving as a backup interface with a down link
[    6.724151] bond0: (slave enp1s0f1v0): Enslaving as a backup interface with a down link
[    6.805535] iavf 0000:02:02.0 enp1s0f0v0: NIC Link is Up Speed is 10 Gbps Full Duplex
[    6.809171] iavf 0000:02:0a.0 enp1s0f1v0: NIC Link is Up Speed is 10 Gbps Full Duplex
[    6.831866] bond0: (slave enp1s0f0v0): link status definitely up, 10000 Mbps full duplex
[    6.831876] bond0: (slave enp1s0f1v0): link status definitely up, 10000 Mbps full duplex
[    6.831878] bond0: (slave enp1s0f0v0): making interface the new active one
[    6.831893] bond0: active interface up!

ifupdown2 debug logs for this case are also attached. I'm not seeing any notable differences.

Initially I suspected some kind of race condition between the VFs getting created and the systemd unit that calls ifupdown2 (networking.service), but adding sleep calls doesn't make a difference. That isn't too surprising, because networking.service already has an ordering dependency on another unit (ifupdown2-pre.service), which calls udevadm settle first.
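
For anyone who wants to double-check that ordering on their own system, the standard systemd tooling shows it (the exact unit contents may differ between versions):
Code:
systemctl cat networking.service       # pulls in ifupdown2-pre.service via After=/Requires=
systemctl cat ifupdown2-pre.service    # its ExecStart runs 'udevadm settle'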

Finally, if I don't use the VFs at all and instead use the physical interfaces (i.e. the PFs) for the bond, it works fine.

Does anyone have any ideas what's going on? Is there a more "correct" way to set up a network with VFs at boot?

Thanks!
 

Bumping. Been bashing my head against this for several days now, so any help would be greatly appreciated!
 
I found this thread and wanted to share that I'm having a similar issue. My configuration has 3 bonds defined, and whichever bond I specify highest up in the /etc/network/interfaces file does not get created automatically, but the other bonds do. Once I'm on the console, I can simply type ifup bondX and everything starts working.

I, too, am using an Intel X710, if it matters.

Have you had any luck figuring it out, @kayson?

Code:
# pveversion -v
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.4 (running version: 8.3.4/65224a0f9cd294a3)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.2
libpve-storage-perl: 8.3.3
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.3-1
proxmox-backup-file-restore: 3.3.3-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.6
pve-cluster: 8.0.10
pve-container: 5.2.4
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.4.0
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.8
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
I was able to get my system working by making the following changes to /etc/network/interfaces

Code:
# original
auto enp1s0f0v0
iface enp1s0f0v0 inet manual

auto enp1s0f1v0
iface enp1s0f1v0 inet manual


to
Code:
# new
auto enp1s0f0v0
iface enp1s0f0v0 inet manual
        pre-up sleep 10 # Sleep 10 seconds

auto enp1s0f1v0
iface enp1s0f1v0 inet manual
        pre-up sleep 10 # Sleep 10 seconds
 
I had tried sleeps via systemd, but not via ifupdown2 pre-up. I'll have to give this a shot!
 
Aha! I've made some progress. It seems to be at least partly related to the MAC addresses. If they're not specified explicitly, the driver picks them randomly, and for some reason that causes issues (maybe because of how bonding assigns MAC addresses?). I found a reference to this issue in the iavf driver docs: https://github.com/intel/ethernet-linux-iavf

If I explicitly assign hwaddresses in the interfaces file, everything boots up just fine. When I unplug one of the cables to make the bond fail over, though, it goes down. I see a complaint about "Unprivileged VF 0 is attempting to configure promiscuous mode", but again, if I force an ifdown on the VFs and then an ifreload, everything works again.
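
For anyone following along, "explicitly assign hwaddresses" means something along these lines (the MAC values below are just placeholders, not the ones from my system):
Code:
auto enp1s0f0v0
iface enp1s0f0v0 inet manual
        hwaddress 02:00:00:00:00:01    # placeholder locally-administered MAC

auto enp1s0f1v0
iface enp1s0f1v0 inet manual
        hwaddress 02:00:00:00:00:02    # placeholder locally-administered MAC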

I'm starting to suspect some kind of race condition either in ifupdown2 or the iavf/i40e drivers.
 
Hey @kayson, yes, I too have had some trouble related to the MAC address. Even after applying trust=on to the VF, I sometimes see this issue. I have not cracked the code either, but I am hopeful.
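
(In case it helps, "applying trust=on" here is the usual iproute2 per-VF setting on the PF; using the PF names from the first post as an example, and note it does not persist across reboots by itself:)
Code:
# mark VF 0 on each PF as trusted (needed for promiscuous mode / extra MACs on the VF)
ip link set dev enp1s0f0np0 vf 0 trust on
ip link set dev enp1s0f1np1 vf 0 trust on
ip link show enp1s0f0np0    # the 'vf 0' line should now show 'trust on'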
 
I -FINALLY- think I figured it out. It's related to the bridge-vids 2-4094 setting. It looks like when you add that, it creates a VLAN filter on the underlying interface (the VF, via the bond), and for this particular driver there's a limit of 16 VLANs in the filter. ifupdown2 didn't complain about this until I installed the latest Intel drivers; only then did I finally get an error message. I'm not sure whether things only actually work with the latest drivers; I'll test the stock drivers again later.
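
A quick way to see what actually gets programmed on the bridge port is plain iproute2 (nothing PVE-specific); with bridge-vids 2-4094 the list is huge, far beyond what the VF's small VLAN filter table can hold:
Code:
# list the VLANs configured on the bond as a bridge port
bridge vlan show dev bond0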
 
Just updated my Ansible setup to fix that setting on a fresh install, and it seems to work just fine. It looks like you don't need the latest drivers (though having them means you get actually useful information out of ifupdown2).

I expect I'll run into some other issues once I start running VMs on the bridge, since the VFs are currently running with random MACs and aren't trusted (so there's a limit on how many MACs can be assigned to each VF). But one step at a time...
 
Hey @kayson, that's great to hear. If possible, could you share your network interfaces file or an example? I think we're very similar at this point.
Regarding the MACs, I've read conflicting information about handling this and haven't had a chance to dive deeper into testing. Oh, and thanks for the driver tip. I was debating whether I should update mine. I've done driver updates on other deployments and it's a hassle to maintain.
 
I'm using these udev rules and this interfaces file (templated with Ansible):

Code:
{% for nic, count in sr_iov_vfs.items() %}
ACTION=="add", SUBSYSTEM=="net", ENV{INTERFACE}=="{{nic}}", ATTR{device/sriov_numvfs}="{{count}}"
{% endfor %}

Code:
auto lo
iface lo inet loopback

auto enp1s0f0v0
iface enp1s0f0v0 inet manual

auto enp1s0f1v0
iface enp1s0f1v0 inet manual

auto enp1s0f0v1
iface enp1s0f0v1 inet manual

auto enp1s0f1v1
iface enp1s0f1v1 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp1s0f0v0 enp1s0f1v0
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp1s0f0v0

auto bond1
iface bond1 inet static
        bond-slaves enp1s0f0v1 enp1s0f1v1
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp1s0f1v1
        address 10.7.19.2{{ ansible_hostname[-1] }}/24

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-16

auto vmbr0.10
iface vmbr0.10 inet static
        address 10.7.0.2{{ ansible_hostname[-1] }}/24
        gateway 10.7.0.1
#Management

source /etc/network/interfaces.d/*