PVE9 unable to share VLAN and Bridge on same port vs PVE8

PwrBank

Active Member
Nov 12, 2024
Hey all,

I have my interfaces set up as follows:
[Screenshot of the network interface configuration: 1756383562795.png]

This allows me to use the two 25GbE interfaces both for iSCSI (tagged on VLAN 254 most of the time, but in this instance it's VLAN 1) and for br0, which carries the VMs; each VM NIC is tagged with the appropriate VLAN.

This setup works perfectly in PVE8 and is very stable.

However, in PVE9, it complains that the Linux VLAN and the bond use the same interface, and it will not reload the network configuration.
Code:
bond0 : error: bond0: sub interfaces are not allowed on bond slave: eth25p0 (scsi0)
TASK ERROR: command 'ifreload -a' failed: exit code 1

Output when running ifup -v -a:
Code:
info: requesting link dump
info: requesting address dump
info: requesting netconf dump
info: loading builtin modules from ['/usr/share/ifupdown2/addons']
info: module openvswitch not loaded (module init failed: no /usr/bin/ovs-vsctl found)
info: module openvswitch_port not loaded (module init failed: no /usr/bin/ovs-vsctl found)
info: module ppp not loaded (module init failed: no /usr/bin/pon found)
info: module batman_adv not loaded (module init failed: no /usr/sbin/batctl found)
info: executing /sbin/sysctl net.bridge.bridge-allow-multiple-vlans
info: module mstpctl not loaded (module init failed: no /sbin/mstpctl found)
info: executing /bin/ip rule show
info: executing /bin/ip -6 rule show
info: address: using default mtu 1500
info: address: max_mtu undefined
info: executing /sbin/sysctl net.ipv6.conf.all.accept_ra
info: executing /sbin/sysctl net.ipv6.conf.all.autoconf
info: executing /usr/sbin/ip vrf id
info: mgmt vrf_context = False
info: executing /bin/ip addr help
info: address metric support: OK
info: module ppp not loaded (module init failed: no /usr/bin/pon found)
info: module mstpctl not loaded (module init failed: no /sbin/mstpctl found)
info: module batman_adv not loaded (module init failed: no /usr/sbin/batctl found)
info: module openvswitch_port not loaded (module init failed: no /usr/bin/ovs-vsctl found)
info: module openvswitch not loaded (module init failed: no /usr/bin/ovs-vsctl found)
info: looking for user scripts under /etc/network
info: loading scripts under /etc/network/if-pre-up.d ...
info: loading scripts under /etc/network/if-up.d ...
info: loading scripts under /etc/network/if-post-up.d ...
info: loading scripts under /etc/network/if-pre-down.d ...
info: loading scripts under /etc/network/if-down.d ...
info: loading scripts under /etc/network/if-post-down.d ...
info: 'link_master_slave' is set. slave admin state changes will be delayed till the masters admin state change.
info: using mgmt iface default prefix eth
info: processing interfaces file /etc/network/interfaces
info: processing interfaces file /etc/network/interfaces.d/sdn
info: lo: running ops ...
info: executing /sbin/sysctl net.mpls.conf.lo.input=0
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: eth1p0: running ops ...
info: vmbr0: applying bridge port configuration: ['eth1p0']
info: vrf: syncing table map to /etc/iproute2/rt_tables.d/ifupdown2_vrf_map.conf
info: vrf: dumping iproute2_vrf_map
info: {}
info: executing /sbin/sysctl net.mpls.conf.eth1p0.input=0
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: vmbr0: running ops ...
info: vmbr0: bridge already exists
info: vmbr0: applying bridge settings
info: vmbr0: reset bridge-hashel to default: 4
info: reading '/sys/class/net/vmbr0/bridge/stp_state'
info: vmbr0: netlink: ip link set dev vmbr0 type bridge (with attributes)
info: vmbr0: port eth1p0: already processed
info: vmbr0: applying bridge configuration specific to ports
info: vmbr0: processing bridge config for port eth1p0
info: bridge mac is already inherited from eth1p0
info: executing /sbin/sysctl net.mpls.conf.vmbr0.input=0
info: writing '0' to file /proc/sys/net/ipv4/conf/vmbr0/arp_accept
info: executing /bin/ip route replace default via 10.40.110.1 proto kernel dev vmbr0 onlink
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: eth25p1: running ops ...
info: executing /sbin/sysctl net.mpls.conf.eth25p1.input=0
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: eth25p0: running ops ...
info: executing /sbin/sysctl net.mpls.conf.eth25p0.input=0
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: bond0: running ops ...
info: bond0: already exists, no change detected
error: bond0: sub interfaces are not allowed on bond slave: eth25p0 (scsi0)
info: br0: applying bridge port configuration: ['bond0']
info: executing /sbin/sysctl net.mpls.conf.bond0.input=0
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: br0: running ops ...
info: br0: bridge already exists
info: br0: applying bridge settings
info: br0: reset bridge-hashel to default: 4
info: reading '/sys/class/net/br0/bridge/stp_state'
info: br0: netlink: ip link set dev br0 type bridge (with attributes)
info: br0: port bond0: already processed
info: br0: applying bridge configuration specific to ports
info: br0: processing bridge config for port bond0
info: bridge mac is already inherited from bond0
info: executing /sbin/sysctl net.mpls.conf.br0.input=0
info: br0: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: scsi1: running ops ...
info: executing ifconfig scsi1 hw ether BA:0A:C1:0E:7A:BD
info: executing /sbin/sysctl net.mpls.conf.scsi1.input=0
info: scsi1: netlink: ip link set dev scsi1 up
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: scsi0: running ops ...
info: executing /sbin/sysctl net.mpls.conf.scsi0.input=0
info: scsi0: netlink: ip link set dev scsi0 up
info: executing /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/chrony
info: exit status 1 in 0:00:00.241194

I've also attached the output of ifup -d -a to the post in case that's helpful

This is the current interfaces config
Code:
auto lo
iface lo inet loopback

iface eth1p0 inet manual

iface eth1p1 inet manual

iface eth100p0 inet manual

iface eth100p1 inet manual

auto eth25p0
iface eth25p0 inet manual
        mtu 9000

auto eth25p1
iface eth25p1 inet manual
        mtu 9000

auto bond0
iface bond0 inet manual
        bond-slaves eth25p0 eth25p1
        bond-miimon 100
        bond-mode balance-xor
        bond-xmit-hash-policy layer2
        mtu 9000

auto vmbr0
iface vmbr0 inet static
        address 10.40.110.81/24
        gateway 10.40.110.1
        bridge-ports eth1p0
        bridge-stp off
        bridge-fd 0

auto br0
iface br0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-254
        mtu 9000

auto scsi0
iface scsi0 inet static
        address 10.40.254.61/24
        mtu 9000
        vlan-id 1
        vlan-raw-device eth25p0
        pre-up ifconfig scsi1 hw ether BA:FB:FB:FA:D2:9B
#iSCSI

auto scsi1
iface scsi1 inet static
        address 10.40.254.62/24
        mtu 9000
        vlan-id 1
        vlan-raw-device eth25p1
        pre-up ifconfig scsi1 hw ether BA:0A:C1:0E:7A:BD
#iSCSI

Is this an intended change, a bug, or is there a different way I could achieve this setup?
 

Are those two interfaces (eth25p0 eth25p1) connected to the same switch or two switches with MLAG? I see some general problems with this setup, but it also depends on that factor before I can say more.
 
They are connected to two switches that are MLAG'd together. Each interface is plugged into Port 5 on switch 1 and switch 2, respectively. Both ports are trunks with access to the VLANs needed for iSCSI and the other traffic. However, Port 5 itself is not configured as an MLAG port. These are Mellanox switches in an MLAG themselves, communicating with each other about what each is doing.

scsi0 and scsi1 are Linux VLAN interfaces with a manually set MAC address in order to use multipath iSCSI on VLAN 1 (or whichever VLAN the iSCSI target is on).
 
What does the setup on the other side look like (network cards + IP config + switch config)?

In any case, you should at least use two disjoint subnets for the different paths, due to how the Linux networking stack works (see also [1]). If you have the same subnet configured twice across different interfaces, then basically only the first interface with that subnet is used (the first one to show up in the routing table, which is usually the first in the configuration). Depending on how the setup looks on the other side, you also need to be really careful with ARP, since Linux answers ARP requests on all configured interfaces (see weak host model [2]), even if the IP address is configured on another interface.

So, with the network configured as in your initial post, the traffic might not be going over the interfaces you think it is.
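
As a rough illustration of the ARP point above: the kernel exposes per-interface `arp_ignore` and `arp_announce` sysctls that restrict which interface answers for which address. The sketch below reuses the `scsi0` stanza from the initial post and adds two `post-up` lines; those two lines are an assumption and not part of the posted configuration, and they do not address the routing-table issue that comes with a shared subnet.

```
# Sketch only: the two post-up lines are assumptions added for illustration.
# arp_ignore=1   reply only if the target IP is configured on the receiving interface
# arp_announce=2 prefer a source address that matches the outgoing interface
auto scsi0
iface scsi0 inet static
        address 10.40.254.61/24
        mtu 9000
        vlan-id 1
        vlan-raw-device eth25p0
        post-up sysctl -w net.ipv4.conf.scsi0.arp_ignore=1
        post-up sysctl -w net.ipv4.conf.scsi0.arp_announce=2
```

The same two lines would be repeated for `scsi1` with the interface name adjusted.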

It seems like you also have a typo when overriding the MAC address (you're setting the MAC address of scsi1 twice), so one of the SCSI interfaces has the same MAC as the bond (which should actually be fine since MAC addresses only need to be unique inside a L2 domain, but I thought I'd point it out).


In your case it should imo just be fine to configure the VLAN for storage on the bond as well and use that for redundancy.
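
One possible reading of this suggestion, as a minimal sketch: because `bond0` is already a port of the VLAN-aware bridge `br0`, the host-side storage address can sit on a VLAN sub-interface of the bridge. The interface name `br0.254` and the choice of VLAN 254 are assumptions (254 is the VLAN the original post says iSCSI usually uses, and it falls inside `bridge-vids 2-254`); the address is taken from the existing `scsi0` stanza.

```
# Sketch only: replaces the scsi0/scsi1 stanzas with a single storage
# interface on top of the bond. The name br0.254 and VLAN 254 are assumptions.
auto br0.254
iface br0.254 inet static
        address 10.40.254.61/24
        mtu 9000
#iSCSI
```

With this layout, redundancy comes from the bond rather than from two separate iSCSI paths, so it is not equivalent to the multipath setup in the initial post.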

[1] https://www.truenas.com/community/resources/multiple-network-interfaces-on-a-single-subnet.45/
[2] https://en.wikipedia.org/wiki/Host_model
 
1. I guess I'm not understanding what you would like to see when it comes to network cards and whatnot. Could you give an example of what you're looking for?

2. Yeah, I discussed the multiple-subnets question for multipathing with @fweber in an earlier post. The idea was to replicate how it's done in ESXi. While it's not the default behavior for Linux, it seems to handle it just fine with the tweaks described in that post.

3. I caught the typo after I posted; it's already corrected in the actual config.
 
Ah, I remember talking to him about your post; we looked at that setup together back then.

It seems like you went without the VRFs? In that case I'd strongly suggest using different VLANs / subnets. In any case, I'm afraid the only way I see with the current ifupdown2 version is to set up the VLAN interfaces manually via post-up. It's a bit of an odd setup, particularly without VRFs (since you have the same subnet on two interfaces). We'll see if we want to patch ifupdown2, since we usually don't like patching upstream unless a larger number of people are running into issues with the current version.
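
A minimal sketch of what the manual post-up approach could look like, assuming the VLAN devices are created with plain `ip` commands from the `bond0` stanza and the existing `scsi0`/`scsi1` stanzas are removed. This is untested against PVE9; hanging the commands off `bond0` and the `|| true` guards are assumptions, while the interface names, VLAN ID, addresses and MACs are taken from the posted configuration (with the first MAC applied to `scsi0`, as was apparently intended).

```
# Sketch only, untested: creates the iSCSI VLAN devices by hand so that
# ifupdown2 never sees a VLAN stanza on a bond slave.
auto bond0
iface bond0 inet manual
        bond-slaves eth25p0 eth25p1
        bond-miimon 100
        bond-mode balance-xor
        bond-xmit-hash-policy layer2
        mtu 9000
        post-up ip link add link eth25p0 name scsi0 type vlan id 1 || true
        post-up ip link set dev scsi0 address BA:FB:FB:FA:D2:9B mtu 9000 up
        post-up ip addr replace 10.40.254.61/24 dev scsi0
        post-up ip link add link eth25p1 name scsi1 type vlan id 1 || true
        post-up ip link set dev scsi1 address BA:0A:C1:0E:7A:BD mtu 9000 up
        post-up ip addr replace 10.40.254.62/24 dev scsi1
        pre-down ip link del dev scsi0 || true
        pre-down ip link del dev scsi1 || true
```

Interfaces created this way are invisible to ifupdown2 and to the PVE network GUI, so they have to be maintained by hand.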
 
It seems like you went without the VRFs?

I think in the end I ended up not using VRFs. It works perfectly though, and it's been in semi-prod for about 5 months. It gets the full 6.2 GB/s through the 2x25GbE NIC to the Pure array with multipathing.

It's a bit of an odd setup, particularly without VRFs (since you have the same subnet on two interfaces). We'll see if we want to patch ifupdown2, since we usually don't like patching upstream unless a larger number of people are running into issues with the current version.

I have a feeling more people are going to ask, I've already seen a couple posts on Reddit about it. I'd link the ones I'm thinking about, but Reddit isn't loading for whatever reason at the moment.
https://www.reddit.com/r/Proxmox/comments/1mr3kan/comment/n8vfwvr/

AFAIK, this is how it's expected to work in ESXi and with lots of people moving over, I can see this coming up more and more. This is actually replicating the exact setup Pure instructed us to use in ESXi, but in Proxmox.
 
Hey @shanreich
So I've been installing PVE8 in prod in order to get around this issue. Eventually we will be getting 100GbE networking, which will allow us to use dedicated interfaces for the iSCSI. However, there is a cluster that does not have room for adding another set of NICs for dedicated storage.

Do you think this is something that can be patched upstream? I can understand if ifupdown2 is explicitly programmed to prevent this in case of some edge-case issue, but is it possible that this is just an edge case it wasn't written to handle?