[SOLVED] LACP bond issues with native + tagged VLANs

CrazyIvan359

New Member
Jul 7, 2022
Hello hive mind, I've hit the limit of my google-fu and I'm in need of some assistance.

I've got Proxmox 7.2 running on an R730. It ran on a single NIC for a while without issue, but I recently had some time to work on networking, so I set up a second connection and teamed the two, which has caused some issues. I have successfully set up LACP with trunking on this switch for my pfSense box, and the setup for the server is the same, so I'm pretty sure this is a Proxmox configuration issue. Relevant sections of /etc/network/interfaces are shown below for reference.

The switch is set up with native VLAN 10 and tagged VLANs 5, 20, 30, 40, and 50 on the ports for the server. Single or LACP, the interfaces come up without issue on both ends regardless of configuration.

With the single NIC I have access to the host on the untagged VLAN 10 and I have VMs connected on tagged VLANs 5 and 20. In this configuration I do not have bridge-vlan-aware enabled.

With the interfaces bonded but no other changes I lose the ability to communicate over the tagged VLANs used by the VMs. I am still able to communicate over the untagged native VLAN to the host and a VM that is also using the native VLAN.

If I enable bridge-vlan-aware and set up interfaces for the tagged VLANs on the bridge, I lose connectivity to the host and VMs.

What am I missing here? Why does a configuration that works on a single physical interface not work on a bonded logical interface? Additionally, why does everything break if I enable bridge-vlan-aware?

This works with the native/untagged VLAN and I'm able to use tagging on VM NICs
Code:
iface eno3 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.10.210/24
    gateway 192.168.10.1
    bridge-ports eno3
    bridge-stp off
    bridge-fd 0
Only the native/untagged vlan works when bonded. VM NICs with VLAN tags are unable to connect
Code:
iface eno3 inet manual

iface eno4 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eno3 eno4
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
    address 192.168.10.210/24
    gateway 192.168.10.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
No connectivity to host with this setup
Code:
iface eno3 inet manual

iface eno4 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eno3 eno4
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
    address 192.168.10.210/24
    gateway 192.168.10.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-pvid 10
    bridge-vids 5,20,30,40,50

auto vmbr0.5
iface vmbr0.5 inet manual

auto vmbr0.20
iface vmbr0.20 inet manual

auto vmbr0.30
iface vmbr0.30 inet manual

auto vmbr0.40
iface vmbr0.40 inet manual

auto vmbr0.50
iface vmbr0.50 inet manual
 
First of all, the tagged interfaces are completely irrelevant for VM communication. If your host doesn't need an address in the respective vlan, you can delete the subinterfaces.
For vlan tags to work on vm level, the bridge has to be vlan-aware. So in your first try it doesn't work because you strip every vlan tag.
Stuff in the second try doesn't break because you enable vlan-aware but (supposedly) because you define a pvid. If vlan 10 comes in untagged the bridge should be completely oblivious to that vlan id. At least that's how it works for me, although some documentation says something else, as I learned a minute ago. :-)
So there is a different error in each of your tries, and only the outcome is similar.
Apart from that I can't tell you why tagging the nics worked on a single cable for you.
 
@ph0x thanks for the input, some comments and questions:

the tagged interfaces are completely irrelevant for VM communication. If your host doesn't need an address in the respective vlan, you can delete the subinterfaces.
I tried that initially with the same results. Based on my reading I didn't think I needed the stanzas for the VLANs either; as you correctly inferred, the untagged VLAN is the only one the host is concerned with.

For vlan tags to work on vm level, the bridge has to be vlan-aware. So in your first try it doesn't work because you strip every vlan tag.
One would think, yet I wasn't aware of that option when I initially set up Proxmox. I created a VM with tagged traffic on VLAN 5 and it worked, up until I introduced the bond.

Stuff in the second try doesn't break because you enable vlan-aware but (supposedly) because you define a pvid. If vlan 10 comes in untagged the bridge should be completely oblivious to that vlan id. At least that's how it works for me, although some documentation says something else, as I learned a minute ago. :)
So you are suggesting (using?) a stanza like this for the bridge and no stanzas for the tagged VLANs?
I agree that the interface should be oblivious to the VID of the untagged VLAN, but then why would the option exist?
The stanza below, with or without the bridge-vids entry, only gets me access on the untagged VLAN; still no VM access to the tagged ones.

Code:
auto vmbr0
iface vmbr0 inet static
    address 192.168.10.210/24
    gateway 192.168.10.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 5,20,30,40,50
 
That's what I'd suggest, yes. If the vlans still don't work the error has to be somewhere else.
Maybe try without the last line first, thus enabling all vlan ids.

I don't know exactly what the idea behind the pvid is. Maybe only readability, or some fancy asymmetric routing, I don't know.
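
For reference, a rough sketch of what the pvid does on a VLAN-aware bridge port, as far as I understand it: untagged frames arriving on the port are assigned to the pvid VLAN, and tagged frames are accepted only if their VLAN ID is allowed on the port. The Python below is a simplified model of that rule, not the kernel's code. `BridgePort` and `classify_ingress` are hypothetical names made up for illustration, and the default pvid of 1 is an assumption about the Linux bridge default.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class BridgePort:
    """Simplified, hypothetical model of one port on a VLAN-aware bridge."""

    # VLAN IDs accepted when they arrive tagged (compare bridge-vids).
    allowed_vids: Set[int] = field(default_factory=set)
    # VLAN assigned to untagged ingress frames (compare bridge-pvid).
    # Assumption: 1 is the default when nothing is configured.
    pvid: Optional[int] = 1

    def classify_ingress(self, tag: Optional[int]) -> Optional[int]:
        """Return the VLAN the frame lands in, or None if it is dropped.

        tag is the 802.1Q VLAN ID of the incoming frame, or None if untagged.
        """
        if tag is None:
            # Untagged frame: it joins the pvid VLAN (dropped if no pvid).
            return self.pvid
        if tag in self.allowed_vids or tag == self.pvid:
            return tag
        return None


# Port configured like the stanza above: no explicit pvid, vids 5,20,30,40,50.
port = BridgePort(allowed_vids={5, 20, 30, 40, 50})
print(port.classify_ingress(None))  # untagged frame
print(port.classify_ingress(5))     # frame tagged with VLAN 5
print(port.classify_ingress(10))    # frame tagged with VLAN 10
```

In this model the port without an explicit pvid still accepts untagged traffic, but it drops a frame that arrives tagged with VLAN 10. That would only matter if the switch really did tag its native VLAN.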
 
Please post the /etc/network/interfaces and a screenshot of the network section of one of the non-functioning VMs as well as the vlan config of your switch. From what I saw here it should work if everything else is configured properly.
 
Host configs are in the spoilers in the first post. VMs are all set up via the GUI for DHCP. Switch config below.

@vesalius that is about the only thing I haven't tried yet, will try shortly. So far I've been restarting the networking service on the host and forcing a reload of each VM's virtual interface by setting and then clearing the disconnected flag on it, which has been working for the native VLAN.

Code:
bifrost>show int po2 eth
Port-channel2   (Primary aggregator)

Age of the Port-channel   = 2d:03h:40m:44s
Logical slot/port   = 10/2          Number of ports = 2
HotStandBy port = null
Port state          = Port-channel Ag-Inuse
Protocol            =   LACP
Port security       = Disabled

Ports in the Port-channel:

Index   Load   Port     EC state        No of bits
------+------+------+------------------+-----------
  0     00     Gi1/0/32 Active             0
  0     00     Gi1/0/34 Active             0

Time since last port bundled:    0d:00h:28m:58s    Gi1/0/34
Time since last port Un-bundled: 0d:00h:29m:09s    Gi1/0/34
Code:
bifrost>show int po2 trunk

Port        Mode             Encapsulation  Status        Native vlan
Po2         on               802.1q         trunking      10

Port        Vlans allowed on trunk
Po2         1,5,10,20,30,40,50

Port        Vlans allowed and active in management domain
Po2         1,5,10,20,30,40,50

Port        Vlans in spanning tree forwarding state and not pruned
Po2         1,5,10,20,30,40,50
Code:
bifrost>show int po2 switchport
Name: Po2
Switchport: Enabled
Administrative Mode: trunk
Operational Mode: trunk
Administrative Trunking Encapsulation: dot1q
Operational Trunking Encapsulation: dot1q
Negotiation of Trunking: On
Access Mode VLAN: 1 (default)
Trunking Native Mode VLAN: 10 (Home)
Administrative Native VLAN tagging: enabled
Voice VLAN: none
Administrative private-vlan host-association: none
Administrative private-vlan mapping: none
Administrative private-vlan trunk native VLAN: none
Administrative private-vlan trunk Native VLAN tagging: enabled
Administrative private-vlan trunk encapsulation: dot1q
Administrative private-vlan trunk normal VLANs: none
Administrative private-vlan trunk associations: none
Administrative private-vlan trunk mappings: none
Operational private-vlan: none
Trunking VLANs Enabled: 1,5,10,20,30,40,50
Pruning VLANs Enabled: 2-1001

Protected: false
Unknown unicast blocked: disabled
Unknown multicast blocked: disabled
Appliance trust: none
 
Hm, not so bad actually. Did you try disabling the pruning? Maybe that's where the hiccups come from. On second glance, your vlans are marked as not pruned ...
Please still post the config of one of the VMs, either the conf file or a screenshot.
 
I'm looking into disabling pruning. I've had this Cisco switch for a week, so I'm still learning.

VM configs below, I'm testing with one untagged and one tagged on VLAN 5

[Attached screenshots: 1662664962884.png, 1662665156961.png, 1662665032585.png, 1662664918715.png]
 
Does the VM on vlan 5 get an address from dhcp?
Do you try to reach it from a computer on vlan 5 or from the proxmox host?
Did a reboot of the host solve it?
 
VTP Pruning is disabled on the switch, so pruning settings can be ignored.

Reboot did not change anything.

VM on 5 currently does not get an address and is never able to bring up its interface.
 
Okay, so that's 99.9% not a problem on the proxmox side. Unfortunately, I'm no Cisco expert either but if it works with a single nic it should also work with a bond, especially since we checked your config, which looks fine (probably even with bridge-pvid!).
 
but if it works with a single nic it should also work with a bond
My thoughts exactly, and yet it's not lol

I'm pretty confident with the switch config. My pfSense box is on a LACP trunk with the same setup except a different native VLAN and I have 2 APs on single port trunks with the same setup as pfSense and all of those are working perfectly.
 
My interfaces file looks exactly like yours, the only thing I added is bond-lacp-rate fast in the bond, but I doubt that this would break the vlans.
Are the vlans also on the trunk to pfsense in order for the VMs to get an address?
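
One host-side check that could rule the bond itself in or out: in 802.3ad mode the kernel's /proc/net/bonding/bond0 reports an aggregator ID for each slave, and both slaves should show the same one. Below is a minimal Python sketch that pulls those IDs out of the text. The function names are hypothetical, and SAMPLE is an abbreviated, made-up excerpt in the format I'd expect, not output from this host.

```python
import re
from typing import Dict

# Abbreviated, hypothetical excerpt of /proc/net/bonding/bond0 (not real output).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up

Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 2

Slave Interface: eno3
MII Status: up
Aggregator ID: 1

Slave Interface: eno4
MII Status: up
Aggregator ID: 1
"""


def slave_aggregators(bonding_text: str) -> Dict[str, int]:
    """Map each slave interface name to the aggregator ID listed under it."""
    result: Dict[str, int] = {}
    current = None  # slave section we are currently inside, if any
    for raw in bonding_text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        if m and current is not None:
            # IDs seen before the first slave section (bond-wide info) are skipped.
            result[current] = int(m.group(1))
    return result


def same_aggregator(bonding_text: str) -> bool:
    """True if at least one slave was found and all share one aggregator ID."""
    ids = set(slave_aggregators(bonding_text).values())
    return len(ids) == 1


if __name__ == "__main__":
    # On a real host, read the text from /proc/net/bonding/bond0 instead.
    print(slave_aggregators(SAMPLE))
    print(same_aggregator(SAMPLE))
```

If the two slaves report different aggregator IDs, only one link is actually part of the LACP bundle, which would be worth knowing before digging further into VLANs.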
 
Are the vlans also on the trunk to pfsense in order for the VMs to get an address?
Yes, I have devices on the switch and APs on VLANs 5, 10, 20, and 50 that all work (30 and 40 are reserved for future use)

the only thing I added is bond-lacp-rate fast in the bond, but I doubt that this would break the vlans
I really hope not lol
 
I would really like to help you further, but I don't see an error in the proxmox files, since this is exactly the way it works for me.
I still guess that this is a configuration issue.
 
Well I really appreciate the help! The next step, I'm thinking, is to get Wireshark on the port and see what's actually getting sent, but that may not be a today thing.
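
For when a capture does happen, a small sketch of what to look for in it. A tagged frame carries the 802.1Q TPID 0x8100 right after the source MAC, followed by a 16-bit tag whose low 12 bits are the VLAN ID. The Python below decodes that from raw frame bytes. `parse_vlan` and `build_frame` are hypothetical helper names, and the sample frames are synthetic, not captured from this setup.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_vlan(frame: bytes) -> Tuple[Optional[int], int]:
    """Return (vlan_id or None if untagged, payload EtherType) for a raw frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType or TPID.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None, ethertype
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q header")
    # Bytes 14-15: tag control info (priority, DEI, VLAN ID), 16-17: real EtherType.
    tci, inner_ethertype = struct.unpack_from("!HH", frame, 14)
    return tci & 0x0FFF, inner_ethertype


def build_frame(vid: Optional[int], ethertype: int = 0x0800) -> bytes:
    """Build a synthetic frame for testing, tagged with vid if one is given."""
    dst = bytes.fromhex("ffffffffffff")
    src = bytes.fromhex("020000000001")
    header = dst + src
    if vid is not None:
        header += struct.pack("!HH", TPID_8021Q, vid & 0x0FFF)
    return header + struct.pack("!H", ethertype) + b"\x00" * 46


if __name__ == "__main__":
    print(parse_vlan(build_frame(5)))     # tagged with VLAN 5
    print(parse_vlan(build_frame(None)))  # untagged
```

In Wireshark itself the same field can be filtered with the display filter `vlan.id == 5`. If the VM's DHCP requests leave the bond without that tag, or never appear at all, that narrows down which side is dropping them.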
 
Maybe it has to do with Administrative Native VLAN tagging: enabled? I don't see an Operative Native VLAN tagging stanza, so maybe this is a problem for vmbr0.
 
Yeah that does seem odd... I'll look into it, but untagged traffic is working (without bridge-pvid 10 that is)
 
