[SOLVED] VLANs on Bond

toxic

Active Member
Aug 1, 2020
Hello, I'm facing a strange issue with my bond: it "almost" works, but whether it does depends on the order in which I plug in the cables... I mean, same cable in the same port each time, but plugging in patch cable A first and then patch cable B does not work, while plugging in patch cable B first and then patch cable A works... Going crazy here...

  1. My (Windows) PC is plugged into the switch on a port belonging to VLAN 30 (untagged) and with PVID 30; it gets IP 10.0.30.100 (via DHCP, when it's working)
  2. On my PVE host, if I unplug all members of the bond and then re-plug only enp1s0 ==> PVE can ping 10.0.30.100
  3. On the PVE host, I plug in enp2s0 as well ==> PVE is now unable to ping 10.0.30.100...
  4. I unplug enp1s0 ==> PVE can ping 10.0.30.100 again!
  5. I re-plug enp1s0 ==> PVE can still ping 10.0.30.100!!!
I just don't understand how this happens... In fact, when the ping fails, no one (PVE, Windows 10, the opnSense VM) can talk to anyone on any VLAN...

My setup is:
Core switch TP-Link TL-SG1024DE: it has no support for 802.3ad, but static LAGG is OK; a LAGG group is defined on ports 5,6,7,8, and VLAN 30 (among many others) is tagged on all 4 of these ports.
My PVE host has 6 interfaces, 4 of them (enp1s0, enp2s0, enp3s0, enp4s0) bonded into bond0 in mode balance-alb.
On top of this bond0, I created several VLAN interfaces using the bond0.X naming, for example bond0.30 for VLAN 30.
Then, for each VLAN, I created a Linux bridge, vmbr30 for instance, to which I assign the bridge port bond0.30.
I then start a VM running opnSense and pass it a virtual NIC that I put on vmbr30, another on vmbr10, ...
Finally, I plug my Windows 10 machine into the TP-Link switch on a port that is a member of no VLAN except VLAN 30 (untagged), and the PVID of this port is also 30.

Note: all my vmbrXX bridges have bridge port bond0.XX, but on some of them PVE has an IP (10.0.XX.9), and on some vmbrYY it has none (that's my way to ensure that those VLANs must go through my opnSense if they want to talk to PVE ;) )

Just to be sure my opnSense firewall is not getting in the way, I've added a rule in opnSense allowing any to any, just for testing... And it doesn't help...

Here is my /etc/network/interfaces :
Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enx00e04c534458 inet manual
        mtu 9000

auto enp1s0
iface enp1s0 inet manual
        mtu 9000

auto enp2s0
iface enp2s0 inet manual
        mtu 9000

auto enp3s0
iface enp3s0 inet manual
        mtu 9000

auto enp4s0
iface enp4s0 inet manual
        mtu 9000

auto enp5s0
iface enp5s0 inet manual
        mtu 9000

auto enp6s0
iface enp6s0 inet manual
        mtu 9000

iface wlx002243734416 inet manual

iface enx00e04c680227 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp1s0 enp2s0 enp3s0 enp4s0
        bond-miimon 100
        bond-mode balance-alb
        mtu 9000

auto bond0.10
iface bond0.10 inet manual
        mtu 9000
#Servers

auto bond3
iface bond3 inet manual
        bond-slaves enp5s0 enp6s0
        bond-miimon 100
        bond-mode balance-alb
        mtu 9000

auto bond0.9
iface bond0.9 inet manual
        mtu 9000
#Network

auto bond0.11
iface bond0.11 inet manual
        mtu 9000
#NAS

auto bond0.22
iface bond0.22 inet manual
        mtu 9000
#CCTV

auto bond0.1
iface bond0.1 inet manual
        mtu 9000
#Default

auto bond0.8
iface bond0.8 inet manual
        mtu 9000
#DNS

auto bond0.30
iface bond0.30 inet manual
        mtu 9000
#Main

auto bond0.40
iface bond0.40 inet manual
        mtu 9000
#MainWifi

auto bond0.50
iface bond0.50 inet manual
        mtu 9000
#Media

auto bond0.60
iface bond0.60 inet manual
        mtu 9000
#IOT

auto bond0.99
iface bond0.99 inet manual
        mtu 9000
#Test

auto vmbr0
iface vmbr0 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9000

auto vmbr10
iface vmbr10 inet static
        address 10.0.10.9/24
        bridge-ports bond0.10
        bridge-stp off
        bridge-fd 0
        mtu 9000
#Servers

auto vmbr3
iface vmbr3 inet static
        address 192.168.1.9/24
        bridge-ports bond3
        bridge-stp off
        bridge-fd 0
        mtu 9000
#ISP

auto vmbr9
iface vmbr9 inet static
        address 10.0.9.9/24
        bridge-ports bond0.9
        bridge-stp off
        bridge-fd 0
        mtu 9000
#Network

auto vmbr30
iface vmbr30 inet static
        address 10.0.30.9/24
        bridge-ports bond0.30
        bridge-stp off
        bridge-fd 0
        mtu 9000
#Main

auto vmbr40
iface vmbr40 inet manual
        bridge-ports bond0.40
        bridge-stp off
        bridge-fd 0
        mtu 9000
#MainWifi

auto vmbr50
iface vmbr50 inet manual
        bridge-ports bond0.50
        bridge-stp off
        bridge-fd 0
        mtu 9000
#Media

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0.1
        bridge-stp off
        bridge-fd 0
        mtu 9000
#Default

auto vmbr8
iface vmbr8 inet manual
        bridge-ports bond0.8
        bridge-stp off
        bridge-fd 0
        mtu 9000
#DNS

auto vmbr22
iface vmbr22 inet manual
        bridge-ports bond0.22
        bridge-stp off
        bridge-fd 0
        mtu 9000
#CCTV

auto vmbr11
iface vmbr11 inet manual
        bridge-ports bond0.11
        bridge-stp off
        bridge-fd 0
        mtu 9000
#NAS

auto vmbr60
iface vmbr60 inet manual
        bridge-ports bond0.60
        bridge-stp off
        bridge-fd 0

auto vmbr99
iface vmbr99 inet static
        address 10.0.0.9/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        mtu 9000
#Test

And my VM config :
Code:
bootdisk: virtio0
cores: 6
cpu: kvm64,flags=+aes
ide2: local:iso/OPNsense-21.1-OpenSSL-dvd-amd64.iso,media=cdrom,size=1637504K
memory: 8192
name: universe
net0: virtio=10:10:10:10:10:03,bridge=vmbr3
net1: virtio=10:10:10:10:10:30,bridge=vmbr30
net10: virtio=10:10:10:10:10:60,bridge=vmbr60
net11: virtio=10:10:10:10:10:99,bridge=vmbr99
net2: virtio=10:10:10:10:10:01,bridge=vmbr1
net3: virtio=10:10:10:10:10:08,bridge=vmbr8
net4: virtio=10:10:10:10:10:09,bridge=vmbr9
net5: virtio=10:10:10:10:10:10,bridge=vmbr10
net6: virtio=10:10:10:10:10:11,bridge=vmbr11
net7: virtio=10:10:10:10:10:22,bridge=vmbr22
net8: virtio=10:10:10:10:10:40,bridge=vmbr40
net9: virtio=10:10:10:10:10:50,bridge=vmbr50
numa: 0
onboot: 1
ostype: other
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=a1bcf77c-d63d-4a23-b6ee-00b7bf6a158f
sockets: 1
startup: order=1
virtio0: local-lvm:vm-1000-disk-0,size=60G
vmgenid: 5748de30-da8f-496a-9c9b-ca35adaef423
 
OK, it seems static LAG on my switch actually corresponds to bond mode balance-xor.
I tried it, and for now it seems to work even with all interfaces plugged in, which was not the case earlier...
Will monitor and report...
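For anyone landing here later: the only change needed on the PVE side is the bond0 stanza in /etc/network/interfaces. A sketch of what I mean (the bond-xmit-hash-policy line is optional; the kernel default for balance-xor is layer2):

```
auto bond0
iface bond0 inet manual
        bond-slaves enp1s0 enp2s0 enp3s0 enp4s0
        bond-miimon 100
        bond-mode balance-xor
        bond-xmit-hash-policy layer2+3
        mtu 9000
```

On a recent PVE with ifupdown2, the change can be applied with `ifreload -a`; otherwise a reboot is the safe option.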
 
Hmm... This setup seems kinda dumb...
In fact, I do have connectivity when using bond-mode balance-xor, but regardless of the bond-xmit-hash-policy, it seems I cannot get over 1 Gb/s of bandwidth...
I have tried layer2+3, layer3+4, and also layer2.

I run the tests using iperf against my Synology NAS, which also uses a static LAGG (balance-xor) on 2 ports of my switch that are in a LAGG group and on VLAN 11.
As you can see above, my PVE host has a big bond of 4 interfaces that also carries this VLAN 11.
I generate traffic from 2 different MAC & IP addresses: 2 VMs on vmbr10 with distinct IPs & MACs. But when I start iperf on the second VM, the bandwidth on the first drops, and together they never exceed 1 Gb/s.

My understanding is that this is because, from vmbr10 (where these 2 VMs are) to vmbr11 (where my NAS is), I only have 1 interface on my virtual opnSense; so even if there is no NAT, the virtual router puts its single MAC address on the frames as source, and therefore my LAGG hashing is useless...
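That matches how the layer2 transmit hash behaves. Here is a rough illustration (a simplified model, not the exact kernel code; the NAS MAC below is made up): every frame from the router to the NAS carries the same source/destination MAC pair, so every flow hashes to the same bond slave.

```python
# Simplified model of the Linux bonding "layer2" transmit hash.
# The real kernel hash also mixes in the EtherType; XORing the last
# byte of each MAC is enough to show the effect.

def layer2_hash(src_mac: str, dst_mac: str, n_slaves: int) -> int:
    src = int(src_mac.replace(":", ""), 16) & 0xFF  # last byte of source MAC
    dst = int(dst_mac.replace(":", ""), 16) & 0xFF  # last byte of destination MAC
    return (src ^ dst) % n_slaves

ROUTER_MAC = "10:10:10:10:10:11"  # the opnSense NIC on vmbr11 (from the VM config above)
NAS_MAC = "00:11:32:aa:bb:cc"     # hypothetical Synology MAC, just for illustration

# Routed traffic from two different clients still leaves the bond with the
# router's MAC as source, so both flows pick the same slave:
slave_a = layer2_hash(ROUTER_MAC, NAS_MAC, 4)
slave_b = layer2_hash(ROUTER_MAC, NAS_MAC, 4)
print(slave_a == slave_b)  # True: one link carries everything
```

Note that layer2+3 and layer3+4 mix the IP addresses (and ports) into the hash, and routed traffic keeps the clients' source IPs; the switch's own static LAG hash, which is often MAC-based, still decides the link for the return direction, though.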

In fact, if I put the 2 VMs on vmbr11 directly, I do see 2 Gb/s of traffic coming to my NAS!

Now that I have seen this, I wonder: if I add more virtual NICs to my virtual opnSense that all sit on the same vmbr11, and then LAGG them inside opnSense, would that do the trick?

Thanks in advance for your feedback.
 
It seems I cannot get over 1 Gb/s of bandwidth...
With LACP (or any hash-based bond), you can't use more than 1 link for a single TCP/UDP connection.


The hash policy is there to balance these different connections between the links, for traffic going out of your host.
(The hash policy on the switch is for traffic going out of the switch, i.e. incoming to your host.)
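To make this concrete, here is a toy model of the layer3+4 policy (simplified arithmetic, not the exact kernel code; the IPs and ports are made up): one connection always lands on one slave, while parallel streams with different source ports can spread across slaves.

```python
# Simplified model of the Linux bonding "layer3+4" transmit hash:
# the kernel XORs the TCP/UDP ports with the low bits of the IPs,
# then takes the result modulo the slave count.
import ipaddress

def layer3_4_hash(src_ip: str, dst_ip: str,
                  src_port: int, dst_port: int, n_slaves: int) -> int:
    ips = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return ((src_port ^ dst_port) ^ (ips & 0xFFFF)) % n_slaves

# One TCP connection always hashes to the same slave -> capped at one link:
h1 = layer3_4_hash("10.0.10.101", "10.0.11.5", 50000, 5001, 4)
h2 = layer3_4_hash("10.0.10.101", "10.0.11.5", 50000, 5001, 4)
assert h1 == h2

# Parallel streams (e.g. iperf with multiple connections) use different
# source ports and can therefore land on different slaves:
slaves = {layer3_4_hash("10.0.10.101", "10.0.11.5", p, 5001, 4)
          for p in range(50000, 50016)}
print(sorted(slaves))  # [0, 1, 2, 3]: all four slaves get used
```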
 
Thanks spirit, that was indeed my understanding ;)
I'm closing this thread since I spent hours trying out various setups, and I just fail to see why my opnSense virtual router is able to send and receive at 2 Gb/s to my NAS, but not when the traffic is actually being generated by 2 clients that use opnSense as gateway...
I'm creating a dedicated thread to explain what I want and gather ideas...
 
