VLAN Issues

vb543

Hello!

I'm having a hard time getting VLANs to work properly on my new Proxmox cluster. Hopefully someone has some suggestions to get this working. In short, it seems that Proxmox isn't properly passing tagged packets to my VMs.

Example VLAN 542 (10.0.42.0/24):

I have a secondary 10 Gb NIC that's attached to a trunk port on my switch. I have a VLAN-aware bridge (vmbr1) created for this NIC (enp65s0). Next, I have a Windows 10 VM that is set to use vmbr1 with a VLAN tag of 542. The guest OS NIC is configured with IP 10.0.42.8.

There is an IP configured on another router of 10.0.42.1.

If I try to ping .1 from .8, I get no reply, and the same goes for pinging .8 from .1. However, if I run tcpdump on the Proxmox node for enp65s0, I can see the packets come in (ARP packets at least) but never go out.

TCPDump with ping from .1 to .8:

Bash:
root@hv2:~# tcpdump -envi enp65s0 -e '(vlan 542)'
-snip-
10:53:16.578688 18:fd:74:c1:4d:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.8 tell 10.0.42.1, length 42
10:53:17.635885 18:fd:74:c1:4d:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.8 tell 10.0.42.1, length 42
10:53:18.675923 18:fd:74:c1:4d:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.8 tell 10.0.42.1, length 42
10:53:20.603052 18:fd:74:c1:4d:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.8 tell 10.0.42.1, length 42
10:53:21.635981 18:fd:74:c1:4d:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.8 tell 10.0.42.1, length 42
10:53:22.675995 18:fd:74:c1:4d:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.8 tell 10.0.42.1, length 42
-snip-

Network config on node:


Bash:
root@hv2:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno3 inet manual

iface eno4 inet manual

iface eno2 inet manual

auto enp65s0
iface enp65s0 inet manual
#10G

auto vmbr0
iface vmbr0 inet static
        address 10.10.44.36/24
        gateway 10.10.44.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp65s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#10GB Bridge

auto vlan88
iface vlan88 inet static
        address 10.10.88.36/24
        vlan-raw-device vmbr1
#CEPH VLAN 88

auto vlan542
iface vlan542 inet static
        address 10.0.42.9/24
        vlan-raw-device vmbr1
#Test VLAN 542

auto vlan42
iface vlan42 inet static
        address 10.10.42.242/24
        vlan-raw-device vmbr1
#Test VLAN 42

All of the above was generated via the GUI. As you can see, I have a few other VLANs set up that I've been using for testing. The 'Linux VLAN' interfaces with IPs on the hypervisor also do not work, with the exception of VLAN 88!?

That last bit is what really makes me scratch my head: VLAN 88 works just fine from the router, to the node, to the guest VM (if I switch it from 542 to 88). Initially this made me question the switch's configuration. But I'm using that switch to handle trunking to several other switches without issue, and this port is configured the same way. Additionally, tcpdump shows the packets coming in on the correct VLAN.

If I run the packet capture on the bridge, I can see the VM calling out with ARP requests, but the requests from the router on .1 never make it to the bridge.

Bash:
root@hv2:~# tcpdump -envi vmbr1 -e '(vlan 542)'
tcpdump: listening on vmbr1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:06:17.091353 5e:87:93:37:21:39 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.1 tell 10.0.42.8, length 28
11:06:18.079245 5e:87:93:37:21:39 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.1 tell 10.0.42.8, length 28
11:06:19.079239 5e:87:93:37:21:39 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.1 tell 10.0.42.8, length 28
11:06:20.090636 5e:87:93:37:21:39 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.1 tell 10.0.42.8, length 28
11:06:21.079295 5e:87:93:37:21:39 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.1 tell 10.0.42.8, length 28
11:06:22.079316 5e:87:93:37:21:39 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.1 tell 10.0.42.8, length 28
11:06:23.092117 5e:87:93:37:21:39 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 542, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.42.1 tell 10.0.42.8, length 28
^C
7 packets captured
7 packets received by filter
0 packets dropped by kernel

It's entirely possible that there's something super simple that I'm overlooking. I appreciate any suggestions in advance!

Thanks!
 
For the vlan88, vlan542 and vlan42 interfaces you are missing the vlan-id. Here is a partial interfaces file from one of my systems:

Code:
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#Internal VLANs Bridge

auto Management
iface Management inet static
    address 10.42.27.22/24
    gateway 10.42.27.11
    mtu 1500
    vlan-id 27
    vlan-raw-device vmbr0
#PVE Management Interface
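
A possible explanation for why the GUI still shows a VLAN tag for vlan88, vlan542 and vlan42: as far as I know, ifupdown2 can derive the VLAN ID from interface names of the form vlanN or dev.N, whereas a free-form name such as Management needs an explicit vlan-id. The helper below is only a sketch of that naming rule; vlan_id_from_name is a made-up name and not part of ifupdown2.

```shell
# vlan_id_from_name NAME: print the VLAN ID implied by an interface name,
# or return 1 if the name carries no ID (an explicit vlan-id is then needed).
vlan_id_from_name() {
    local name id
    name=$1
    case $name in
        vlan[0-9]*) id=${name#vlan} ;;   # e.g. vlan542
        *.[0-9]*)   id=${name##*.} ;;    # e.g. enp65s0.542
        *) return 1 ;;                   # e.g. Management
    esac
    # Reject anything that is not purely numeric after the prefix.
    case $id in
        '' | *[!0-9]*) return 1 ;;
    esac
    echo "$id"
}

# Example:
#   vlan_id_from_name vlan542      prints the ID taken from the name
#   vlan_id_from_name Management   returns 1, so vlan-id must be set explicitly
```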

To get untagged traffic to the VM the VLAN Tag needs to be specified on the network interface of the VM. This can be done in the GUI.

To get tagged traffic to the VM you will need to edit the VM's config file and configure the VLAN in the VM's OS.
Details at https://pve.proxmox.com/pve-docs/pve-admin-guide.html. Search for "trunks".
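
As a rough sketch of the two cases, the hypothetical helpers below build the value for a VM's net0 option. tag= and trunks= are the options described in the admin guide; the virtio model, the helper names and VM ID 100 are assumptions for illustration only.

```shell
# net_access_opts BRIDGE VID
# Untagged traffic in the guest: the host bridge adds and strips tag VID.
net_access_opts() {
    printf 'virtio,bridge=%s,tag=%s\n' "$1" "$2"
}

# net_trunk_opts BRIDGE "VID;VID;..."
# Tagged traffic in the guest: the listed VLANs are passed through, and the
# guest OS has to configure the VLAN interfaces itself.
net_trunk_opts() {
    printf 'virtio,bridge=%s,trunks=%s\n' "$1" "$2"
}

# Example on a PVE node (VM ID 100 is hypothetical):
#   qm set 100 --net0 "$(net_access_opts vmbr1 542)"
#   qm set 100 --net0 "$(net_trunk_opts vmbr1 '88;542')"
```

A net0 value without a MAC address should cause a new MAC to be generated, so on an existing VM it may be safer to edit the existing net0 line rather than replace it.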
 
Thanks for the reply!

Regarding "For the vlan88, vlan542 and vlan42 interfaces you are missing the vlan-id": it's interesting that the web UI makes it appear as though it properly recognizes the VLAN tag, yet does not add it to the config.

Regardless, I've added "vlan-id 27" to "iface vlan542 inet static", ran 'ifreload -a', and was greeted with some errors.

Bash:
root@hv2:~# ifreload -a
error: enp65s0: failed to set vid `{127, 128, 129, --snip-- 4092, 4093, 4094}` (cmd '/sbin/bridge -force -batch - [vlan add vid 127-4094 dev enp65s0 ]' failed: returned 1 (RTNETLINK answers: No space left on device
Command failed -:1
))

I am not sure what 'device' is out of space...
Bash:
root@hv2:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   63G     0   63G   0% /dev
tmpfs                  13G  1.6M   13G   1% /run
/dev/mapper/pve-root   64G  3.1G   58G   6% /
tmpfs                  63G   60M   63G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/fuse             128M   28K  128M   1% /etc/pve
tmpfs                  63G   68K   63G   1% /var/lib/ceph/osd/ceph-2
tmpfs                  63G   68K   63G   1% /var/lib/ceph/osd/ceph-3
tmpfs                  13G     0   13G   0% /run/user/0


Some Google-fu later led me to these forum posts:

Mellanox ConnectX-3 Issues
VLAN with tag above 126 problem
Interface vlans not created for containers and VMs after uninstalling ifupdown2

To summarize those threads: my Mellanox MCX311A-XCAT ConnectX-3 only supports a limited number of VLANs at a time. Based on the forum posts above, the limit seems to be around 128.
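
To put numbers on this, here is a small sketch (count_vids is a hypothetical helper, not a Proxmox tool) that counts how many VLAN IDs a bridge-vids value asks the NIC to filter. In the ifreload error above, the failing command was 'vlan add vid 127-4094', which suggests the IDs up to 126 were accepted before the card ran out of filter entries.

```shell
# count_vids SPEC: print how many VLAN IDs a bridge-vids value expands to.
# SPEC is a comma- or space-separated list of IDs and ranges,
# e.g. "2-4094" or "10,20,30-40". Overlapping entries are counted twice.
count_vids() {
    local item first last total
    total=0
    # Treat commas as spaces so both list styles are handled alike.
    for item in $(printf '%s' "$1" | tr ',' ' '); do
        case $item in
            *[!0-9-]* | -* | *- | *-*-*)
                echo "count_vids: cannot parse '$item'" >&2
                return 1 ;;
            *-*)
                first=${item%-*}
                last=${item#*-}
                total=$(( total + last - first + 1 )) ;;
            *)
                total=$(( total + 1 )) ;;
        esac
    done
    echo "$total"
}

# Example:
#   count_vids "2-4094"   # the GUI default, far more than this NIC accepts
#   count_vids "88,542,42"
# On the node, 'bridge vlan show dev enp65s0' lists what was actually programmed.
```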

The fix was to remove the below two lines from vmbr1
Bash:
        bridge-vlan-aware yes
        bridge-vids 2-4094

My final working interface config looks like:
Bash:
root@hv2:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno3 inet manual

iface eno4 inet manual

iface eno2 inet manual

auto enp65s0
iface enp65s0 inet manual
#10G

auto vmbr0
iface vmbr0 inet static
        address 10.10.44.36/24
        gateway 10.10.44.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp65s0
        bridge-stp off
        bridge-fd 0
#10GB Bridge

auto vlan88
iface vlan88 inet static
        address 10.10.88.36/24
        vlan-raw-device vmbr1
#CEPH VLAN 88

auto vlan542
iface vlan542 inet static
        address 10.0.42.9/24
        vlan-raw-device vmbr1
#Test VLAN 542

auto vlan42
iface vlan42 inet static
        address 10.10.42.242/24
        vlan-raw-device vmbr1
#Test VLAN 42

Hopefully that helps anyone that might run into this in the future. And thanks @mjtbrady for getting me down the path of manually editing the interfaces file and eventually running ifreload -a to see the actual error that didn't show up in the web-UI when making changes.
 
This thread has been a lifesaver after having similar problems with a Solarflare card after upgrading to PVE 8.

Regarding "The fix was to remove the below two lines from vmbr1":
Bash:
bridge-vlan-aware yes
bridge-vids 2-4094
For me, deleting those lines just made the bridge non-VLAN-aware, and the GUI reflects the change as I'd expect.

If it helps anyone else, what worked for me was changing bridge-vids to 2-126.
Bash:
auto vmbr2
iface vmbr2 inet static
        address 10.0.0.4/24
        bridge-ports enp5s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-126
#VLAN Trunk

That was the maximum number of VLANs I could enter without getting the same error as you.
 
For Proxmox 8, you can try adding "rx-vlan-filter off" in /etc/network/interfaces:


Code:
auto enp5s0
iface enp5s0 ...
      ....
      rx-vlan-filter off


Then reload the network config (ifreload -a).

It works with Mellanox cards; I'm not sure about Solarflare. It would be great if you could test it, as I'm looking to add this option by default when VLAN-aware is enabled.
 
Sadly, this seems to give me the original error again.

Just to check, I'm adding it under the Bridge config, like this?
Bash:
auto vmbr2
iface vmbr2 inet static
        address 10.0.0.4/24
        bridge-ports enp5s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        rx-vlan-filter off
#VLAN Trunk
 
It should be set on the physical interface and bond, not on the vmbr.

Code:
auto enp5s0
iface enp5s0 inet manual
        rx-vlan-filter off

auto vmbr2
iface vmbr2 inet static
        address 10.0.0.4/24
        bridge-ports enp5s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#VLAN Trunk


Then run "ifreload -a -d" (debug mode) to get full logs.

Afterwards, "ethtool -k enp5s0" should show "rx-vlan-filter: off" if the NIC driver allows disabling it.
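
A quick way to check this up front, sketched under the assumption that the driver reports the feature honestly: ethtool marks features that cannot be changed with "[fixed]". The helper name vlan_filter_state is made up for this example.

```shell
# vlan_filter_state: read 'ethtool -k <nic>' output on stdin and classify
# the rx-vlan-filter feature as fixed, on, off or unknown.
vlan_filter_state() {
    local line
    # Take the first matching feature line; tolerate "no match".
    line=$(grep -E '^[[:space:]]*rx-vlan-filter:' | head -n 1) || true
    case $line in
        *"[fixed]"*) echo "fixed" ;;    # driver does not allow changing it
        *": on"*)    echo "on" ;;
        *": off"*)   echo "off" ;;
        *)           echo "unknown" ;;
    esac
}

# Example on the node:
#   ethtool -k enp5s0 | vlan_filter_state
```

If this prints "fixed", adding "rx-vlan-filter off" to the interface is unlikely to help, and limiting bridge-vids is probably the remaining option.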
 
Got it, apologies, a little new to this side of things.

It seems the driver may not allow it, as I get an additional error this time:
Bash:
Exception: cmd '/sbin/ethtool -K enp5s0 rx-vlan-filter off' failed: returned 1 (Could not change any device features
Actual changes:
rx-vlan-filter: on [requested off]
)

With the entire error:

Bash:
info: vmbr2: applying bridge port configuration: ['enp5s0']
debug: enp5s0: pre-up : running module bridgevlan
debug: enp5s0: pre-up : running module tunnel
debug: enp5s0: pre-up : running module vrf
debug: enp5s0: pre-up : running module ethtool
info: reading '/sys/class/net/enp5s0/rx-vlan-filter'
info: executing /sbin/ethtool -K enp5s0 rx-vlan-filter off
debug:   File "/usr/sbin/ifreload", line 135, in <module>
    sys.exit(main())
   File "/usr/sbin/ifreload", line 123, in main
    return stand_alone()
   File "/usr/sbin/ifreload", line 103, in stand_alone
    status = ifupdown2.main()
   File "/usr/share/ifupdown2/ifupdown/main.py", line 77, in main
    self.handlers.get(self.op)(self.args)
   File "/usr/share/ifupdown2/ifupdown/main.py", line 284, in run_reload
    ifupdown_handle.reload(['pre-up', 'up', 'post-up'],
   File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 2447, in reload
    self._reload_default(*args, **kargs)
   File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 2425, in _reload_default
    ret = self._sched_ifaces(new_filtered_ifacenames, upops,
   File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 1566, in _sched_ifaces
    ifaceScheduler.sched_ifaces(self, ifacenames, ops,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 595, in sched_ifaces
    cls.run_iface_list(ifupdownobj, run_queue, ops,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 325, in run_iface_list
    cls.run_iface_graph(ifupdownobj, ifacename, ops, parent,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 302, in run_iface_graph
    cls.run_iface_list(ifupdownobj, dlist, ops,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 325, in run_iface_list
    cls.run_iface_graph(ifupdownobj, ifacename, ops, parent,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 302, in run_iface_graph
    cls.run_iface_list(ifupdownobj, dlist, ops,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 325, in run_iface_list
    cls.run_iface_graph(ifupdownobj, ifacename, ops, parent,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 315, in run_iface_graph
    cls.run_iface_list_ops(ifupdownobj, ifaceobjs, ops)
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 188, in run_iface_list_ops
    cls.run_iface_op(ifupdownobj, ifaceobj, op,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 106, in run_iface_op
    m.run(ifaceobj, op,
   File "/usr/share/ifupdown2/addons/ethtool.py", line 669, in run
    op_handler(self, ifaceobj)
   File "/usr/share/ifupdown2/addons/ethtool.py", line 393, in _pre_up
    self.do_offload_settings(ifaceobj, 'rx-vlan-filter', 'rx-vlan-filter')
   File "/usr/share/ifupdown2/addons/ethtool.py", line 224, in do_offload_settings
    self.log_error('%s: %s' %(ifaceobj.name, str(e)), ifaceobj)
   File "/usr/share/ifupdown2/ifupdownaddons/modulebase.py", line 121, in log_error
    stack = traceback.format_stack()
debug: Traceback (most recent call last):
  File "/usr/share/ifupdown2/addons/ethtool.py", line 222, in do_offload_settings
    utils.exec_command(cmd)
  File "/usr/share/ifupdown2/ifupdown/utils.py", line 414, in exec_command
    return cls._execute_subprocess(shlex.split(cmd),
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/ifupdown2/ifupdown/utils.py", line 392, in _execute_subprocess
    raise Exception(cls._format_error(cmd,
Exception: cmd '/sbin/ethtool -K enp5s0 rx-vlan-filter off' failed: returned 1 (Could not change any device features
Actual changes:
rx-vlan-filter: on [requested off]
)
error: enp5s0: cmd '/sbin/ethtool -K enp5s0 rx-vlan-filter off' failed: returned 1 (Could not change any device features
Actual changes:
rx-vlan-filter: on [requested off]
)
 
Hmm, maybe the Solarflare driver doesn't allow changing it.
(I'm not sure whether it can be changed while the interface is already up and in a bridge.)

If you are able to reboot to be sure, that would be great.
 
I had the same issue today on PVE 8.1.4 with a Mellanox Technologies MT26448.
What I did was list only the VLAN IDs I actually use on my network; the limit is 126 VLANs in total.
I don't need to add the following, and it doesn't work for my NIC anyway:
rx-vlan-filter off

Just add this under the bridge:
bridge-vlan-aware yes
bridge-vids 10,20,30,40,50 <---- Keep the total number under 126
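
To build such a list without guessing, one option is to collect the tag= values already used in the guest configs. This is only a sketch: used_vids is a hypothetical helper, it ignores trunks= entries and host-side VLAN interfaces (such as a Ceph VLAN), and it also picks up net lines from snapshot sections.

```shell
# used_vids BRIDGE: read VM/CT config text on stdin and print the VLAN tags
# used on BRIDGE as a comma-separated, numerically sorted list.
used_vids() {
    local bridge
    bridge=$1
    grep -E '^net[0-9]+:' |
        grep -E "(^|[ ,:])bridge=${bridge}(,|\$)" |
        grep -oE '(^|,)tag=[0-9]+' |
        grep -oE '[0-9]+' |
        sort -nu |
        paste -sd, -
}

# Example on a PVE node (paths are the usual guest config locations):
#   cat /etc/pve/qemu-server/*.conf /etc/pve/lxc/*.conf 2>/dev/null | used_vids vmbr1
```

The output can be pasted after bridge-vids, together with any VLAN IDs the host itself needs.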
 
