[SOLVED] SDN - 802.1q QinQ VLAN DHCP Reply not arriving

AliveDevil

Jan 20, 2024
Hey there,

I have this strange issue with the current Proxmox 8.1 SDN-module, where DHCP replies don't arrive at the destination VM.

I assume the following setup:
Code:
# /etc/pve/sdn/zones.cfg
qinq: zone9539
    bridge vmbr0
    tag 834
    ipam pve
    vlan-protocol 802.1q

# /etc/pve/sdn/vnets.cfg
vnet: vnet22A9
    zone zone9539   
    alias home-dhcp-test
    tag 90

Code:
# /etc/network/interfaces

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    mtu 9000

# /etc/network/interfaces.d/sdn
auto ln_zone9539
iface ln_zone9539
    link-type veth
    veth-peer-name pr_zone9539

auto pr_zone9539
iface pr_zone9539
    link-type veth
    veth-peer-name ln_zone9539

auto vmbr0
iface vmbr0
    bridge-vlan-protocol 802.1q

auto vnet22A9
iface vnet22A9
    bridge_ports z_zone9539.90
    bridge_stp off
    bridge_fd 0
    alias home-dhcp-test

auto z_zone9539
iface z_zone9539
    bridge-stp off
    bridge-ports vmbr0.834 ln_zone9539
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

Nodes are connected through Emulex SFP+ NICs, over a Mikrotik CRS315 Switch running SwOS (not RouterOS).
Code:
HostA
== 10G DAC ==
CRS315
== 10G DAC ==
HostB

I assume that this setup works as expected: it pushes VLAN 90 onto all packets in vnet22A9, and every packet entering zone9539 receives VLAN tag 834, so that the Ethernet frame going out of the physical NIC (eno1) is tagged as 0x8100 834 | 0x8100 90 | 0x0800.
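To make the expected on-wire layout concrete, here is a minimal Python sketch that builds the Ethernet header described above with two stacked 0x8100 tags. The helper names `vlan_tag` and `qinq_header` are made up for illustration, and the MAC addresses are just examples; this is not how Proxmox or the kernel builds the frame, only what the resulting bytes should look like.

```python
import struct

TPID_8021Q = 0x8100  # used for BOTH tags here, matching "vlan-protocol 802.1q"
ETHERTYPE_IPV4 = 0x0800


def vlan_tag(tpid: int, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Return one 4-byte VLAN tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)


def qinq_header(dst: bytes, src: bytes, outer_vid: int, inner_vid: int) -> bytes:
    """Ethernet header with an outer and an inner 0x8100 tag, then IPv4."""
    return (
        dst
        + src
        + vlan_tag(TPID_8021Q, outer_vid)   # zone tag
        + vlan_tag(TPID_8021Q, inner_vid)   # vnet tag
        + struct.pack("!H", ETHERTYPE_IPV4)
    )


header = qinq_header(
    dst=bytes.fromhex("ffffffffffff"),   # broadcast, example only
    src=bytes.fromhex("bc2411bf1035"),   # example source MAC
    outer_vid=834,
    inner_vid=90,
)
print(header.hex())
```

The 10 bytes after the two MAC addresses are the part that matters here: outer tag, inner tag, then the IPv4 ethertype.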

What is not expected is that DHCP reply packets aren't received across the switch.
That is:
- On HostA create an OpenWRT VM with a NIC in vnet22A9 (192.0.2.1/24)
Set up a DHCP server (in my case this is a DHCP relay, but that shouldn't matter)
- On HostB create a VyOS-VM (makes it easier to run renew commands), with a NIC in vnet22A9
- Configure VyOS with
Code:
  configure
  set interfaces ethernet eth0 address dhcp
  commit
- The OpenWRT VM sees the DHCP Discover broadcast on the vnet22A9-device
- In the relay case: OpenWRT forwards the packet to the DHCP server and receives a reply
- The OpenWRT VM creates a unicast DHCP Offer packet, and sends it out of the vnet22A9 device
- The VyOS VM never sees this DHCP Offer and continuously sends DHCP Discovers.
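The reason the Offer is unicast while the Discover is broadcast comes from the DHCP broadcast flag. Below is a small Python sketch of the last-hop decision as I understand it from RFC 2131 section 4.1. The function name `last_hop_destination` is hypothetical, and the sketch deliberately ignores the server-to-relay leg.

```python
BROADCAST_FLAG = 0x8000  # top bit of the 16-bit DHCP "flags" field


def last_hop_destination(flags: int, ciaddr: str, yiaddr: str, chaddr: str) -> tuple:
    """Decide how an OFFER/ACK is delivered on the client's own segment.

    Simplified reading of RFC 2131 section 4.1; the relay leg (giaddr) is left out.
    """
    if ciaddr != "0.0.0.0":
        # Client already has an address: plain unicast to it.
        return ("unicast", ciaddr, "resolved via ARP")
    if flags & BROADCAST_FLAG:
        # Client asked for broadcast replies.
        return ("broadcast", "255.255.255.255", "ff:ff:ff:ff:ff:ff")
    # No address yet, no broadcast flag: unicast straight to the client MAC.
    return ("unicast", yiaddr, chaddr)


# A client that sends Discover with flags 0 and no address yet:
decision = last_hop_destination(
    flags=0x0000,
    ciaddr="0.0.0.0",
    yiaddr="192.0.2.118",           # example offered address
    chaddr="bc:24:11:bf:10:35",     # example client MAC
)
print(decision)
```

So a client that does not set the broadcast flag gets the Offer as a unicast frame addressed to its MAC, which is a different kind of frame from the broadcast Discover that does make it across the switch.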

With that same setup, the following works fine:
- Assign static ip to previously created VyOS VM
Code:
configure
delete interfaces ethernet eth0
set interfaces ethernet eth0 address 192.0.2.2/24
commit
- From VyOS
- Ping to OpenWRT works
- traceroute -I to OpenWRT works
- traceroute to OpenWRT works
- From OpenWRT
- Ping to VyOS works
- traceroute -I to VyOS works
- traceroute to VyOS works

From my perspective this doesn't make any sense:
- From VyOS via CRS315 the OpenWRT VM sees the DHCP Discover
- From OpenWRT via CRS315 the VyOS VM doesn't see the DHCP Offer
- From OpenWRT via CRS315 Ping VyOS works
- From OpenWRT via CRS315 traceroute VyOS works
- From OpenWRT via CRS315 traceroute -I VyOS works

Additional things I observed:
- If VyOS and OpenWRT are colocated the DHCP Offer is received correctly
- If I change the zone VLAN protocol to 802.1ad the DHCP Offer is received
The CRS315 doesn't "know" the 802.1ad-protocol, thus treats the packet as untagged
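A rough way to picture the difference between the two zone protocols, from the point of view of a device that only recognises TPID 0x8100, is the Python sketch below. `visible_vlans` is a hypothetical helper, and whether the real switch actually walks past the outer tag is an assumption on my part; the sketch only shows what such a device could see in each case.

```python
import struct


def visible_vlans(frame: bytes, known_tpids: frozenset) -> tuple:
    """Walk the VLAN tags a device would recognise.

    Returns (list of VLAN IDs, first ethertype the device does not treat as a tag).
    """
    offset = 12  # skip destination and source MAC
    vids = []
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    while ethertype in known_tpids:
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        vids.append(tci & 0x0FFF)
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    return vids, ethertype


macs = bytes(12)  # placeholder MAC addresses
outer_8100 = macs + struct.pack("!HHHHH", 0x8100, 834, 0x8100, 90, 0x0800)
outer_88a8 = macs + struct.pack("!HHHHH", 0x88A8, 834, 0x8100, 90, 0x0800)

only_dot1q = frozenset({0x8100})
print(visible_vlans(outer_8100, only_dot1q))  # both tags and the IPv4 payload are reachable
print(visible_vlans(outer_88a8, only_dot1q))  # no tags seen, unknown ethertype 0x88a8
```

With an 802.1ad outer tag the device stops at an ethertype it does not know, so it has no way to look at the payload. With two 0x8100 tags nothing prevents it from reaching the IPv4/DHCP content.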

I did try to replicate this scenario on one node, using the following setup:
- Add OpenWRT to vmbr0 directly (allowing access to all VIDs)
- Keep VyOS on vnet22A9
- In OpenWRT create bridge br0 over eth0, with VLAN filtering enabled and VID 834 added on the port
- In OpenWRT create bridge br1 over br0.834, with VLAN filtering enabled and VID 90 added on the port
- Assign IP and DHCP server to br1.90
This works fine: the DHCP client VyOS receives the DHCP Offer.

I did try things with bridge-ageing 0 and bridge-learning off but that didn't make any difference.

What's the reason, that 802.1q (ethtype 8100) stacked VLANs don't work across a physical switch? Does the switch really need to understand ethtype 8100 stacking, or is this some other option I missed entirely?

FWIW: I did use OpenVSwitch before, but the dot1q-tunnel other-config:qinq-ethtype=802.1q didn't work either, so I switched everything back to Linux Bridges hoping that this would solve the issue.
 
Added tcpdump (tcpdump -evni <interface> ether host <DHCP-Client-MAC>) on both nodes:
Node of DHCP Server:
Code:
19:47:20.964884 bc:24:11:bf:10:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 350: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype IPv4 (0x0800), (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from bc:24:11:bf:10:35, length 300, xid 0x52001c78, Flags [none]
          Client-Ethernet-Address bc:24:11:bf:10:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Discover
            Hostname (12), length 4: "vyos"
            Parameter-Request (55), length 13: 
              Subnet-Mask (1), BR (28), Time-Zone (2), Default-Gateway (3)
              Domain-Name (15), Domain-Name-Server (6), Unknown (119), Hostname (12)
              Netbios-Name-Server (44), Netbios-Scope (47), MTU (26), Classless-Static-Route (121)
              NTP (42)
19:47:20.966956 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 355: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype IPv4 (0x0800), (tos 0xc0, ttl 64, id 28268, offset 0, flags [none], proto UDP (17), length 333)
    192.0.2.1.67 > 192.0.2.118.68: BOOTP/DHCP, Reply, length 305, hops 1, xid 0x52001c78, Flags [none]
          Your-IP 192.0.2.118
          Gateway-IP 192.0.2.1
          Client-Ethernet-Address bc:24:11:bf:10:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Offer
            Subnet-Mask (1), length 4: 255.255.255.0
            Default-Gateway (3), length 4: 192.0.2.1
            Domain-Name-Server (6), length 4: 192.0.2.1
            Hostname (12), length 4: "vyos"
            Domain-Name (15), length 11: "kea.home.xc"
            Lease-Time (51), length 4: 3600
            Server-ID (54), length 4: 172.19.216.11
            RN (58), length 4: 900
            RB (59), length 4: 1800
19:47:25.661915 bc:24:11:bf:10:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 350: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype IPv4 (0x0800), (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from bc:24:11:bf:10:35, length 300, xid 0x52001c78, secs 5, Flags [none]
          Client-Ethernet-Address bc:24:11:bf:10:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Discover
            Hostname (12), length 4: "vyos"
            Parameter-Request (55), length 13: 
              Subnet-Mask (1), BR (28), Time-Zone (2), Default-Gateway (3)
              Domain-Name (15), Domain-Name-Server (6), Unknown (119), Hostname (12)
              Netbios-Name-Server (44), Netbios-Scope (47), MTU (26), Classless-Static-Route (121)
              NTP (42)
19:47:25.664420 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 355: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype IPv4 (0x0800), (tos 0xc0, ttl 64, id 63512, offset 0, flags [none], proto UDP (17), length 333)
    192.0.2.1.67 > 192.0.2.119.68: BOOTP/DHCP, Reply, length 305, hops 1, xid 0x52001c78, Flags [none]
          Your-IP 192.0.2.119
          Gateway-IP 192.0.2.1
          Client-Ethernet-Address bc:24:11:bf:10:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Offer
            Subnet-Mask (1), length 4: 255.255.255.0
            Default-Gateway (3), length 4: 192.0.2.1
            Domain-Name-Server (6), length 4: 192.0.2.1
            Hostname (12), length 4: "vyos"
            Domain-Name (15), length 11: "kea.home.xc"
            Lease-Time (51), length 4: 3600
            Server-ID (54), length 4: 172.19.216.11
            RN (58), length 4: 900
            RB (59), length 4: 1800
19:47:26.010592 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 50: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.118 tell 192.0.2.1, length 28
19:47:27.050705 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 50: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.118 tell 192.0.2.1, length 28
19:47:28.090651 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 50: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.118 tell 192.0.2.1, length 28

Node of DHCP Client:
Code:
19:47:20.964693 bc:24:11:bf:10:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 350: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype IPv4 (0x0800), (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from bc:24:11:bf:10:35, length 300, xid 0x52001c78, Flags [none]
          Client-Ethernet-Address bc:24:11:bf:10:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Discover
            Hostname (12), length 4: "vyos"
            Parameter-Request (55), length 13: 
              Subnet-Mask (1), BR (28), Time-Zone (2), Default-Gateway (3)
              Domain-Name (15), Domain-Name-Server (6), Unknown (119), Hostname (12)
              Netbios-Name-Server (44), Netbios-Scope (47), MTU (26), Classless-Static-Route (121)
              NTP (42)
19:47:25.661763 bc:24:11:bf:10:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 350: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype IPv4 (0x0800), (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from bc:24:11:bf:10:35, length 300, xid 0x52001c78, secs 5, Flags [none]
          Client-Ethernet-Address bc:24:11:bf:10:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Discover
            Hostname (12), length 4: "vyos"
            Parameter-Request (55), length 13: 
              Subnet-Mask (1), BR (28), Time-Zone (2), Default-Gateway (3)
              Domain-Name (15), Domain-Name-Server (6), Unknown (119), Hostname (12)
              Netbios-Name-Server (44), Netbios-Scope (47), MTU (26), Classless-Static-Route (121)
              NTP (42)
19:47:26.010606 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 60: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.118 tell 192.0.2.1, length 38
19:47:27.050763 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 60: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.118 tell 192.0.2.1, length 38
19:47:28.090706 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 60: vlan 2, p 0, ethertype 802.1Q (0x8100), vlan 12, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.118 tell 192.0.2.1, length 38

As far as I can tell now: ARP requests sent by the DHCP server are received on the DHCP client node, but the DHCP Offers are not.
Packets going out from the DHCP server VM are correctly tagged as 2.12 (for testing) with 8100.
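Comparing two captures like these by eye is tedious, so here is a small Python sketch that lists DHCP messages present in one tcpdump -ev text capture but absent from the other. The function name `dhcp_messages` and the regular expressions are my own, written against the output format shown above; the sample strings are heavily abbreviated versions of those captures.

```python
import re

# Frame header line: "<time> <src MAC> > <dst MAC>, ethertype ..."
FRAME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+ (\S+) > (\S+), ethertype")
# DHCP option 53 as printed by tcpdump -v
DHCP_TYPE_RE = re.compile(r"DHCP-Message \(53\), length 1: (\w+)")


def dhcp_messages(capture: str) -> list:
    """Return (source MAC, destination MAC, DHCP message type) per DHCP frame."""
    messages = []
    current = None
    for line in capture.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            current = (frame.group(1), frame.group(2))
            continue
        dhcp = DHCP_TYPE_RE.search(line)
        if dhcp and current:
            messages.append((*current, dhcp.group(1)))
    return messages


SERVER_NODE = """\
19:47:20.964884 bc:24:11:bf:10:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 350: vlan 2, p 0
            DHCP-Message (53), length 1: Discover
19:47:20.966956 bc:24:11:ef:25:c8 > bc:24:11:bf:10:35, ethertype 802.1Q (0x8100), length 355: vlan 2, p 0
            DHCP-Message (53), length 1: Offer
"""

CLIENT_NODE = """\
19:47:20.964693 bc:24:11:bf:10:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 350: vlan 2, p 0
            DHCP-Message (53), length 1: Discover
"""

seen_on_client = dhcp_messages(CLIENT_NODE)
missing = [m for m in dhcp_messages(SERVER_NODE) if m not in seen_on_client]
print(missing)
```

On the abbreviated samples this reports only the unicast Offer from the server VM as missing on the client node, which matches what the full captures show.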
 
What's the reason, that 802.1q (ethtype 8100) stacked VLANs don't work across a physical switch? Does the switch really need to understand ethtype 8100 stacking, or is this some other option I missed entirely?

Have you tried a direct cable between both nodes? It would be useful to see whether the physical switch is the cause or not.
 
By coincidence I just posted the link to this thread online, and someone replied with a hint about something I hadn't checked before, or even realized could be the culprit.

It is caused by the physical switch using some kind of DHCP snooping, which I thought wouldn't be an issue, as all ports are trusted in the management interface (source Mikrotik Help).

I did a quick check disabling "Add Information Option":
Enables or disables DHCP Option-82 information. When enabled, the Option-82 information (Agent Remote ID and Circuit ID) is added for DHCP packets received from untrusted ports. Can be used together with Option-82 capable DHCP server to assign IP addresses and implement policies. The setting does not apply to DHCPv6 packets.
Disabling this option caused the 0x8100|0x8100|0x0800-tagged DHCP Offer to finally be delivered to the DHCP client.

So, for now: Mikrotik's SwOS has issues with DHCP over stacked VLANs (802.1q outer tag) when "Add Information Option" (Option-82) is enabled.
802.1ad doesn't have this issue, as the switch software won't inspect those frames.
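For anyone wondering what Option-82 actually is: it is a DHCP option (code 82, RFC 3046) that a relay or snooping device inserts into the DHCP options field, carrying sub-option 1 (Circuit ID) and sub-option 2 (Remote ID). The Python sketch below only illustrates that encoding. The helper names and the example IDs are made up, and I don't know what SwOS does internally or why it fails on double-tagged frames.

```python
def relay_agent_option(circuit_id: bytes, remote_id: bytes) -> bytes:
    """Encode DHCP Option 82 with sub-option 1 (Circuit ID) and 2 (Remote ID)."""
    sub_options = (
        bytes([1, len(circuit_id)]) + circuit_id
        + bytes([2, len(remote_id)]) + remote_id
    )
    return bytes([82, len(sub_options)]) + sub_options


def insert_before_end(options: bytes, new_option: bytes) -> bytes:
    """Insert new_option ahead of the End option (255) of a DHCP options field."""
    i = 0
    while i < len(options):
        code = options[i]
        if code == 255:  # End option
            return options[:i] + new_option + options[i:]
        # Pad (0) is a single byte; every other option is code, length, data.
        i += 1 if code == 0 else 2 + options[i + 1]
    raise ValueError("no End option found")


# Minimal options field: message type (53) = 2 (Offer), then End.
original = bytes([53, 1, 2, 255])

# Hypothetical identifiers; a real device chooses its own format.
opt82 = relay_agent_option(b"port3", bytes.fromhex("aabbccddeeff"))
patched = insert_before_end(original, opt82)

print(len(original), len(patched))
print(patched.hex())
```

The point is that a device adding this option has to parse and rewrite the DHCP payload, so it must first locate the IP/UDP headers behind the VLAN tags.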
 