Untagged VLAN bug in 7.3.3.

Conker

Hi All,

Banging my head against this situation for the better part of 2 days.

I have a Dell server with 4 NICs that I've paired into 2 bonds, each attached to its own bridge. Diagram below to illustrate, because the words are escaping me.

[Diagram: Untitled Diagram.drawio.png]

The management interface works off of the untagged VLAN 1 (10.1.8.0/24).

80% of my VMs are on various VLANs, but we do have a couple of VMs that we need to put on the untagged VLAN 1. All of this was working great last week; all the VMs were getting the addresses they needed.

Well, I ran updates on the Proxmox host and noticed afterwards that VMs on the untagged VLAN were no longer able to access the network (while the VMs on tagged VLANs worked fine).

My network config (/etc/network/interfaces) is here:

Code:
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto eno3
iface eno3 inet dhcp

auto eno4
iface eno4 inet dhcp

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto bond1
iface bond1 inet manual
        bond-slaves eno3 eno4
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.1.8.200/24
        gateway 10.1.8.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
#Internal Network

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

I have also spun up Proxmox (7.3.3) on another server (this one with 2 NICs) to see whether my config is broken, and I see the same symptoms: VLAN-tagged VMs work as expected, but untagged-VLAN VMs cannot access the network. (The management interface can reach the untagged network with no issue.)

I have tried and verified the following during troubleshooting:

  • All ports on the 24-port switch (including the trunk and the upstream trunk to the 48-port UniFi) are on the ALL profile with access to the untagged VLAN (plugged my laptop in and got network)
  • The WatchGuard router is advertising and tagging correctly at the interface (1 untagged, 14 tagged VLANs)
  • I have factory reset all switches between the Proxmox host and the router
  • I have taken out the bonds and laid the bridge over a single interface, both with and without vlan-aware
  • Built an entirely new VM to rule out the Windows Server VM being junked up (same symptoms)
  • Tried VMs on vmbr0 (non-VLAN-aware) and vmbr1 (VLAN-aware) to see if the bridge or bond was broken
  • I'm sure I've tried other things, but my brain is so tired that I can't think of them
The strange part is that this issue followed me to another server, so I don't think it's an issue with our Dell server. Perhaps I am missing something small but critical, but I can't think of why this is not working. This is not a super-critical production environment, but I would love to get this working again. My counterpart is trying to persuade me to move over to XCP-NG and is using this instance as fuel for his fire, lol. I really like Proxmox and want to stick with it, and if I can get this working again, I can tell him to kick rocks.

Any thoughts/advice/suggestions are GREATLY appreciated.

Thanks!
 
Can you post the config of a tagged VLAN VM and one of an untagged VLAN VM? (qm config <VMID>)

Additionally, the network configuration from inside those VMs would be interesting! (ip a and cat /etc/network/interfaces, or whatever file your OS uses)
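
If you want to dig a little deeper in the meantime, something along these lines on the host might also help (a rough sketch only; the interface names are taken from your config above):

Code:
# Show the VLAN table of the vlan-aware bridge; untagged traffic follows each port's PVID entry
bridge vlan show

# Watch whether ARP from the untagged VM actually shows up on the bridge and leaves via the bond
tcpdump -eni vmbr0 arp
tcpdump -eni bond0 arp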
 
Hi Stefan,

Sure thing! Here you go!

This is the working VM

Code:
boot: order=scsi0;net0
cores: 2
memory: 8096
meta: creation-qemu=6.2.0,ctime=1658790661
name: LGSM-Valheim
net0: virtio=C6:FF:C0:4F:21:0F,bridge=vmbr1,tag=11
numa: 0
onboot: 1
ostype: l26
scsi0: VM-Storage:vm-300-disk-0,size=60G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=fcf93599-694c-482f-ba37-b108bfbcd926
sockets: 2
vmgenid: 0e7a7734-b5f0-4a1d-95d1-ac6f387fe972

This is the non-working VM

Code:
boot: order=ide0
cores: 1
ide0: VM-Storage:vm-203-disk-0,size=60G
machine: pc-i440fx-6.1
memory: 4096
meta: creation-qemu=6.1.1,ctime=1646777006
name: Server2019-BDC-01
net0: e1000=BA:1D:3E:55:86:38,bridge=vmbr0
numa: 0
onboot: 1
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=3fe46cd1-053f-458f-ada3-f7fa54c28506
sockets: 1
vmgenid: 6f1f1637-8fa5-42d7-9aee-607fa903f8c0

And then here is the ip a output from inside the working Ubuntu VM (300):

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c6:ff:c0:4f:21:0f brd ff:ff:ff:ff:ff:ff
    inet 172.30.10.3/24 brd 172.30.10.255 scope global dynamic ens18
       valid_lft 22113sec preferred_lft 22113sec
    inet6 fe80::c4ff:c0ff:fe4f:210f/64 scope link
       valid_lft forever preferred_lft forever

And the non-working Server 2019 VM (203) (statically assigned inside the VM; 10.1.8.1 is open for pings):

[Screenshot: 1671202970371.png]

I appreciate the response and if I can snag you any more info, please let me know!

Thanks!
 
What is the output of ip a on the host? Are there maybe any IP address conflicts (e.g. have you tried with a different IP)?
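
For the conflict check, something like this from another machine in the same subnet can help (a sketch; <interface> and <VM-IP> are placeholders for the NIC you are testing from and the VM's address):

Code:
# Replies coming from more than one MAC address for the same IP indicate a conflict
arping -I <interface> -c 3 <VM-IP>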
 
Hi Stefan,

I did check for IP conflicts in the initial troubleshooting and even tried switching the VMs over to DHCP, with no success. There are no conflicts as of right now (the VM is static to 10.1.8.9, which is outside of the DHCP pool, and nothing else is assigned to that IP; I've also tried setting it to other IPs with no success).

Here is the output of ip a on the host. I grabbed only the Proxmox-related interfaces, as the full output is quite inflated (we run Docker side-by-side on the host, which hasn't caused issues before).

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever


2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 5a:58:2f:3f:4f:fd brd ff:ff:ff:ff:ff:ff permaddr d4:be:d9:f5:2e:a4
    altname enp1s0f0


3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 5a:58:2f:3f:4f:fd brd ff:ff:ff:ff:ff:ff permaddr d4:be:d9:f5:2e:a6
    altname enp1s0f1
    inet 10.1.8.117/24 brd 10.1.8.255 scope global dynamic noprefixroute eno2
       valid_lft 1934sec preferred_lft 1934sec


4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether d4:be:d9:f5:2e:a8 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
    inet 10.1.8.11/24 brd 10.1.8.255 scope global dynamic noprefixroute eno3
       valid_lft 2531sec preferred_lft 2531sec
    inet6 fe80::f370:c620:8887:ad9e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever


5: eno4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether 6a:81:bf:89:ea:d2 brd ff:ff:ff:ff:ff:ff permaddr d4:be:d9:f5:2e:aa
    altname enp2s0f1


62: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 5a:58:2f:3f:4f:fd brd ff:ff:ff:ff:ff:ff


63: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 6a:81:bf:89:ea:d2 brd ff:ff:ff:ff:ff:ff


64: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5a:58:2f:3f:4f:fd brd ff:ff:ff:ff:ff:ff
    inet 10.1.8.200/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::5858:2fff:fe3f:4ffd/64 scope link
       valid_lft forever preferred_lft forever


65: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:81:bf:89:ea:d2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::6881:bfff:fe89:ead2/64 scope link
       valid_lft forever preferred_lft forever

Just to make sure those switch ports on the UniFi are working, I took my laptop out there, plugged in without any special config on the laptop's NIC, and got access to the untagged native VLAN.
 
Just to make sure: Docker does mess quite a lot with iptables and networking in general, which is why it's not really recommended to have it installed side-by-side with Proxmox VE. Have you tried stopping the Docker daemon to see if it works then?
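
Something along these lines should be enough for a quick test (a rough sketch; unit names can vary slightly depending on how Docker was installed):

Code:
# Stop the daemon and its socket (otherwise the socket re-activates it on the next API call)
systemctl stop docker.service docker.socket

# Confirm it is really down before re-testing the untagged VM
systemctl is-active docker.service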
 
Taking down Docker didn't seem to affect anything, but I did set up a Docker swarm and some more advanced stuff with it (Portainer clustering), so I am nuking one of our Proxmox nodes, rebuilding it without Docker, and I'll report back.

Thanks for your help on this!
 
It was the Docker! Bah. It had been working with Docker side-by-side with no issue for so long. Perhaps it was the swarming that broke it this time.

Thanks for your help on this!
 
What do you mean by "side-by-side"?
Do you mean that Proxmox can't run Docker on the host, or that Proxmox doesn't like running Docker even in a VM?
I just started with Docker, and if it's going to hit the fan, I'll try another solution.
 
It was the Docker! Bah. It had been working with Docker side-by-side with no issue for so long. Perhaps it was the swarming that broke it this time.
Might very well be possible; I haven't worked with swarm a lot, so sadly I cannot say much about how it would affect Proxmox. One of the main issues I know of is that Docker does a lot of stuff with iptables, which can lead to problems down the road (see the sketch at the end of this post). You can run Docker in a VM / container though, which is the preferred solution to prevent situations like this.

Do you mean that Proxmox can't run Docker on the host, or that Proxmox doesn't like running Docker even in a VM?
Running it in a VM is perfectly fine; the problems only occur when running Docker directly on the Proxmox host.
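
To make the iptables point a bit more concrete, below is a rough sketch of what to look at on an affected host. It is an assumption on my part that this is exactly what bit you here, but the combination would at least match "untagged broken, tagged fine": with br_netfilter active, bridged frames pass through the iptables FORWARD chain (where Docker sets the default policy to DROP), while VLAN-tagged frames are only filtered there if bridge-nf-filter-vlan-tagged is also enabled, which it is not by default.

Code:
# Is bridged traffic being pushed through iptables at all?
sysctl net.bridge.bridge-nf-call-iptables       # 1 = bridged frames traverse the FORWARD chain
sysctl net.bridge.bridge-nf-filter-vlan-tagged  # 0 (default) = VLAN-tagged frames are exempt

# What Docker typically leaves behind:
iptables -S FORWARD | head -n 5                 # look for "-P FORWARD DROP" and DOCKER-* chains

# Quick, non-persistent way to confirm the theory on a test host:
iptables -P FORWARD ACCEPT

Either way, the clean fix is to keep Docker inside a VM (or container) so its iptables handling never touches the host's bridges.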
 
