[SOLVED] LACP bond issues with native + tagged VLANs

Well I've been at this all afternoon and I'm not getting anywhere.

Wireshark shows no tagged traffic coming out of the server with any config, even with the single NIC connection (seems I was wrong about it working with this Cisco switch). About the only thing I found was CDP/DTP/VTP packets being sent by the switch so I tried turning DTP off, but that did nothing as expected since I had already manually set the port to trunk and 802.1Q encapsulation.
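
For anyone checking a capture by hand, here is a minimal Python sketch of how to tell whether a raw Ethernet frame carries an 802.1Q tag. The function name `vlan_id` and the sample frames are made up for illustration and are not taken from my actual capture. It assumes a plain single-tagged frame (TPID 0x8100 right after the two MAC addresses), not QinQ.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None  # too short to hold MACs + TPID + TCI
    # Bytes 0-11 are the destination and source MACs; the next four bytes
    # are the TPID and the tag control information (TCI) if a tag is present.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # the low 12 bits of the TCI are the VLAN ID


# Hypothetical sample frames: broadcast destination, made-up source MAC.
macs = bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001")
payload = bytes(46)
tagged = macs + struct.pack("!HH", TPID_8021Q, 5) + struct.pack("!H", 0x0800) + payload
untagged = macs + struct.pack("!H", 0x0800) + payload

print(vlan_id(tagged))    # VLAN ID from the tag
print(vlan_id(untagged))  # None, no tag present
```

Keep in mind that a capture taken on the host itself may not show tags the same way as a capture taken on the wire, since NIC offloading can strip or insert them.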

I did learn that the reason to specify the PVID on the bridge is to prevent Spanning Tree loops.
 
Well as usual the problem was the idiot between the keyboard and chair...

Just before creating the LACP bond I upgraded my old HPE switch to a Cisco and made a small mistake in its configuration compared to the old one, which caused a series of misleading errors. There were never any issues with the configuration of the interfaces for Proxmox on the switch or server.

TL;DR: The switch was accepting tagged VLAN 5 traffic from pfSense but it was sending untagged VLAN 5 traffic to pfSense. The VM I was using to test tagged traffic in Proxmox still had an IP address, so it kept working on VLAN 5 until I reset its virtual network adapter. That led me to believe the issue was with the host bridge config and not VLAN 5.

@ph0x thanks for helping me chase my own tail for a while!

The Full Story
My pfSense box was configured for tagged traffic on VLAN 5, but I set up the new switch port with VLAN 5 as both the native VLAN and a tagged VLAN (this is in line with how I configured my APs and how I thought I had the old switch configured). The VM on VLAN 5 had an open connection to pfSense at the time I created the bond and reset the host networking. pfSense re-established the session with the VM because the switch was set up to accept tagged VLAN 5 traffic from pfSense. Because the VM kept its IP address I was still able to ssh into the switch, so I didn't realize that pfSense was no longer able to receive traffic on VLAN 5 (the switch was sending it untagged and pfSense was ignoring it). Once I figured out how to trigger a reload of the virtual network adapter, the VLAN 5 VM lost its IP address and was never able to get it back because no traffic was getting to pfSense.

All the while I had a VM on VLAN 20 running MQTT, which my openHAB instance had not been able to connect to since I set up the bond. Had I been diligent and also tested connectivity on the VM in VLAN 20, I would have found that it was connected just fine. I assumed that the continued connectivity issues on the openHAB end were because I had no tagged traffic coming out of Proxmox, when in fact it was just openHAB being terrible at dealing with network changes from inside a Docker container.

So the whole time the only issue was that the switch was sending untagged VLAN 5 traffic to pfSense, which was expecting tagged traffic. The switch accepting the tagged traffic caused some weird and misleading issues that, combined with an assumption, caused me to think Proxmox was the problem.
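
To make the asymmetry easier to see, here is a rough Python model of the tagging behavior described above. It is only a sketch: the function and class names are invented for illustration, and the ingress rule (a frame tagged with the native VLAN ID is still accepted) is assumed from what I observed on my switch rather than taken from any vendor documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    vlan: int            # VLAN the frame belongs to inside the switch
    tag: Optional[int]   # 802.1Q VID on the wire, None if untagged


def switch_trunk_egress(vlan: int, native_vlan: int) -> Frame:
    """A trunk port sends its native VLAN untagged and every other VLAN tagged."""
    return Frame(vlan=vlan, tag=None if vlan == native_vlan else vlan)


def switch_trunk_ingress(tag: Optional[int], native_vlan: int) -> int:
    """Untagged frames land in the native VLAN; tagged frames keep their VID.

    Assumption from this thread: a frame tagged with the native VLAN ID
    is accepted rather than dropped.
    """
    return native_vlan if tag is None else tag


def tagged_interface_accepts(frame: Frame, vid: int) -> bool:
    """A tagged-only VLAN interface (pfSense's VLAN 5 here) ignores untagged frames."""
    return frame.tag == vid


NATIVE_VLAN = 5  # the mistake: VLAN 5 set as native on the port facing pfSense

# pfSense -> switch: tagged VLAN 5 is accepted, so this direction works.
print(switch_trunk_ingress(tag=5, native_vlan=NATIVE_VLAN))

# switch -> pfSense: VLAN 5 leaves untagged, so pfSense ignores it.
to_pfsense_v5 = switch_trunk_egress(vlan=5, native_vlan=NATIVE_VLAN)
print(tagged_interface_accepts(to_pfsense_v5, vid=5))

# VLAN 20 is not the native VLAN, so it is tagged both ways and works.
to_pfsense_v20 = switch_trunk_egress(vlan=20, native_vlan=NATIVE_VLAN)
print(tagged_interface_accepts(to_pfsense_v20, vid=20))
```

In this model, changing `NATIVE_VLAN` to a VLAN other than 5 makes the VLAN 5 frames leave the switch tagged again, which matches what pfSense expected.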
 
