I have a very peculiar issue that I have never seen before, and it took me forever to even narrow it down to the PVE host. My server has a single interface that carries multiple VLANs, and VMs are attached to particular VLANs. My network config (limited here to two VLANs) is quite simple:
Code:
auto ensfp0
iface ensfp0 inet manual

auto vmbr1
iface vmbr1 inet manual
    bridge-ports ensfp0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-500

auto vmbr1.20
iface vmbr1.20 inet static
    address 10.0.20.2/24
    gateway 10.0.20.1

auto vmbr1.40
iface vmbr1.40 inet static
    address 10.0.40.2/24
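To make the VLAN layout of the config above explicit, here is a rough Python sketch that parses it and reports which VLAN IDs the bridge allows and on which VLANs the host itself has an address. The helper names (`parse_stanzas`, `parse_vids`, `summarize`) are hypothetical and not part of Proxmox or ifupdown2; the parser assumes simple ifupdown-style stanzas and only understands the options used here.

```python
import re

# The host network config from above (indentation is ignored by the parser).
CONFIG = """\
auto ensfp0
iface ensfp0 inet manual

auto vmbr1
iface vmbr1 inet manual
    bridge-ports ensfp0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-500

auto vmbr1.20
iface vmbr1.20 inet static
    address 10.0.20.2/24
    gateway 10.0.20.1

auto vmbr1.40
iface vmbr1.40 inet static
    address 10.0.40.2/24
"""


def parse_stanzas(text):
    """Group option lines under the most recent 'iface NAME FAMILY METHOD' line."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface":
            current = words[1]
            stanzas[current] = {"method": words[3], "options": {}}
        elif words[0] == "auto":
            current = None  # 'auto' lines are not options of any stanza
        elif current is not None:
            stanzas[current]["options"][words[0]] = words[1:]
    return stanzas


def parse_vids(tokens):
    """Expand bridge-vids tokens such as ['2-500'] or ['10', '20-30'] into a set."""
    vids = set()
    for token in tokens:
        if "-" in token:
            lo, hi = token.split("-")
            vids.update(range(int(lo), int(hi) + 1))
        else:
            vids.add(int(token))
    return vids


def summarize(text, bridge="vmbr1"):
    """Return (allowed VLAN IDs on the bridge, {VLAN ID: host address or None})."""
    stanzas = parse_stanzas(text)
    allowed = parse_vids(stanzas[bridge]["options"].get("bridge-vids", []))
    host_vlans = {}
    for name, stanza in stanzas.items():
        m = re.fullmatch(re.escape(bridge) + r"\.(\d+)", name)
        if m:
            host_vlans[int(m.group(1))] = stanza["options"].get("address", [None])[0]
    return allowed, host_vlans


if __name__ == "__main__":
    allowed, host_vlans = summarize(CONFIG)
    print(f"bridge-vids: {min(allowed)}-{max(allowed)}")
    for vid, addr in sorted(host_vlans.items()):
        print(f"host interface on VLAN {vid}: {addr}")
```

Dropping the `vmbr1.40` stanza from `CONFIG` and rerunning shows the host losing its only L3 presence on VLAN 40 while the bridge still allows that VLAN.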
Then in my VM config I set:
Code:
net0: virtio=12:34:56:78:9A:BC,bridge=vmbr1,tag=40
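As a quick sanity check on the VM side, a small Python sketch can split the `net0` value into its fields and confirm that its `tag` falls inside the bridge's `bridge-vids` range (2-500 above). Both helpers are hypothetical and assume the first field has the form `<model>=<MAC>`, as in the line above; this is not Proxmox's own parser.

```python
def parse_netdev(value):
    """Split a Proxmox 'netN' value into its model, MAC address and options.

    Assumes the first field is '<model>=<MAC>' and every other field is key=value.
    """
    first, *rest = value.split(",")
    model, mac = first.split("=", 1)
    options = dict(field.split("=", 1) for field in rest)
    return {"model": model, "mac": mac, **options}


def tag_is_allowed(netdev, allowed_vids):
    """True if the NIC's VLAN tag is in the allowed set; NICs without a tag are not checked."""
    tag = netdev.get("tag")
    return tag is None or int(tag) in allowed_vids


if __name__ == "__main__":
    nic = parse_netdev("virtio=12:34:56:78:9A:BC,bridge=vmbr1,tag=40")
    allowed = set(range(2, 501))  # bridge-vids 2-500 from the host config
    print(nic)
    print("tag allowed on bridge:", tag_is_allowed(nic, allowed))
```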
The above configuration does work. However, as soon as I remove the PVE host's own interface on vmbr1.40, things get "interesting":
- The VM can ping hosts in its own network as well as in other networks (including going through the gateway and reaching the WAN)
- The VM accepts inbound connections from its own network as well as from other networks
- The TCP handshake fails if the connection comes from the WAN through the gateway that DST-NATs it, which is incredibly strange
Given these symptoms, I went back and forth over the networking side (switches, routers, firewall) and found nothing. On a hunch, I added an address to vmbr1.40 and it magically started working. What's 100x worse is that removing the IP again didn't break it, and I am pulling my hair out. Since I'm testing this on a separate host, I was able to fully reboot it, and it is still working without an IP. Thanks to a maintenance window over the weekend, I was also able to reboot most of the switches and the main gateway, and it is still working.
Does anyone see anything strange in this configuration? I really don't like configurations that "magically" start working. I also see that Proxmox creates dozens of "tap#" interfaces, which makes me suspect that something isn't quite right with how the host handles my VLANs.
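One way to check whether the number of tap interfaces is actually unusual is to group them by VM. The Python sketch below assumes Proxmox names each VM network device `tap<VMID>i<N>` (e.g. `net0` of VM 100 as `tap100i0`); treat that naming as an assumption to verify on your host. `group_taps` is a hypothetical helper, and reading `/sys/class/net` only works on a Linux host.

```python
import os
import re

# Assumed Proxmox naming scheme: tap<VMID>i<NIC index>, e.g. tap100i0.
TAP_RE = re.compile(r"tap(\d+)i(\d+)")


def group_taps(ifnames):
    """Map VM ID -> sorted NIC indices for interface names matching tap<VMID>i<N>."""
    vms = {}
    for name in ifnames:
        m = TAP_RE.fullmatch(name)
        if m:
            vms.setdefault(int(m.group(1)), []).append(int(m.group(2)))
    return {vmid: sorted(idx) for vmid, idx in sorted(vms.items())}


if __name__ == "__main__":
    # /sys/class/net has one entry per network interface on a Linux host.
    path = "/sys/class/net"
    names = os.listdir(path) if os.path.isdir(path) else []
    for vmid, nics in group_taps(names).items():
        nic_list = ", ".join(f"net{i}" for i in nics)
        print(f"VM {vmid}: {nic_list}")
```

If every tap interface maps to an existing VM NIC, the count alone probably says nothing about the VLAN handling.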