Hi there,
I just started playing around with Proxmox 3.2 and observed some rather odd behavior regarding VLANs.
My setup is pretty simple:
eth0: external interface to network switch
vmbr0: bridges eth0 (created from Proxmox GUI)
On the switch port that eth0 is connected to, I have several tagged VLANs (assume VLAN IDs 2, 3, 4) in addition to a native VLAN (let's assume 1).
I then create a VM with ID 100 and one untagged network interface.
brctl show
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.0022195d7538       no              eth0
                                                        tap100i0
I then start the VM and want it to boot via PXE. The switch has a DHCP helper configured that forwards any DHCP requests to a defined IP.
The VM receives a DHCP lease for IP 10.10.11.100/24 with gateway 10.10.11.1, together with the TFTP information, and then tries to reach the TFTP server at the given IP 10.10.10.10, but only gets a timeout.
To investigate further, I ran tcpdump on the interfaces eth0 and tap100i0.
Outcome: After the DHCP ACK, the VM sends an ARP who-has request for its gateway 10.10.11.1 in order to reach the TFTP IP 10.10.10.10, which is outside its own subnet. The requests are observed on both interfaces.
BUT: the ARP replies are only seen on eth0 and never make it to the VM interface tap100i0.
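For anyone who wants to repeat the capture, here is a minimal sketch of the kind of tcpdump invocation that could be used. The helper name arp_capture_cmd is made up for illustration, and the exact filter is an assumption, not the command originally run. The -e flag prints the link-level header, which should show whether a reply arrives with an 802.1Q tag; the filter matches both untagged and tagged ARP, since a plain "arp" filter does not match tagged frames. Depending on NIC VLAN offload settings, the tag may not be visible in the capture.

```shell
#!/bin/sh
# Print (not run) a tcpdump command line that captures ARP on one interface,
# matching both untagged and 802.1Q-tagged frames.
# arp_capture_cmd is a hypothetical helper name used only for this sketch.
arp_capture_cmd() {
    # -e: print the link-level header (shows the VLAN tag, if present)
    # -n: do not resolve addresses to names
    # -i: interface to listen on
    printf "tcpdump -e -n -i %s 'arp or (vlan and arp)'\n" "$1"
}

# The two interfaces from the setup described above
for ifc in eth0 tap100i0; do
    arp_capture_cmd "$ifc"
done
```

The printed commands need to be run as root on the Proxmox host, one per terminal, while VM 100 attempts its PXE boot.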
I tried everything imaginable to find the cause. It turns out that once I create a second VM 101 with a tagged interface for each VLAN configured on the switch port connecting to eth0, AND after I stop and start VM 100 again, the ARP replies do reach the tap100i0 interface.
Out of curiosity I tried to bring the original behavior back and removed VM 101. I confirmed that Proxmox deleted all the associated bridges and only vmbr0 was left. I stopped and started VM 100 once again, but the ARP replies are still being received.
So I started all over with a fresh Proxmox installation and the exact same initial config (one VM 100 with just one untagged interface), and once again the ARP replies are not received by the VM. Once I add tagged interfaces for the tagged VLANs, it works. Once I add another tagged VLAN on the switch port, it stops working. Once I remove tagged VLANs from the switch port, it works again.
In a typical production environment, where each VM has its own tagged VLAN for separation and the VMs are spread over multiple Proxmox servers that all carry every VLAN tagged on eth0 so that VM networking keeps working after migration, this could cause a lot of trouble.
Unfortunately I have not yet been able to find out what exactly changes after the dummy interfaces are created. Of course a new bridge is added for every tagged VLAN, but after the interfaces are removed the bridges are gone as well and brctl looks the same as before. Still, some difference or state change must be left behind.
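One way to look for such leftover state would be to snapshot the host's bridge and VLAN related settings before and after the tagged interfaces have existed, and diff the two. The sketch below is only a suggestion: the function name snapshot, the output file names, and the choice of what to record (bridge list, detailed link info, whether the 8021q module is loaded, /proc/net/vlan/config, eth0 offload flags) are assumptions about where a difference might show up, not a known cause.

```shell
#!/bin/sh
# Write a snapshot of bridge/VLAN-related host state to the file given as $1.
# "snapshot" is a hypothetical helper; each command is allowed to fail so the
# function still completes on hosts where a tool or file is missing.
snapshot() {
    {
        echo "== brctl show =="
        brctl show 2>&1 || true
        echo "== ip -d link show =="
        ip -d link show 2>&1 || true
        echo "== 8021q module =="
        lsmod 2>/dev/null | grep 8021q || echo "(not loaded)"
        echo "== /proc/net/vlan/config =="
        cat /proc/net/vlan/config 2>&1 || true
        echo "== ethtool -k eth0 =="
        ethtool -k eth0 2>&1 || true
    } > "$1"
    return 0
}

# Take the first snapshot on the freshly installed host
snapshot "${TMPDIR:-/tmp}/net-before.txt"
```

After creating VM 101 with the tagged interfaces and removing it again, a second call such as snapshot /tmp/net-after.txt followed by diff -u on the two files would show whatever changed among the recorded items.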
Maybe dietmar can shed some light on this?
What else differs in the config once the tagged bridges have been created?
Does Proxmox 3.2 introduce any changes to the bridging components or the virtual network adapter firmware? For the above tests I used the virtio network driver.