I'm not sure what to make of it. I could only create a virtual testing scenario (which obviously lacks external physical switches and routers), and in each case communication succeeded perfectly fine regardless of the firewall settings.
Seems weird that all traffic but TCP SYN/ACK would be suddenly blocked. OTOH it seems weird that anything gets through at all, as with the default firewall rules the packets should end up hitting PVEFW-HOST-IN's final `-j DROP`.
OTOH VLANs act a little differently there. There are a few sysctls to take into account (net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-filter-vlan-tagged, ...). Also, matching on VLANs directly is ebtables' task, not iptables'.
Note however that PVEFW-Drop contains a rule: `-m tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -j DROP`, which basically means DROP any TCP packet that isn't a plain SYN packet. HOWEVER, before reaching this chain, the previous chain (PVEFW-HOST-IN) has a rule `-m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT` which SHOULD match the SYN/ACK packet...
That is, provided the initial SYN packet gets through in the first place. I don't immediately see how that would happen though. (or the ICMP or UDP traffic...)
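To make the `--tcp-flags` rule a bit more tangible, here is a small Python sketch of how I understand its matching semantics (mask first, then the flags that must be set). The function names are made up for illustration, and this only models the flag comparison, not conntrack or the rest of the chain.

```python
# TCP flag bits as they appear in the TCP header.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flags_match(flags, mask, comp):
    """Model of iptables `--tcp-flags mask comp`.

    Only the flags listed in `mask` are examined; of those, exactly the
    ones listed in `comp` must be set.
    """
    return (flags & mask) == comp


def dropped_by_rule(flags):
    """Model of `! --tcp-flags FIN,SYN,RST,ACK SYN -j DROP` (negated match)."""
    mask = FIN | SYN | RST | ACK
    return not tcp_flags_match(flags, mask, SYN)


# Hypothetical sample packets, identified only by their flag combination.
samples = [("SYN", SYN), ("SYN/ACK", SYN | ACK), ("ACK", ACK), ("SYN+PSH", SYN | PSH)]
for name, flags in samples:
    print(name, "DROP" if dropped_by_rule(flags) else "falls through")
```

So a plain SYN passes this rule, while a SYN/ACK would be dropped by it unless the conntrack rule accepted it earlier.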
The only thing I DID encounter here was that VLAN support in general seems buggy in the old 2.6.32 kernel series, and it seems to even depend on the network card used. I.e. if I used virtio for the network cards of my VMHoster (described below), VLAN traffic was generally routed to the wrong interfaces (it went from the physical one to the bridge it was part of, instead of to the tagged VLAN interfaces). Switching to E1000 made it work.
Anyway, here's the setup I tried: (Abbreviating the start of IP addresses 192.168 with "..")
Code:
Main machine
| +-----------------------------------------+
+==============>| Router | Routing table:
| ________ | ___________ | ..99.0/24 => eth0
| /vmbr0 \ | / eth0 \ | ..254.0/24 => eth1
+->|..99.10 <---+--> ..99.225 | | ..253.0/24 => eth2.10
| \--------/ | \-----------/ | ..252.0/24 => eth2.11
| | X |
| | ____X_____ ________ ________ |
| | / eth1 \ |eth2.10 | |eth2.11 | |
| | | ..254.1 | |..253.1 | |..252.1 | |
| | | | +----A---+ +---A----+ |
| | \----A-----/ | | |
| | | | (tagging) | |
| | | | | |
| | | +---v-----------v---+ |
| | | | eth2 (no ip) | |
| | | \---------A---------/ |
| | | | |
| +-------|--------------------|------------+
| | |
| +----v---+ |
+----------------->|vmbr0v5 | |
| +----A---+ |
| | |
| | +----v----+
+------------------ - - | - - --------->|vmbr0v6 |
| | +----A----+
| | |
| +-----------------|--------------------|---------------------+
+====>| VM Hoster | | | Default routing table
| (nested) +------v------+ +---------v------------+ | Default firewall setup
| |eth0 | |eth1 (no ip) | |
| |..254.100 | +--A----------------A--+ |
| +-------------+ | | |
| | (tagging) | |
| | | |
| +--v-----+ +-----v--+ |
| |eth1.10 | |eth1.11 | |
| +--X-----+ +-----X--+ |
| | | |
| +---X-----+ +-----X---+ |
| |vmbr1v10 | |vmbr1v11 | |
| +---X-----+ +-----X---+ |
| | | |
| +------X-+ +-X------+ |
| |tap100i0| |tap101i0| |
| +--A-----+ +-----A--+ |
| | | |
| | | |
| +-------+----+ +-------+----+ |
| | vm100 | | | vm101 | | |
| | +-----v---+| | +-----v---+| |
| | | eth0 || | | eth0 || |
| | |..253.10 || | |..252.10 || |
| | +---------+| | +---------+| |
| +------------+ +------------+ |
| |
+------------------------------------------------------------+
Help me correct the scenario if it doesn't reflect the situation enough.
This is the route a packet takes from VM Hoster's vm100 to vm101:
vm100 writes to eth0, eth0 sends it off to tap100i0, bridged over vmbr1v10 to eth1.10 where it is tagged as vlan id 10. Up to this point `tcpdump -XX` shows no tag in the ethernet frame.
The tagging happens now right before eth1.10 hands the packet over to eth1, where `tcpdump -XX` shows `<dstmac> <srcmac> 8100 000a` in the ethernet frame, the 802.1q vlan tag 10.
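For anyone who wants to check the tag bytes by hand, here is a minimal Python sketch that pulls the VLAN id out of a raw Ethernet frame. The MAC addresses are made up; only the `8100 000a` part comes from the tcpdump output above.

```python
import struct


def vlan_id(frame):
    """Return the 802.1Q VLAN id of an Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12-13 hold the TPID (0x8100 for 802.1Q), bytes 14-15 the TCI.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != 0x8100:
        return None
    # The lower 12 bits of the TCI are the VLAN id; the upper 4 are PCP/DEI.
    return tci & 0x0FFF


# Made-up MAC addresses, followed by the tag bytes seen in tcpdump.
dst = bytes.fromhex("525400000002")
src = bytes.fromhex("525400000001")
tagged = dst + src + bytes.fromhex("8100000a") + bytes.fromhex("0800")
untagged = dst + src + bytes.fromhex("0800") + bytes(2)

print(vlan_id(tagged), vlan_id(untagged))
```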
Then eth1 sends it over the emulated physical device (the *real* host's vmbr0v6) to the router-VM. vmbr0v6 also successfully shows a tagged packet, as does tcpdump on the router on eth2.
In order for the router to care about linking VLANs together I have to make it take off the VLAN tag there (otherwise no forwarding whatsoever happens).
So the router is on the same network with the same tag via eth2.10 as 192.168.253.1.
So the packet goes from the router's eth2 to eth2.10 and in the process loses the VLAN tag. tcpdump on eth2.10 shows the regular untagged packet asking to be sent from 192.168.253.10 to 192.168.252.10.
The router now sees that ..252.0/24 is to be routed over eth2.11 and writes the packet to that interface. There tcpdump also shows the untagged packet.
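The router's decision here is a plain longest-prefix lookup. As a rough sketch (not how the kernel actually implements it), the routing table from the diagram can be modelled in Python like this; `ROUTES` and `lookup` are names I made up for the example.

```python
import ipaddress

# Routing table of the router VM, taken from the diagram above.
ROUTES = [
    ("192.168.99.0/24", "eth0"),
    ("192.168.254.0/24", "eth1"),
    ("192.168.253.0/24", "eth2.10"),
    ("192.168.252.0/24", "eth2.11"),
]


def lookup(dst):
    """Return the outgoing interface for `dst`, or None if no route matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev in ROUTES:
        net = ipaddress.ip_network(prefix)
        # Keep the most specific (longest) matching prefix.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None


print(lookup("192.168.252.10"))
```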
Now the packet is tagged as belonging to vlan 11 and moves over to eth2 where it originally came from, this time with the new vlan=11 tag. (Visible via `tcpdump -XX` as `8100 000b`).
Now back over vmbr0v6 (which correctly shows the vlan=11 tagged packet) it reaches the VMHost's eth1, which forwards it to eth1.11, dropping the tag in the process.
In eth1.11 now tcpdump shows the packet without vlan tag. This interface is bridged over vmbr1v11 to tap101i0 to vm101's eth0 which now happily receives a packet originating from 192.168.253.10 to be delivered to vm101 on 192.168.252.10.
What a joyride...
Now then. If I activate the VMHost's firewall I can still ping and netcat/tcp between vm100 and vm101. Which is a BIT puzzling though...
Code:
net.bridge.bridge-nf-filter-vlan-tagged=1 (0 or 1 made no difference)
net.bridge.bridge-nf-call-iptables=1 (was 1 by default)
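If you want to compare these values on your own host, here is a small Python sketch that reads them straight from /proc/sys. It assumes the usual mapping of sysctl names to paths (dots become slashes); `read_sysctl` and the `proc_root` parameter are just names for this example. A value of None means the file doesn't exist, e.g. because the bridge netfilter code isn't loaded.

```python
import os


def read_sysctl(name, proc_root="/proc/sys"):
    """Read a sysctl value by name; return None if it is not available."""
    # net.bridge.bridge-nf-call-iptables -> /proc/sys/net/bridge/bridge-nf-call-iptables
    path = os.path.join(proc_root, *name.split("."))
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


NAMES = [
    "net.bridge.bridge-nf-filter-vlan-tagged",
    "net.bridge.bridge-nf-call-iptables",
]

for name in NAMES:
    print(f"{name}={read_sysctl(name)}")
```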