Firewall enabled in Datacenter prevents bridged VM communication on the same node

floryn

While testing the new firewall in Proxmox 3.4 I encountered the following problem: when I enable the firewall (in Datacenter), two KVM VMs that are on the same Proxmox node, with eth0 of each connected to vmbr16 and vmbr172 respectively, can no longer communicate. vmbr16 is a bridge to eth1 (VLAN 16) and vmbr172 is a bridge to eth1 (VLAN 172).
The firewall is enabled only in Datacenter -> Firewall -> Options. The two VMs can't communicate even if I disable the firewall for them in VM -> Hardware -> Network Device (eth0), or even if I disable the firewall for that Proxmox node! The strange thing is that, in a two-node cluster, if I migrate one VM to the second node, communication between the two VMs works again (with the firewall on in Datacenter), but as soon as I move both VMs onto the same Proxmox node, communication between them stops (with the firewall on in Datacenter only).
So I am forced to turn off the firewall, because I can't enable communication between the two VMs if they are on the same node (and eth0 of each VM is on a different VLAN (vmbr)).

Is this a bug or how can I resolve this problem?
 
Please can you post your network and VM config (in order to reproduce the behavior).

Hi Dietmar,

Sure, here is the network configuration:
Code:
# network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

auto eth2
iface eth2 inet static
    address  172.19.0.52
    netmask  255.255.255.0

auto eth3
iface eth3 inet static
    address  192.168.244.20
    netmask  255.255.255.0

auto vmbr0
iface vmbr0 inet static
    address  192.168.216.46
    netmask  255.255.255.0
    gateway  192.168.216.1
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0

auto vmbr1
iface vmbr1 inet manual
    bridge_ports eth1
    bridge_stp off
    bridge_fd 0

auto vmbr172
iface vmbr172 inet manual
    bridge_ports eth1.172
    bridge_stp off
    bridge_fd 0

auto vmbr16
iface vmbr16 inet manual
    bridge_ports eth1.16
    bridge_stp off
    bridge_fd 0

auto vmbr15
iface vmbr15 inet manual
    bridge_ports eth1.15
    bridge_stp off
    bridge_fd 0

and the VM config for the first VM, 125.conf:
Code:
#debian 6 
#IP%3A 192.168.16.5
#
#pmv2 - drbd2
bootdisk: virtio0
cores: 2
ide2: none,media=cdrom
memory: 512
name: web-lyra
net0: virtio=8A:00:62:07:3A:82,bridge=vmbr16
onboot: 1
ostype: l26
sockets: 1
startup: order=8
virtio0: drbd2:vm-125-disk-1,cache=writethrough,size=50G

and the second VM, 600.conf:
Code:
#windows 7 32bit
#IP%3A 172.16.0.55
balloon: 512
boot: cdn
bootdisk: virtio0
cores: 1
ide2: none,media=cdrom
memory: 1024
name: win7
net0: virtio=22:9B:EE:95:BD:F8,bridge=vmbr172
onboot: 1
ostype: win7
sockets: 1
startup: order=16
tablet: 0
virtio0: drbd2:vm-600-disk-1,cache=writethrough,size=35G

[PENDING]
#windows 7 32bit
#IP%3A 172.16.0.55
#
#pmv2 - drbd2

I reproduced this problem on other VMs as well; the guest OS doesn't matter. The conditions seem to be: VMs on the same node, firewall enabled in Datacenter -> Firewall -> Options, and the VMs' eth0 interfaces on two separate VLANs.

Thanks for the quick reply,
Florin
 
Why don't you configure the vlan on the network interfaces (the standard way)? Please can you test if that works?
 
I have now moved the VLAN configuration to the network interfaces (the standard way), but the problem still remains.
Now the network configuration is:
Code:
# network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

auto eth2
iface eth2 inet static
    address  172.19.0.52
    netmask  255.255.255.0

auto eth3
iface eth3 inet static
    address  192.168.244.20
    netmask  255.255.255.0

auto vmbr0
iface vmbr0 inet static
    address  192.168.216.46
    netmask  255.255.255.0
    gateway  192.168.216.1
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0

auto vmbr1
iface vmbr1 inet manual
    bridge_ports eth1
    bridge_stp off
    bridge_fd 0

And on VM 125.conf
Code:
net0: virtio=8A:00:62:07:3A:82,bridge=vmbr1,tag=16

and 600.conf
Code:
net0: virtio=22:9B:EE:95:BD:F8,bridge=vmbr1,tag=172

Unfortunately, these changes don't solve the problem.
 
I also forgot to mention that I can't enable communication between the two VMs even if I set 'Input Policy' and 'Output Policy' to Enable in Datacenter (with the firewall on in Datacenter, and the VM and node firewalls off).

Regards,
Florin
 
Sorry, but VM 125 and VM 600 use different VLANs, so they cannot communicate unless you have some routing in between. What am I missing?
 
Of course I have a router in my network. The problem is that the two VMs can communicate when the firewall is OFF in Datacenter -> Firewall -> Options, but when I enable the firewall the communication stops, and I'm unable to set any firewall rules to re-enable this communication (as I told you before, even if the node and VM firewalls are disabled).

So, for now the solution is to completely disable the firewall in cluster.fw:
[OPTIONS]
# enable firewall (cluster wide setting, default is disabled)
enable: 0

After this the two VMs can communicate fine.

Thank you,
Florin
 
In my network I have a physical router, separated from the proxmox cluster (and also a physical switch with vlan support). So the route would be:

proxmox node1: VM1 eth0 'vmbr1 tag 16' (192.168.16.10 in vlan 16) -> physical switch port (with vlan support) -> physical router interfaces (192.168.16.1 and 172.16.0.1) -> physical switch port (with vlan support) -> proxmox node1: VM2 eth0 'vmbr1 tag 172' (172.16.0.10 in vlan 172)

Thank you,
Florin
 
Please try to use tcpdump to see where the packets get lost.
 
After analysing traffic between the two VMs with the Datacenter firewall ON ('enable: 1' in cluster.fw), I can tell the following:
- ICMP traffic is OK, each VM responds to ping from the other VM
- UDP traffic is OK (tested with netcat from both VMs)
- the problem appears with TCP traffic, which is blocked.

Analysing the TCP traffic further with tcpdump, I can tell that:
VM1 sends a SYN request to VM2 - this step is OK
VM2 generates a SYN/ACK response, but this packet never arrives at VM1, so this seems to be what breaks the TCP handshake.

I think the problem is with the Proxmox firewall, because when I switch to 'enable: 0' in cluster.fw the TCP connections are OK again.

Thank you,
Florin
 
I have the same problem. I also noticed that the problem doesn't appear if the networks belong to the same /8 network. For example, in my case two VMs belonging to networks 10.1.0.0/24 and 10.55.1.0/24 (different vmbrs) communicate with each other perfectly fine when the global firewall is enabled. On the same host, a VM with external IP 194.x.x.x loses its connection to both of those VMs as soon as the global firewall is enabled. ICMP and UDP work fine, only TCP breaks down. I've just upgraded to Proxmox VE 3.4-3 and the problem remains.
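
The /8 observation above is easy to check mechanically. Here is a minimal Python sketch, assuming hypothetical host addresses 10.1.0.1 and 10.55.1.1 inside the two 10.x networks, plus the two VM addresses noted in the configs earlier in the thread. The helper name `same_slash8` is made up for illustration. It only tests whether two addresses share a /8; whether that is really what the firewall keys on is not established here.

```python
import ipaddress

def same_slash8(a, b):
    """Return True if addresses a and b fall into the same /8 network."""
    net_a = ipaddress.ip_network(f"{a}/8", strict=False)
    return ipaddress.ip_address(b) in net_a

# Hypothetical hosts in 10.1.0.0/24 and 10.55.1.0/24 (reported as working).
print(same_slash8("10.1.0.1", "10.55.1.1"))        # True
# VM 125 and VM 600 from the original report (reported as broken).
print(same_slash8("192.168.16.5", "172.16.0.55"))  # False
```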
 
Hi,

we have the same problem. When enabling the firewall, TCP connections to VMs in different networks on the same physical host don't work anymore.

Traffic goes this way: VM1 in VLAN x -> external firewall/router -> VM2 in VLAN y (both VMs on the same physical host)
The TCP SYN is seen on VM2; the SYN/ACK is only seen on the physical host but not on VM1.
 
Any plans to fix it? Currently the firewall feature has to be deactivated in big installations with external routers or firewalls, so the firewall feature is currently useless. :(
 
I'm not sure what to make of it. I could only create a virtual testing scenario (obviously lacking external physical switches and routers), and in each case communication succeeded perfectly fine regardless of the firewall settings.
It seems weird that nothing but TCP SYN/ACK would suddenly be blocked. OTOH it seems weird that anything gets through at all, as with the default firewall rules the packets should end up hitting PVEFW-HOST-IN's final `-j DROP`.
OTOH VLANs act a little differently there. There are a few sysctls to take into account (net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-filter-vlan-tagged, ...). Also, matching on VLANs directly is ebtables' task, not iptables'.
Note however that PVEFW-Drop contains a rule: `-m tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -j DROP`, which basically means DROP any TCP packet that isn't an initial SYN (SYN set; FIN, RST and ACK clear). HOWEVER, before reaching this chain, the previous chain (PVEFW-HOST-IN) has a rule `-m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT` which SHOULD match the SYN/ACK packet...
That is, provided the initial SYN packet gets through in the first place. I don't immediately see how that would happen though. (or the ICMP or UDP traffic...)
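
To make the flag matching concrete, here is a small Python model of the two rules quoted above. It is only a sketch: the function names are made up, the `ct_established` argument is an assumption standing in for whatever conntrack would report for the packet, and the rest of the chains is not modeled.

```python
# TCP header flag bits (RFC 793).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def tcp_flags_match(flags, mask, comp):
    """iptables '--tcp-flags mask comp': examine only the bits in mask
    and match when exactly the bits in comp are set among them."""
    return (flags & mask) == comp

def not_syn_rule_drops(flags):
    """Model of '! --tcp-flags FIN,SYN,RST,ACK SYN -j DROP'."""
    return not tcp_flags_match(flags, FIN | SYN | RST | ACK, SYN)

def verdict(flags, ct_established):
    """Simplified order: conntrack ACCEPT rule first, then the not-SYN DROP."""
    if ct_established:
        return "ACCEPT"
    if not_syn_rule_drops(flags):
        return "DROP"
    return "CONTINUE"  # falls through to later rules, not modeled here

print(verdict(SYN, False))        # CONTINUE
print(verdict(SYN | ACK, True))   # ACCEPT
print(verdict(SYN | ACK, False))  # DROP
```

In this model a SYN/ACK is dropped only when conntrack does not classify it as ESTABLISHED. That would fit the reported symptom, but it is a hypothesis and has not been verified on the affected hosts.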

The only thing I DID encounter here was that VLAN support in general seems buggy in the old 2.6.32 kernel series, and it even seems to depend on the network card used. I.e., if I used virtio for the network cards of my VMHoster (described below), VLAN traffic was generally routed to the wrong interfaces (it went from the physical one to the bridge it was part of, instead of to the tagged VLAN interfaces). Switching to E1000 made it work.

Anyway, here's the setup I tried: (Abbreviating the start of IP addresses 192.168 with "..")

Code:
Main machine
|               +-----------------------------------------+
+==============>| Router                                  |  Routing table:
|   ________    |   ___________                           |  ..99.0/24 => eth0
|  /vmbr0   \   |  / eth0      \                          |  ..254.0/24 => eth1
+->|..99.10 <---+-->  ..99.225 |                          |  ..253.0/24 => eth2.10
|  \--------/   |  \-----------/                          |  ..252.0/24 => eth2.11
|               |       X                                 |
|               |   ____X_____     ________     ________  |
|               |  / eth1     \   |eth2.10 |   |eth2.11 | |
|               |  | ..254.1  |   |..253.1 |   |..252.1 | |
|               |  |          |   +----A---+   +---A----+ |
|               |  \----A-----/        |           |      |
|               |       |              | (tagging) |      |
|               |       |              |           |      |
|               |       |          +---v-----------v---+  |
|               |       |          | eth2 (no ip)      |  |
|               |       |          \---------A---------/  |
|               |       |                    |            |
|               +-------|--------------------|------------+
|                       |                    |
|                  +----v---+                |
+----------------->|vmbr0v5 |                |
|                  +----A---+                |
|                       |                    |
|                       |               +----v----+
+------------------ - - | - - --------->|vmbr0v6  |
|                       |               +----A----+
|                       |                    |
|     +-----------------|--------------------|---------------------+
+====>| VM Hoster       |                    |                     | Default routing table
      | (nested) +------v------+   +---------v------------+        | Default firewall setup
      |          |eth0         |   |eth1 (no ip)          |        |
      |          |..254.100    |   +--A----------------A--+        |
      |          +-------------+      |                |           |
      |                               |    (tagging)   |           |
      |                               |                |           |
      |                            +--v-----+    +-----v--+        |
      |                            |eth1.10 |    |eth1.11 |        |
      |                            +--X-----+    +-----X--+        |
      |                               |                |           |
      |                           +---X-----+    +-----X---+       |
      |                           |vmbr1v10 |    |vmbr1v11 |       |
      |                           +---X-----+    +-----X---+       |
      |                               |                |           |
      |                        +------X-+            +-X------+    |
      |                        |tap100i0|            |tap101i0|    |
      |                        +--A-----+            +-----A--+    |
      |                           |                        |       |
      |                           |                        |       |
      |                   +-------+----+           +-------+----+  |
      |                   | vm100 |    |           | vm101 |    |  |
      |                   | +-----v---+|           | +-----v---+|  |
      |                   | | eth0    ||           | | eth0    ||  |
      |                   | |..253.10 ||           | |..252.10 ||  |
      |                   | +---------+|           | +---------+|  |
      |                   +------------+           +------------+  |
      |                                                            |
      +------------------------------------------------------------+

Help me correct the scenario if it doesn't reflect the situation enough.

This is the route a packet takes from VM Hoster's vm100 to vm101:
vm100 writes to eth0, eth0 sends it off to tap100i0, bridged over vmbr1v10 to eth1.10 where it is tagged as vlan id 10. Up to this point `tcpdump -XX` shows no tag in the ethernet frame.
The tagging happens now right before eth1.10 hands the packet over to eth1, where `tcpdump -XX` shows `<dstmac> <srcmac> 8100 000a` in the ethernet frame, the 802.1q vlan tag 10.
Then eth1 sends it over the emulated physical device (the *real* host's vmbr0v6) to the router-VM. vmbr0v6 also successfully shows a tagged packet, as does tcpdump on the router on eth2.
In order for the router to care about linking VLANs together, I have to make it take off the VLAN tag there (otherwise no forwarding whatsoever happens).
So the router is on the same network with the same tag via eth2.10 as 192.168.253.1.
So the packet goes from the router's eth2 to eth2.10 and in the process loses the VLAN tag. tcpdump on eth2.10 shows the regular untagged packet asking to be sent from 192.168.253.10 to 192.168.252.10.
The router now sees that ..252.0/24 is to be routed over eth2.11 and writes the packet to that interface. There tcpdump also shows the untagged packet.
Now the packet is tagged as belonging to vlan 11 and moves over to eth2 where it originally came from, this time with the new vlan=11 tag. (Visible via `tcpdump -XX` as `8100 000b`).
Now back over vmbr0v6 (which correctly shows the vlan=11 tagged packet) it reaches the VMHost's eth1, which forwards it to eth1.11, dropping the tag in the process.
In eth1.11 now tcpdump shows the packet without vlan tag. This interface is bridged over vmbr1v11 to tap101i0 to vm101's eth0 which now happily receives a packet originating from 192.168.253.10 to be delivered to vm101 on 192.168.252.10.
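
For anyone repeating this with `tcpdump -XX`, the tag check done by eye in the walk-through above can be sketched in Python. This assumes a raw Ethernet frame as bytes; the MAC addresses are hypothetical placeholders and `vlan_id` is a made-up helper name.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag

def vlan_id(frame):
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 14:
        return None
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType/TPID.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_8021Q or len(frame) < 16:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF  # the low 12 bits of the TCI are the VLAN ID

# Hypothetical MAC addresses, followed by the bytes seen in the dumps.
DST = bytes.fromhex("020000000002")
SRC = bytes.fromhex("020000000001")
IPV4 = bytes.fromhex("0800")

print(vlan_id(DST + SRC + bytes.fromhex("8100000a") + IPV4))  # 10
print(vlan_id(DST + SRC + bytes.fromhex("8100000b") + IPV4))  # 11
print(vlan_id(DST + SRC + IPV4))                              # None
```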

What a joyride...

Now then. If I activate the VMHost's firewall I can still ping and netcat/tcp between vm100 and vm101. Which is a BIT puzzling though...

Code:
net.bridge.bridge-nf-filter-vlan-tagged=1 (0 or 1 made no difference)
net.bridge.bridge-nf-call-iptables=1      (was 1 by default)
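
If someone wants to compare these two settings across nodes, here is a small Python sketch that reads them from /proc/sys. It assumes the usual Linux mapping of dotted sysctl names to paths under /proc/sys; the helper names are made up, and the keys simply read as None on a host where the bridge netfilter entries are not present.

```python
import os

def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file below root."""
    return os.path.join(root, *name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Return the value as a string, or None if the key cannot be read."""
    try:
        with open(sysctl_path(name, root)) as fh:
            return fh.read().strip()
    except OSError:
        return None

for key in ("net.bridge.bridge-nf-call-iptables",
            "net.bridge.bridge-nf-filter-vlan-tagged"):
    print(key, "=", read_sysctl(key))
```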
 
