Issue with inter-VLAN comms between VMs on the SAME host when the firewall is enabled

Jul 2, 2019
Hi,

We're having a strange issue with inter-VLAN comms/networking between VMs on different VLANs on the same host.

We have experienced the same issue on hosts set up with Linux OR OVS bridging. Because we segment customer VMs into customer-specific VLANs, we manage VM instances across an 18-node cluster segregated into a few hundred VLANs. (This is the main reason we started using OVS instead of Linux bridges: we hoped it would simplify setting up and managing customer VLANs and give us more flexibility.)

Before I go further, I just want to emphasise that this issue happens both on hosts using Linux bridging and on hosts using OVS bridges, and it seems to be a longstanding issue.

The issue is simply this:

1. When VMs on a Proxmox host have the firewall set up and enabled, and those VMs are on DIFFERENT VLANs (i.e. each VM's network interface is tagged for a different VLAN, e.g. VM-1 tagged for VLAN1 and VM-2 tagged for VLAN2 on the SAME host), those VMs cannot communicate with (ping/connect to) each other;
a. When the same VMs are moved to 2 DIFFERENT hosts with the firewalls still ENABLED, the problem goes away;
b. When the firewalls on those VMs are DISABLED while they are on the SAME host, the problem goes away (and comes back when the firewall is re-enabled).

This was initially a bit of a nuisance, and our default response was to shuffle VMs around while we tried to figure out the cause (we haven't been able to pin it down). Now it is becoming a material problem: we have a sizeable number of VMs and the issue occurs much more frequently.

We have varying PVE versions running (we've seen this issue on pretty much all hosts, across different PVE versions), and even our most recently upgraded host currently exhibits the same issue.

We think the issue might be associated with the devices that are created/set up when the firewall is enabled on a VM network interface: the tap device is attached to a dedicated firewall bridge (fwbr) together with an fwln link, e.g.:

Code:
7: tap4424i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr4424i0 state UNKNOWN group default qlen 1000
17: fwbr4424i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
18: fwln4424o0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr4424i0 state UNKNOWN
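To check the same bridge membership on other VMs, a minimal sketch, assuming the ip link / ip addr header lines are pasted in as text in the format shown above, can map each device to the bridge it is attached to (its "master"). The helper names parse_masters and bridge_members are hypothetical, not part of Proxmox.

```python
import re
from typing import Dict, List, Optional

# Header line printed by `ip link` / `ip addr` for each device, e.g.
# "7: tap4424i0: <BROADCAST,...> mtu 1500 ... master fwbr4424i0 state UNKNOWN ..."
HEADER = re.compile(r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s+<[^>]*>(?P<rest>.*)$")


def parse_masters(listing: str) -> Dict[str, Optional[str]]:
    """Map each device name to the bridge it is attached to (None if it has no master)."""
    masters: Dict[str, Optional[str]] = {}
    for line in listing.splitlines():
        match = HEADER.match(line.strip())
        if match is None:
            continue  # not a device header line (e.g. link/ether or inet lines)
        words = match.group("rest").split()
        # The word following "master" names the parent bridge, if any.
        master = words[words.index("master") + 1] if "master" in words else None
        masters[match.group("name")] = master
    return masters


def bridge_members(masters: Dict[str, Optional[str]], bridge: str) -> List[str]:
    """Return the devices attached to the given bridge, sorted by name."""
    return sorted(dev for dev, parent in masters.items() if parent == bridge)
```

Fed the three lines above, parse_masters reports tap4424i0 and fwln4424o0 with master fwbr4424i0 and fwbr4424i0 with no master, so bridge_members(..., "fwbr4424i0") returns ['fwln4424o0', 'tap4424i0'].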

Our setup (on one of our updated hosts) is as follows:

Code:
root@pve-24:~# pveversion --v
proxmox-ve: 6.4-1 (running kernel: 5.4.119-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-4
pve-kernel-helper: 6.4-4
pve-kernel-5.4.124-1-pve: 5.4.124-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
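Since the issue shows up on hosts running different PVE versions, comparing package versions between two hosts may help narrow things down. A minimal sketch, assuming the pveversion -v output from each host has been saved as text; parse_pveversion and diff_versions are hypothetical helper names.

```python
import re
from typing import Dict, Optional, Tuple

# Package lines look like "pve-firewall: 4.1-4"; shell prompt lines such as
# "root@pve-24:~# pveversion --v" do not match this package-name pattern.
PACKAGE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.+-]*$")


def parse_pveversion(text: str) -> Dict[str, str]:
    """Turn `pveversion -v` output ("package: version") into a dict."""
    versions: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")  # split on the first colon only
        name, value = name.strip(), value.strip()
        if sep and value and PACKAGE_NAME.match(name):
            versions[name] = value
    return versions


def diff_versions(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (version_a, version_b)} for packages that differ or exist on one side only."""
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(a.keys() | b.keys())
        if a.get(pkg) != b.get(pkg)
    }
```

An empty result from diff_versions means both hosts report identical package versions; a None entry means the package is missing on that host.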

OVS setup is as follows (/etc/network/interfaces):

Code:
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_options vlan_mode=native-untagged tag=6

auto eno2
iface eno2 inet static
        address 192.168.3.90/24

auto vlan6
iface vlan6 inet static
        address 192.168.6.9/24
        gateway 192.168.6.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=6

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eno1 vlan6

Is anyone experiencing the same types of issues? Any ideas?

Kind regards,

Angelo.