[SOLVED] PVE 7 Cluster - Trouble with pve-firewall for VMs only on the same node

Jesster

Hey All,

I have a cluster of 6 servers (pve-manager/7.1-12/b3c09de3 (running kernel: 5.13.19-6-pve))

All servers are the same hardware. I have this really bizarre issue that smells like MTU, but I have had no luck working on a solution. This is not your typical network problem.

I have about 12 VMs on the cluster. These VMs are on various VLANs provided by some Edgecore 40/100G switches.

The issue occurs when any two VMs on two different public VLANs try to establish a TCP connection to each other while running on the same PVE node. If I migrate either VM to any other PVE node, the issue no longer occurs. ICMP/UDP traffic to these VMs is fine. Two VMs on the same VLAN, on the same PVE node, have no problems. You heard that right. Also, all Virtual Machines can use "The Internet" without any issue. For each of these VLANs we use a Cisco router as the L3 gateway, so traffic must leave the PVE node from the source bridge, hit the external switches, reach the Cisco gateway, and then return to the same PVE node over a different physical connection and destination bridge.
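
The packet captures mentioned in the troubleshooting list below were taken along that hairpin path, roughly like this (just a sketch; the bridge names are examples from the config further down and port 22 stands in for whatever service is being tested):

Code:
# on the PVE node hosting both VMs, watch the same TCP attempt on both sides of the hairpin
tcpdump -ni vmbr63 'tcp port 22'    # leaving via the source VM's bridge
tcpdump -ni vmbr64 'tcp port 22'    # returning via the destination VM's bridge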


I tried the following troubleshooting:
  1. Disabled firewall for VM, Node, Datacenter
  2. Rebooted PVE Nodes
  3. Lowered PVE Node MTU to 1350 on physical, bond, and bridge interfaces (then power cycled VMs)
  4. Lowered Virtual Machine MTU to 1250
  5. Observed TCP Retransmissions from tcpdump packet captures
  6. Tried different guest OS (Using Oracle Linux 8 mostly)
  7. Verified guest OS does not run firewall
  8. Tried several TCP connections (ssh, curl, mysql, etc.) - connection is created but no data is ever read
  9. Disabled GSO via ethtool on every physical/bond/VLAN/bridge interface I could find (roughly as sketched below)
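
Roughly what the offload change looked like, per interface (a sketch only; eno1 and bond2.63 are just example interface names from the config below):

Code:
# disable generic segmentation offload on one interface
# (repeated for each physical/bond/VLAN/bridge interface)
ethtool -K eno1 gso off
ethtool -K bond2.63 gso off
# verify the change
ethtool -k eno1 | grep generic-segmentation-offload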


To be clear, the issue only occurs when traffic between two Virtual Machines crosses two different bridges on the same PVE Node. Migrating either VM to any other PVE Node makes the issue disappear. I also worked with my network engineering group to verify switch interface stats, MTU, etc., and we played with several settings to see if we could fix this.
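
(Migration itself is nothing fancy, e.g. something along these lines; the VMID and target node name are placeholders:)

Code:
# live-migrate one of the two VMs to another node; the TCP problem then goes away
qm migrate 202204140 pve-node-02 --online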


side note: I have a PVE 6.x cluster connected to the same switches and VLANs without this issue. Granted, the server hardware is not the same.


Basic Networking Info:


Code:
auto lo
    iface lo inet loopback

    auto eno3
    iface eno3 inet manual
            up /sbin/ip link set $IFACE promisc on

    iface eno4 inet manual

    iface eno1 inet manual
    iface eno2 inet manual

    iface enp129s0f0 inet manual
    iface enp129s0f1 inet manual

    iface enp130s0f0 inet manual
    iface enp130s0f1 inet manual

    auto bond0
    iface bond0 inet manual
            bond-slaves eno1 enp130s0f1
            bond-mode active-backup

    auto bond1
    iface bond1 inet manual
            bond-slaves enp129s0f1 eno2
            bond-mode active-backup

    auto bond2
    iface bond2 inet manual
            bond-slaves enp130s0f0 enp129s0f0
            bond-mode active-backup

    auto bond1.55
    iface bond1.55 inet manual
            vlan-raw-device bond1

    auto bond1.67
    iface bond1.67 inet manual
            vlan-raw-device bond1

    auto bond1.68
    iface bond1.68 inet manual
            vlan-raw-device bond1

    auto bond2.63
    iface bond2.63 inet manual
            vlan-raw-device bond2

    auto bond2.64
    iface bond2.64 inet manual
            vlan-raw-device bond2

    auto bond2.65
    iface bond2.65 inet manual
            vlan-raw-device bond2

    auto vmbr55
    iface vmbr55 inet manual
            bridge-ports bond1.55
            bridge-stp on
            bridge-fd 0
    #VLAN55

    auto vmbr67
    iface vmbr67 inet manual
            bridge-ports bond1.67
            bridge-stp on
            bridge-fd 0
    #VLAN67

    auto vmbr68
    iface vmbr68 inet manual
            bridge-ports bond1.68
            bridge-stp on
            bridge-fd 0
    #VLAN68

    auto vmbr63
    iface vmbr63 inet manual
            bridge-ports bond2.63
            bridge-stp on
            bridge-fd 0
    #VLAN63

    auto vmbr64
    iface vmbr64 inet manual
            bridge-ports bond2.64
            bridge-stp on
            bridge-fd 0
    #VLAN64

    auto vmbr65
    iface vmbr65 inet manual
            bridge-ports bond2.65
            bridge-stp on
            bridge-fd 0
    #VLAN65

    auto vmbr66
    iface vmbr66 inet static
            address x.x.x.x/27
            gateway x.x.x.y
            bridge-vlan-aware no
            bridge-ports bond0
            bridge-stp on
            bridge-fd 0
    #VLAN66-MGMT-INSIDE

    auto vmbr9999
    iface vmbr9999 inet manual
            bridge-ports eno3
            bridge-stp off
            bridge-fd 0
            bridge-vlan-aware no
            up /usr/sbin/brctl setageing vmbr9999 0
            up /usr/sbin/brctl setfd vmbr9999 0
    #SPAN-PORT-ENO3-BORDER0-RTR

    auto vmbr9009
    iface vmbr9009 inet static
            address 10.9.0.1/24
            bridge-ports none
            bridge-stp on
            bridge-fd 0
            bridge-vlan-aware no
    #INT-9009-TEST

    auto vmbr9010
    iface vmbr9010 inet static
            address 10.10.0.1/24
            bridge-ports none
            bridge-stp on
            bridge-fd 0
            bridge-vlan-aware no
    #INT-9010-TEST



Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-12 (running version: 7.1-12/b3c09de3)
pve-kernel-helper: 7.1-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-2
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.1.6-1
proxmox-backup-file-restore: 2.1.6-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-9
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.2.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1


Code:
lshw -c network -businfo
    Bus info          Device           Class          Description
    =============================================================
    pci@0000:01:00.0  eno1             network        82599ES 10-Gigabit SFI/SFP+ Network Conn
    pci@0000:01:00.1  eno2             network        82599ES 10-Gigabit SFI/SFP+ Network Conn
    pci@0000:06:00.0  eno3             network        I350 Gigabit Network Connection
    pci@0000:06:00.1  eno4             network        I350 Gigabit Network Connection
    pci@0000:81:00.0  enp129s0f0       network        Ethernet 10G 2P X520 Adapter
    pci@0000:81:00.1  enp129s0f1       network        Ethernet 10G 2P X520 Adapter
    pci@0000:82:00.0  enp130s0f0       network        Ethernet 10G 2P X520 Adapter
    pci@0000:82:00.1  enp130s0f1       network        Ethernet 10G 2P X520 Adapter



Code:
ethtool -i eno1
    driver: ixgbe
    version: 5.14.6
    firmware-version: 0x8000095c, 19.5.12
    expansion-rom-version:
    bus-info: 0000:01:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: yes

ethtool -i enp129s0f0
    driver: ixgbe
    version: 5.14.6
    firmware-version: 0x8000095d, 19.5.12
    expansion-rom-version:
    bus-info: 0000:81:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: yes

ethtool -i enp130s0f0
    driver: ixgbe
    version: 5.14.6
    firmware-version: 0x8000095d, 19.5.12
    expansion-rom-version:
    bus-info: 0000:82:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: yes
 
After some more head scratching, I now think this is actually a pve-firewall issue.

It's got me baffled. I use this same ruleset on 50+ PVE nodes, but it is not working on our PVE 7.x cluster. I'm wondering if there is something different about pve-firewall here that is breaking this.

The cluster.fw config defines a security group that allows my other servers and management networks access to everything, and it is enabled at the Datacenter level. Each VM then also gets IPSets (defined in cluster.fw) applied as necessary, plus my management group.

Ok so here's where I'm confused.
If I disable the firewall on the NIC for "server1", I can then successfully establish any TCP connection to "server2". You heard that right: I have to disable the firewall on the source Virtual Machine's NIC. If I migrate "server1" to a separate PVE Node, I can keep the firewall enabled and still reach "server2".
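
To be precise about what "disable the firewall on the NIC" means: I'm flipping the firewall flag on the VM's net0 line (the "Firewall" checkbox on the network device). The values below are placeholders, not my real MAC/bridge:

Code:
# /etc/pve/qemu-server/<vmid>.conf (placeholder values)
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr63,firewall=1
# setting firewall=0 (or unticking the checkbox) is what makes TCP work again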
I watched the iptables chain counters increment for the proper rules (iptables -nvL tap202204173i0-IN --line-numbers) and checked 'pve-firewall compile' for any warnings. Restarting pve-firewall (pve-firewall restart) didn't help either.

So in general, I have something like:
Code:
cluster.fw:

[OPTIONS]

enable: 1
ebtables: 1


[ALIASES]

my-servers x.x.x.x/24

[IPSET management]

my-servers

[group mgmt-servers]

IN ACCEPT -source +my-servers -log nolog


[RULES]

GROUP mgmt-servers


Code:
"server1".fw
[OPTIONS]

policy_in: DROP
enable: 1
ipfilter: 0
radv: 0
policy_out: ACCEPT
macfilter: 1

[RULES]

GROUP mgmt-servers


Code:
"server2".fw
[OPTIONS]

policy_in: DROP
enable: 1
ipfilter: 0
radv: 0
policy_out: ACCEPT
macfilter: 1

[RULES]

GROUP mgmt-servers
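
For anyone following along: the per-VM firewall options also accept log_level_in / log_level_out, which makes dropped packets show up in /var/log/pve-firewall.log. A minimal example (not my actual config):

Code:
[OPTIONS]
enable: 1
policy_in: DROP
log_level_in: info

Then tail -f /var/log/pve-firewall.log on the node while retrying the connection.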


pve-firewall compile:

Code:
"server1" when I actually enable the firewall via nic:

# pve-firewall compile | grep  tap202204140i0
        -A PVEFW-FWBR-IN -m physdev --physdev-is-bridged --physdev-out tap202204140i0 -j tap202204140i0-IN
        -A PVEFW-FWBR-OUT -m physdev --physdev-is-bridged --physdev-in tap202204140i0 -j tap202204140i0-OUT
exists tap202204140i0-IN (fBkXxZNX8yUzIemIJZWjC0qtPtY)
        -A tap202204140i0-IN -p udp --sport 67 --dport 68 -j ACCEPT
        -A tap202204140i0-IN -j GROUP-mgmt-servers-IN
        -A tap202204140i0-IN -m mark --mark 0x80000000/0x80000000 -j ACCEPT
        -A tap202204140i0-IN -j PVEFW-Drop
        -A tap202204140i0-IN -j DROP
exists tap202204140i0-OUT (FQuebwi264i5VEKNcTtBNiD5RH0)
        -A tap202204140i0-OUT -p udp --sport 68 --dport 67 -g PVEFW-SET-ACCEPT-MARK
        -A tap202204140i0-OUT -m mac ! --mac-source 36:8E:56:C9:A3:01 -j DROP
        -A tap202204140i0-OUT -j MARK --set-mark 0x00000000/0x80000000
        -A tap202204140i0-OUT -j GROUP-mgmt-servers-OUT
        -A tap202204140i0-OUT -m mark --mark 0x80000000/0x80000000 -j RETURN
        -A tap202204140i0-OUT  -g PVEFW-SET-ACCEPT-MARK
        -A PVEFW-FWBR-IN -m physdev --physdev-is-bridged --physdev-out tap202204140i0 -j tap202204140i0-IN
        -A PVEFW-FWBR-OUT -m physdev --physdev-is-bridged --physdev-in tap202204140i0 -j tap202204140i0-OUT
exists tap202204140i0-IN (RqiY5vH+1dRRyHnQyBUUrN3zmIk)
        -A tap202204140i0-IN -p udp --sport 547 --dport 546 -j ACCEPT
        -A tap202204140i0-IN -p icmpv6 --icmpv6-type router-solicitation -j ACCEPT
        -A tap202204140i0-IN -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
        -A tap202204140i0-IN -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
        -A tap202204140i0-IN -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT
        -A tap202204140i0-IN -j GROUP-mgmt-servers-IN
        -A tap202204140i0-IN -m mark --mark 0x80000000/0x80000000 -j ACCEPT
        -A tap202204140i0-IN -j PVEFW-Drop
        -A tap202204140i0-IN -j DROP
exists tap202204140i0-OUT (D99RR4uTp+eL9wB3+7kO7C8iWQ4)
        -A tap202204140i0-OUT -p udp --sport 546 --dport 547 -g PVEFW-SET-ACCEPT-MARK
        -A tap202204140i0-OUT -m mac ! --mac-source 36:8E:56:C9:A3:01 -j DROP
        -A tap202204140i0-OUT -p icmpv6 --icmpv6-type router-advertisement -j DROP
        -A tap202204140i0-OUT -j MARK --set-mark 0x00000000/0x80000000
        -A tap202204140i0-OUT -p icmpv6 --icmpv6-type router-solicitation -g PVEFW-SET-ACCEPT-MARK
        -A tap202204140i0-OUT -p icmpv6 --icmpv6-type neighbor-solicitation -g PVEFW-SET-ACCEPT-MARK
        -A tap202204140i0-OUT -p icmpv6 --icmpv6-type neighbor-advertisement -g PVEFW-SET-ACCEPT-MARK
        -A tap202204140i0-OUT -j GROUP-mgmt-servers-OUT
        -A tap202204140i0-OUT -m mark --mark 0x80000000/0x80000000 -j RETURN
        -A tap202204140i0-OUT  -g PVEFW-SET-ACCEPT-MARK

For "server2", it has a running firewall that I am able to ssh to it from "server1" when "server1" has it's firewall off.


Code:
"server2"
# pve-firewall compile | grep  tap202204173i0
        -A PVEFW-FWBR-IN -m physdev --physdev-is-bridged --physdev-out tap202204173i0 -j tap202204173i0-IN
        -A PVEFW-FWBR-OUT -m physdev --physdev-is-bridged --physdev-in tap202204173i0 -j tap202204173i0-OUT
exists tap202204173i0-IN (Lk6TepzgL12fPtBvq2NYnx9tySU)
        -A tap202204173i0-IN -p udp --sport 67 --dport 68 -j ACCEPT
        -A tap202204173i0-IN -j GROUP-mgmt-servers-IN
        -A tap202204173i0-IN -m mark --mark 0x80000000/0x80000000 -j ACCEPT
        -A tap202204173i0-IN -j PVEFW-Drop
        -A tap202204173i0-IN -j DROP
exists tap202204173i0-OUT (JIIs2k/HpSHpKEIpbxgUKEwTIAI)
        -A tap202204173i0-OUT -p udp --sport 68 --dport 67 -g PVEFW-SET-ACCEPT-MARK
        -A tap202204173i0-OUT -m mac ! --mac-source BA:E7:9D:FC:BF:B8 -j DROP
        -A tap202204173i0-OUT -m set ! --match-set PVEFW-66B0F9EB src -j DROP
        -A tap202204173i0-OUT -j MARK --set-mark 0x00000000/0x80000000
        -A tap202204173i0-OUT -j GROUP-mgmt-servers-OUT
        -A tap202204173i0-OUT -m mark --mark 0x80000000/0x80000000 -j RETURN
        -A tap202204173i0-OUT  -g PVEFW-SET-ACCEPT-MARK
        -A PVEFW-FWBR-IN -m physdev --physdev-is-bridged --physdev-out tap202204173i0 -j tap202204173i0-IN
        -A PVEFW-FWBR-OUT -m physdev --physdev-is-bridged --physdev-in tap202204173i0 -j tap202204173i0-OUT
exists tap202204173i0-IN (FcVx527JekSBXXnkD4fwVRFFXUw)
        -A tap202204173i0-IN -p udp --sport 547 --dport 546 -j ACCEPT
        -A tap202204173i0-IN -p icmpv6 --icmpv6-type router-solicitation -j ACCEPT
        -A tap202204173i0-IN -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
        -A tap202204173i0-IN -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
        -A tap202204173i0-IN -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT
        -A tap202204173i0-IN -j GROUP-mgmt-servers-IN
        -A tap202204173i0-IN -m mark --mark 0x80000000/0x80000000 -j ACCEPT
        -A tap202204173i0-IN -j PVEFW-Drop
        -A tap202204173i0-IN -j DROP
exists tap202204173i0-OUT (Sa9S4oFw6r17sRxpZiCyj1Oo/BM)
        -A tap202204173i0-OUT -p udp --sport 546 --dport 547 -g PVEFW-SET-ACCEPT-MARK
        -A tap202204173i0-OUT -m mac ! --mac-source BA:E7:9D:FC:BF:B8 -j DROP
        -A tap202204173i0-OUT -p icmpv6 --icmpv6-type router-advertisement -j DROP
        -A tap202204173i0-OUT -m set ! --match-set PVEFW-68B0FD11 src -j DROP
        -A tap202204173i0-OUT -j MARK --set-mark 0x00000000/0x80000000
        -A tap202204173i0-OUT -p icmpv6 --icmpv6-type router-solicitation -g PVEFW-SET-ACCEPT-MARK
        -A tap202204173i0-OUT -p icmpv6 --icmpv6-type neighbor-solicitation -g PVEFW-SET-ACCEPT-MARK
        -A tap202204173i0-OUT -p icmpv6 --icmpv6-type neighbor-advertisement -g PVEFW-SET-ACCEPT-MARK
        -A tap202204173i0-OUT -j GROUP-mgmt-servers-OUT
        -A tap202204173i0-OUT -m mark --mark 0x80000000/0x80000000 -j RETURN
        -A tap202204173i0-OUT  -g PVEFW-SET-ACCEPT-MARK

So I start to look at the chain directly:


Code:
"server1" when I actually enable the firewall via nic:

# iptables -nvL tap202204140i0-IN  --line-numbers
Chain tap202204140i0-IN (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp spt:67 dpt:68
2        4   278 GROUP-mgmt-servers-IN  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3        4   278 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x80000000/0x80000000
4        0     0 PVEFW-Drop  all  --  *      *       0.0.0.0/0            0.0.0.0/0
5        0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0
6        0     0            all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* PVESIG:fBkXxZNX8yUzIemIJZWjC0qtPtY */

Code:
"server2"

# iptables -nvL tap202204173i0-IN  --line-numbers
Chain tap202204173i0-IN (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp spt:67 dpt:68
2       85  5456 GROUP-mgmt-servers-IN  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3       84  5420 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x80000000/0x80000000
4        1    36 PVEFW-Drop  all  --  *      *       0.0.0.0/0            0.0.0.0/0
5        0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0
6        0     0            all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* PVESIG:Lk6TepzgL12fPtBvq2NYnx9tySU */



When "server1" tries to ssh to "server2", I can see the "server2" counters increase:

Code:
# iptables -nvL tap202204173i0-IN  --line-numbers
Chain tap202204173i0-IN (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp spt:67 dpt:68
2      579 37634 GROUP-mgmt-servers-IN  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3      577 37562 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x80000000/0x80000000
4        2    72 PVEFW-Drop  all  --  *      *       0.0.0.0/0            0.0.0.0/0
5        0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0
6        0     0            all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* PVESIG:Lk6TepzgL12fPtBvq2NYnx9tySU */


# iptables -nvL tap202204173i0-IN  --line-numbers
Chain tap202204173i0-IN (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp spt:67 dpt:68
2      614 39831 GROUP-mgmt-servers-IN  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3      612 39759 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x80000000/0x80000000
4        2    72 PVEFW-Drop  all  --  *      *       0.0.0.0/0            0.0.0.0/0
5        0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0
6        0     0            all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* PVESIG:Lk6TepzgL12fPtBvq2NYnx9tySU */


So yeah, as long as "server1" and "server2" are on the same PVE Node and I have the firewall enabled on both of these VMs, ICMP/UDP works, but any TCP session does not.
If I migrate either "server1" or "server2" to any other PVE Node in the cluster, so that these VMs are no longer together on the same PVE Node, things work great!
 
Is your Cisco gateway only a router, or also a firewall? (I know that Cisco firewalls have a default option that randomizes TCP sequence numbers, which can invalidate packets.)

Hey @spirit ! (sidenote: I was just reading the SDN documentation, I'm really excited to try it out!)

The Cisco router is just routing, no ACLs in use for any of these networks.
I do have a Cisco ASA for one of my VLANs, and it's being phased out this quarter. Perhaps the ASA was rewriting the TCP sequence numbers and conntrack wasn't cool with that.
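
If that theory holds, one thing worth checking on the PVE node (just an idea, not something I've verified fixes it) is whether conntrack is flagging the rewritten sequence numbers as out-of-window; it can be told to be liberal about that:

Code:
# check current behaviour
sysctl net.netfilter.nf_conntrack_tcp_be_liberal
# temporarily accept TCP packets that fall outside the tracked window
sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1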
 
