[SOLVED] Proxmox cluster + LACP + VLANs + virtual gateway (pfSense) = no ping to gateway from 2 of 3 hosts

skierka

Feb 6, 2022
Hello,

I have set up a small homelab with three identical Proxmox nodes running as a cluster. All went well until I decided to run the gateway/firewall (pfSense) as a VM.

Problem:
2 out of 3 nodes cannot reach the WAN (internet). When I ping the gateway IP (the pfSense VM running on one of the nodes) from a host, I can see that the echo request goes out and the reply comes back, but the reply is somehow discarded.
On one node, however, everything works fine and I cannot figure out why.
Please note that migrating the pfSense VM from one node to another or restarting the nodes does not change the situation; the behaviour is stable.

Set-up description:
Each node is connected to L2 managed switch with 2 x NICs using LACP (802.3ad).
Each port group has tagged ports for VLAN 5 (WAN), VLAN 15 (LAN), VLAN 20 (IoT)

The pfSense VM has three network adapters: one on vmbr1 (LAN), one on vmbr2 (WAN) and one on vmbr0 with VLAN tag = 20.

Network configuration (only the IP of vmbr1 differs between nodes):
Code:
auto lo
iface lo inet loopback

auto enp1s0
iface enp1s0 inet manual

auto enp2s0
iface enp2s0 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp1s0 enp2s0
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2
#LAN1+2 link aggregation

auto bond0.15
iface bond0.15 inet manual
#Bond over VLAN 15 (LAN)

auto bond0.5
iface bond0.5 inet manual
#Bond over VLAN 5 (WAN)

auto vmbr1
iface vmbr1 inet static
        address 192.168.15.100/24
        gateway 192.168.15.1
        bridge-ports bond0.15
        bridge-stp off
        bridge-fd 0
#On top of VLAN 15 (LAN)

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 20-4094
#VLAN 20+

auto vmbr2
iface vmbr2 inet manual
        bridge-ports bond0.5
        bridge-stp off
        bridge-fd 0
#On top of VLAN 5 (WAN)

Problem symptoms:
On node 1 (192.168.15.100) I run ping and all packets are lost:
Code:
root@pve1:~# ping -c3 192.168.15.1
PING 192.168.15.1 (192.168.15.1) 56(84) bytes of data.

--- 192.168.15.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2029ms

But tcpdump on the host (and in the pfSense VM) shows that the traffic actually goes out and comes back:
Code:
root@pve1:~# tcpdump -envi vmbr1 icmp
tcpdump: listening on vmbr1, link-type EN10MB (Ethernet), snapshot length 262144 bytes

20:22:37.102368 32:fe:cb:a6:cd:36 > 00:0e:c4:d0:4e:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 37319, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.15.100 > 192.168.15.1: ICMP echo request, id 42394, seq 1, length 64
20:22:37.102668 00:0e:c4:d0:4e:92 > 50:21:08:80:05:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 60139, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.15.1 > 192.168.15.100: ICMP echo reply, id 42394, seq 1, length 64
20:22:38.109750 32:fe:cb:a6:cd:36 > 00:0e:c4:d0:4e:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 37438, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.15.100 > 192.168.15.1: ICMP echo request, id 42394, seq 2, length 64
20:22:38.110046 00:0e:c4:d0:4e:92 > 50:21:08:80:05:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56162, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.15.1 > 192.168.15.100: ICMP echo reply, id 42394, seq 2, length 64
20:22:39.133743 32:fe:cb:a6:cd:36 > 00:0e:c4:d0:4e:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 37497, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.15.100 > 192.168.15.1: ICMP echo request, id 42394, seq 3, length 64
20:22:39.134029 00:0e:c4:d0:4e:92 > 50:21:08:80:05:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 57476, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.15.1 > 192.168.15.100: ICMP echo reply, id 42394, seq 3, length 64
^C
6 packets captured

On node 2 (192.168.15.99) I run ping and all is fine
Code:
ping -c3 192.168.15.1
PING 192.168.15.1 (192.168.15.1) 56(84) bytes of data.
64 bytes from 192.168.15.1: icmp_seq=1 ttl=64 time=0.336 ms
64 bytes from 192.168.15.1: icmp_seq=2 ttl=64 time=0.505 ms
64 bytes from 192.168.15.1: icmp_seq=3 ttl=64 time=0.458 ms

--- 192.168.15.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2039ms
rtt min/avg/max/mdev = 0.336/0.433/0.505/0.071 ms

The output of ip route is the same on both nodes (as is the network config). Here is the output from node 1 and node 2:
Code:
root@pve1:~# ip route
default via 192.168.15.1 dev vmbr1 proto kernel onlink 
192.168.15.0/24 dev vmbr1 proto kernel scope link src 192.168.15.100 

root@pve2:~# ip route
default via 192.168.15.1 dev vmbr1 proto kernel onlink 
192.168.15.0/24 dev vmbr1 proto kernel scope link src 192.168.15.101

Any suggestions?
 
Update: looking at the tcpdump output, I can see that the reply is sent back to the correct IP but to the wrong MAC address.
The reply is sent to 50:21:08:80:05:92, but it should go to 32:fe:cb:a6:cd:36.
The wrong MAC comes from pfSense, from a static ARP entry in the DHCP static mappings... my mistake. But it is good to post and read a problem; sometimes you find the solution yourself ;-). I have fixed the set-up and everything works like a charm.
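
For anyone hitting the same symptom, here is a minimal Python sketch of the check I did by eye on the capture above: for every ICMP echo reply, compare its destination MAC with the source MAC of the matching echo request. The function name find_mac_mismatches and the regular expressions are my own illustration and only assume the "tcpdump -e" text layout shown in this post; this is not a tool that ships with Proxmox or pfSense.

```python
import re

# Link-level header printed by `tcpdump -e`: "<src MAC> > <dst MAC>,"
MAC = r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
MAC_PAIR = re.compile(MAC + r" > " + MAC + r",", re.I)
ICMP = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def find_mac_mismatches(capture):
    """Return (seq, expected_mac, actual_mac) for every echo reply whose
    destination MAC differs from the source MAC of the matching request."""
    request_src = {}  # (icmp id, seq) -> MAC the request was sent from
    mismatches = []
    macs = None       # (src, dst) from the most recent link-level header
    for line in capture.splitlines():
        header = MAC_PAIR.search(line)
        if header:
            macs = (header.group(1).lower(), header.group(2).lower())
        icmp = ICMP.search(line)
        if icmp is None or macs is None:
            continue
        kind = icmp.group(1)
        key = (icmp.group(2), icmp.group(3))
        src, dst = macs
        if kind == "request":
            request_src[key] = src
        elif key in request_src and dst != request_src[key]:
            mismatches.append((int(icmp.group(3)), request_src[key], dst))
        macs = None
    return mismatches


# First request/reply pair from the capture in this post.
sample = """\
20:22:37.102368 32:fe:cb:a6:cd:36 > 00:0e:c4:d0:4e:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 37319, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.15.100 > 192.168.15.1: ICMP echo request, id 42394, seq 1, length 64
20:22:37.102668 00:0e:c4:d0:4e:92 > 50:21:08:80:05:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 60139, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.15.1 > 192.168.15.100: ICMP echo reply, id 42394, seq 1, length 64
"""

for seq, expected, actual in find_mac_mismatches(sample):
    print(f"seq {seq}: reply sent to {actual}, expected {expected}")
```

A non-empty result means the gateway is answering to a MAC the host does not own, which points at a stale or static ARP entry on the gateway side rather than at the bond, VLAN or bridge configuration.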

P.S. I had changed the bond definition, and the MAC address is now taken from the first NIC, so on 2 of the 3 nodes it no longer matched the static ARP entry on the pfSense side.
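
To catch this kind of drift after changing a bond or bridge definition, one option is to compare the MAC that Linux currently reports for the interface with the MAC stored in the pfSense static mapping. The sketch below is only an illustration under my own assumptions: the helper names read_mac and mapping_matches are hypothetical, the expected MAC has to be copied by hand from the pfSense static mapping, and the only system interface used is the standard sysfs file /sys/class/net/<iface>/address.

```python
from pathlib import Path


def read_mac(iface, sysfs_root="/sys/class/net"):
    """Return the MAC address Linux currently reports for an interface."""
    return (Path(sysfs_root) / iface / "address").read_text().strip().lower()


def mapping_matches(iface, expected_mac, sysfs_root="/sys/class/net"):
    """True if the interface's current MAC equals the MAC expected by the
    static mapping (comparison is case-insensitive)."""
    return read_mac(iface, sysfs_root) == expected_mac.strip().lower()


# Example call on a node (not executed here, because it needs the vmbr1
# interface to exist); 50:21:08:80:05:92 is the stale MAC from the capture:
#     mapping_matches("vmbr1", "50:21:08:80:05:92")
```

On an affected node, the call in the comment would return False for vmbr1, which means the static mapping on the pfSense side needs to be updated or removed.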