Hello,
I have set up a small homelab with three identical Proxmox nodes in a cluster. All went well until I decided to run the gateway/firewall (pfSense) as a VM.
Problem:
2 out of 3 nodes cannot access the WAN (internet). When I ping the gateway IP (the pfSense VM running on one of the nodes) from a host, I can see the request go out and the reply come back, but the reply is somehow discarded.
On one node, however, everything works fine, and I cannot figure out why.
Please note that migrating the pfSense VM from one node to another or restarting the nodes does not change the situation; the behavior is stable.
Set-up description:
Each node is connected to an L2 managed switch with 2 NICs using LACP (802.3ad).
Each port group carries tagged VLAN 5 (WAN), VLAN 15 (LAN), and VLAN 20 (IoT).
The pfSense VM has three network adapters: one on vmbr1 (LAN), one on vmbr2 (WAN), and one on vmbr0 with VLAN tag = 20.
Network configuration (only the IP of vmbr1 differs between nodes):
Code:
auto lo
iface lo inet loopback
auto enp1s0
iface enp1s0 inet manual
auto enp2s0
iface enp2s0 inet manual
auto bond0
iface bond0 inet manual
    bond-slaves enp1s0 enp2s0
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2
#LAN1+2 link aggregation
auto bond0.15
iface bond0.15 inet manual
#Bond over VLAN 15 (LAN)
auto bond0.5
iface bond0.5 inet manual
#Bond over VLAN 5 (WAN)
auto vmbr1
iface vmbr1 inet static
    address 192.168.15.100/24
    gateway 192.168.15.1
    bridge-ports bond0.15
    bridge-stp off
    bridge-fd 0
#On top of VLAN 15 (LAN)
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 20-4094
#VLAN 20+
auto vmbr2
iface vmbr2 inet manual
    bridge-ports bond0.5
    bridge-stp off
    bridge-fd 0
#On top of VLAN 5 (WAN)
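To make the bridge layout above easier to compare across nodes, here is a minimal sketch that lists which port each bridge sits on. The helper name `bridge_ports` is hypothetical (not part of Proxmox or ifupdown2); it only parses the text of an interfaces file and does not inspect live interfaces.

```shell
# Hypothetical helper: print "<bridge> -> <first bridge port>" for every
# stanza in an ifupdown-style config that has a bridge-ports line.
# Reads the files given as arguments, or stdin when none are given.
bridge_ports() {
    awk '
        $1 == "iface"        { name = $2 }            # remember the current stanza
        $1 == "bridge-ports" { print name " -> " $2 } # first listed port only
    ' "$@"
}
```

Running `bridge_ports /etc/network/interfaces` on each node and diffing the output should show whether the three nodes really have the same bridge-to-VLAN mapping.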
Problem symptoms:
On node1 (192.168.15.100), pinging the gateway loses all packets:
Code:
root@pve1:~# ping -c3 192.168.15.1
PING 192.168.15.1 (192.168.15.1) 56(84) bytes of data.
--- 192.168.15.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2029ms
but tcpdump on the host (and in the pfSense VM) shows the traffic going out and coming back correctly:
Code:
root@pve1:~# tcpdump -envi vmbr1 icmp
tcpdump: listening on vmbr1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:22:37.102368 32:fe:cb:a6:cd:36 > 00:0e:c4:d0:4e:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 37319, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.15.100 > 192.168.15.1: ICMP echo request, id 42394, seq 1, length 64
20:22:37.102668 00:0e:c4:d0:4e:92 > 50:21:08:80:05:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 60139, offset 0, flags [none], proto ICMP (1), length 84)
192.168.15.1 > 192.168.15.100: ICMP echo reply, id 42394, seq 1, length 64
20:22:38.109750 32:fe:cb:a6:cd:36 > 00:0e:c4:d0:4e:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 37438, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.15.100 > 192.168.15.1: ICMP echo request, id 42394, seq 2, length 64
20:22:38.110046 00:0e:c4:d0:4e:92 > 50:21:08:80:05:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56162, offset 0, flags [none], proto ICMP (1), length 84)
192.168.15.1 > 192.168.15.100: ICMP echo reply, id 42394, seq 2, length 64
20:22:39.133743 32:fe:cb:a6:cd:36 > 00:0e:c4:d0:4e:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 37497, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.15.100 > 192.168.15.1: ICMP echo request, id 42394, seq 3, length 64
20:22:39.134029 00:0e:c4:d0:4e:92 > 50:21:08:80:05:92, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 57476, offset 0, flags [none], proto ICMP (1), length 84)
192.168.15.1 > 192.168.15.100: ICMP echo reply, id 42394, seq 3, length 64
^C
6 packets captured
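One detail in this capture may be worth checking: the echo requests leave with source MAC 32:fe:cb:a6:cd:36, while the replies are addressed to 50:21:08:80:05:92. Whether that mismatch is the cause is an assumption, not something the capture proves. A minimal sketch for spotting it automatically, using two hypothetical helper functions (`mac_pairs`, `reply_mac_mismatches`) that parse `tcpdump -e` output:

```shell
# Hypothetical helper: print "src-MAC dst-MAC" for each Ethernet header
# line of `tcpdump -e` output read from stdin.
mac_pairs() {
    awk '$3 == ">" && length($2) == 17 && $2 ~ /^[0-9a-f:]+$/ {
        dst = $4
        sub(/,$/, "", dst)   # tcpdump prints the destination MAC with a trailing comma
        print $2, dst
    }'
}

# Hypothetical helper: treat the first frame as the request (host -> gateway)
# and report every frame from the gateway that is not addressed to the
# MAC the request came from.
reply_mac_mismatches() {
    mac_pairs | awk '
        NR == 1 { host = $1; gw = $2; next }
        $1 == gw && $2 != host { print "reply sent to " $2 ", expected " host }
    '
}
```

A possible way to use it is to start `tcpdump -eni vmbr1 -c 6 icmp | reply_mac_mismatches` in one shell and run the ping in another. If it reports mismatches, comparing the MAC shown by `ip link show vmbr1` with the entry pfSense holds for the node's IP would be a reasonable next step.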
On node 2 (192.168.15.99), the same ping works fine:
Code:
ping -c3 192.168.15.1
PING 192.168.15.1 (192.168.15.1) 56(84) bytes of data.
64 bytes from 192.168.15.1: icmp_seq=1 ttl=64 time=0.336 ms
64 bytes from 192.168.15.1: icmp_seq=2 ttl=64 time=0.505 ms
64 bytes from 192.168.15.1: icmp_seq=3 ttl=64 time=0.458 ms
--- 192.168.15.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2039ms
rtt min/avg/max/mdev = 0.336/0.433/0.505/0.071 ms
The ip route output has the same structure on both nodes (matching the network config). Here are examples from node1 and node2:
Code:
root@pve1:~# ip route
default via 192.168.15.1 dev vmbr1 proto kernel onlink
192.168.15.0/24 dev vmbr1 proto kernel scope link src 192.168.15.100
root@pve2:~# ip route
default via 192.168.15.1 dev vmbr1 proto kernel onlink
192.168.15.0/24 dev vmbr1 proto kernel scope link src 192.168.15.101
Any suggestions?