EVPN+VXLAN with multiple exit nodes: firewall drops packets with asymmetric routing

adofou
Mar 14, 2020
Hello,

I've been trying for weeks to set up a 3-node cluster with EVPN+VXLAN between the nodes, and a BGP controller that announces unicast prefixes to my network.
Each node has its own BGP controller and announces EVPN subnets. This now works fine, with a few filters at the entrance to my network.
On paper, traffic should egress at the point nearest to the network, i.e. the local host, and ingress from the internet as close to the cluster as possible from the network's point of view, i.e. on the node nearest to where the packet enters our network (even if it then has to travel over VXLAN between two nodes).

The problem is that I noticed some strange issues with VMs in this EVPN zone. Some destinations were unreachable (for example, a Debian mirror).
And they differed depending on the host the VM was running on: a destination that didn't work on one host could start working once the VM was moved to another host.

The problem magically disappeared when I configured a node as a “Primary Exit Node”, no matter which node I selected.
I've set all the uRPF checks as carefully as possible. Reverse-path filtering is disabled both on the network side and on the Proxmox machines (I had missed this point at first; I find the note on this subject easy to overlook, since it only appears in an example much further down than the main help text).
Code:
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
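For reference, here is a minimal sketch of how these values could be made persistent and verified on every interface (the file name under /etc/sysctl.d/ is just an example, not what I actually use):
Code:
# Sketch: persist the rp_filter settings across reboots (the file name is an example)
cat > /etc/sysctl.d/90-disable-rp-filter.conf <<'EOF'
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
EOF
sysctl --system

# Verify the effective value on every interface; they should all report 0
sysctl -a 2>/dev/null | grep '\.rp_filter'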

So it's not a VXLAN tunnel or EVPN announcement issue, since everything works when a “Primary Exit Node” is configured, and through the right tunnels.
By the way, all BGP sessions are UP, and VMs on the 3 nodes can ping each other without any problem.

So I continued debugging and realized that the problem only appeared when the routing was asymmetric.
In other words, if the packet egressed and ingressed via the same node from the network's point of view, it worked.
But as soon as the packet egressed locally from its node (no Primary Exit Node) and the reply ingressed on another node because of BGP routing (and therefore had to traverse EVPN+VXLAN), it no longer worked.

If egress and ingress occur on the same node:
Code:
EGRESS ICMP REQUEST :
01:47:49.054950 veth100i0 P   IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.054975 fwln100i0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.054976 fwpr100p0 P   IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.054976 evpn  In  IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.055003 vmbr0.3 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.055005 vmbr0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.055012 ens10f0np0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
INGRESS REPLY :
01:47:49.055497 ens10f0np0 In  IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055497 vmbr0 In  IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055497 vmbr0.3 In  IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055525 evpn  Out IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055530 fwpr100p0 Out IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055532 fwln100i0 P   IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055544 veth100i0 Out IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
Note: this is from a CT, but it's the same from a VM.

If, due to network routing, egress and ingress do not occur on the same node:
Code:
Node 2 with VM, egress ICMP :
01:51:15.462128 tap105i0 P   IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462140 fwln105i0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462144 fwpr105p0 P   IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462147 evpn  In  IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462157 vmbr0.3 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462158 vmbr0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462162 ens10f0np0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64

Node 1 - ingress (due to BGP perspective) :
01:51:15.463390 ens10f0np0 In  IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
01:51:15.463390 vmbr0 In  IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
01:51:15.463390 vmbr0.3 In  IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
Note: this is from a VM, but it's the same from a CT.
We can see that, incomprehensibly, the reply arrives but stops at vmbr0.3 instead of being handed over to vrfbr_evpn (the VRF bridge for EVPN+VXLAN).
However, the routing has not changed; the route is still the same and is present in the routing table:
213.152.X.X nhid 353 via 172.31.255.5 dev vrfbr_evpn proto bgp src 213.152.Y.Y metric 20 onlink
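A quick way to double-check which path the kernel selects for the return traffic is ip route get (a sketch; the addresses are the anonymized ones from the captures above):
Code:
# Sketch: ask the kernel which route it would use for the VM's address
ip route get 213.152.X.X
# Should point at vrfbr_evpn via 172.31.255.5, matching the table entry above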

Worse still, during my troubleshooting, I noticed behavior that didn't make sense at the time.
A ping to 8.8.8.8 didn't work, but a DNS query did!
Code:
Node 2 with VM, egress DNS request :
02:04:01.597952 tap105i0 P   IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597964 fwln105i0 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597968 fwpr105p0 P   IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597971 evpn  In  IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597979 vmbr0.3 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597980 vmbr0 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597983 ens10f0np0 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)

02:04:01.603899 vrfvx_evpn In  IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54) <- VXLAN from Node 1
02:04:01.603902 vrfbr_evpn In  IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603907 evpn  Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603909 fwpr105p0 Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603911 fwln105i0 P   IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603913 tap105i0 Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)

02:04:01.608743 tap105i0 P   IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608752 fwln105i0 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608754 fwpr105p0 P   IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608756 evpn  In  IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608763 vmbr0.3 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608764 vmbr0 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608767 ens10f0np0 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)

02:04:01.613895 vrfvx_evpn In  IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)  <- VXLAN from Node 1
02:04:01.613901 vrfbr_evpn In  IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613915 evpn  Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613921 fwpr105p0 Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613925 fwln105i0 P   IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613933 tap105i0 Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)


Node 1 - ingress (due to BGP perspective) :
02:04:01.604374 ens10f0np0 In  IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604374 vmbr0 In  IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604374 vmbr0.3 In  IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604407 vrfbr_evpn Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604413 vrfvx_evpn Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)  -> To VXLAN

02:04:01.614347 ens10f0np0 In  IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614347 vmbr0 In  IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614347 vmbr0.3 In  IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614379 vrfbr_evpn Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614384 vrfvx_evpn Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111) -> To VXLAN

I really racked my brains for a long time, then at some point I remembered that I had activated the firewall on the cluster.
So I started looking at that, and there are a lot of default rules on the host that I don't necessarily understand.
So I decided to simply turn off the firewall at the cluster level and... see.

And the magic works!

<capture node1-2 firewall.txt attachment>

In other words, as soon as I activate the cluster's firewall, probably because of a rule or of conntrack, it completely breaks my asymmetric incoming traffic.
The firewall must be matching some rule and dropping the packet instead of forwarding it to vrfbr_evpn.
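One way to spot which rule is doing the dropping would be to watch the packet counters of the generated forward chain while reproducing the problem (a sketch, not something from my original debugging):
Code:
# Sketch: watch the PVEFW-FORWARD counters while the VM pings 8.8.8.8;
# the rule whose packet counter increases on the ingress node is the one dropping the reply
watch -n1 'iptables -L PVEFW-FORWARD -v -n --line-numbers'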

Does anyone have a solution to this problem?

Johann
 

Attachments

  • controllers.txt
  • vnets-zones-subnets.txt
  • firewall.txt
  • capture node1-2 firewall.txt
What are your firewall rules between your hosts (firewall at host level)? You need to have the bgp && vxlan ports open between hosts (in|out), or it'll drop the traffic.
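For example, something along these lines in the host firewall rules (only a sketch; 179 is BGP, 4789 is the usual VXLAN port, and the 10.0.0.0/24 peer network is a placeholder):
Code:
# Sketch of host-level rules in /etc/pve/nodes/<nodename>/host.fw
# (10.0.0.0/24 stands for the network the other nodes peer from)
[RULES]
IN ACCEPT -source 10.0.0.0/24 -p tcp -dport 179 # BGP sessions between nodes
IN ACCEPT -source 10.0.0.0/24 -p udp -dport 4789 # VXLAN tunnel traffic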

I'm not aware of a reported bug with asymmetric routing && firewall.

but maybe you can try to add:

nf_conntrack_allow_invalid: 1

in /etc/pve/nodes/<nodename>/host.fw options.
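i.e. in the options section of that file, something like this (sketch):
Code:
# Sketch of /etc/pve/nodes/<nodename>/host.fw with the option enabled
[OPTIONS]
nf_conntrack_allow_invalid: 1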



(Personally, I don't use the firewall on my hosts where I use EVPN; as the EVPN runs in a different VRF, the VMs can't reach the host's SSH or other open ports.)
 
What are your firewall rules between your hosts (firewall at host level)? You need to have the bgp && vxlan ports open between hosts (in|out), or it'll drop the traffic.
No custom rules, I just activated the firewall during configuration. No VM/CT rules either.
Attached are the current iptables rules (generated when I activate the firewall on the cluster), the network files and the IP routes.
I only included NODE1 and NODE2, but I can provide the files for NODE3 if needed.

We use two loopbacks, lo and lo:0.
They let me test EVPN+VXLAN over either the public or the private VLAN.
The goal is to create redundancy between the two network ports, since they are connected to two different network devices. Only the private part was kept during debugging. We advertise this loopback via BGP (by adding it to /etc/frr/frr.conf.local, see the sketch below) and set up the VXLAN over it.
I thought this might be the problem, so I rolled the VXLAN tunnels back to the /31 IPs in the private VLAN (connected to each other by an L3VPN plus a static route; BGP is still to come, and I also had strange problems when I tried it, so only the loopback for the moment, first things first).
But that didn't change the problem.
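For context, the addition in /etc/frr/frr.conf.local looks roughly like this (a sketch; the ASN 65000 and the loopback address 192.0.2.1 are placeholders, not our real values):
Code:
! Sketch of the frr.conf.local addition that advertises the loopback (placeholder values)
router bgp 65000
 address-family ipv4 unicast
  network 192.0.2.1/32
 exit-address-family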

I don't see the loopbacks in the firewall rules. But I still have the problem with the direct interconnect IPs, which do seem to be present in PVEFW-HOST-IN && PVEFW-HOST-OUT.

I'm not aware of a reported bug with asymmetric routing && firewall.

but maybe you can try to add:

nf_conntrack_allow_invalid: 1

in /etc/pve/nodes/<nodename>/host.fw options.

When I added this on node1, the issues stopped and routing started working.
This seems to confirm a problem with conntrack on NODE 1 (which sees the ingress traffic, but not the egress).
The question is why, if you aren't aware of a potential bug :-/

Stupid question: I tried disabling the firewall on the node only, but it seems to do nothing (I still see the rules in iptables).
So what is the purpose of that option?


(Personally, I don't use the firewall on my hosts where I use EVPN; as the EVPN runs in a different VRF, the VMs can't reach the host's SSH or other open ports.)
In fact, I want to be able to apply certain firewall settings directly on the hypervisor side, where they cannot be touched from within the VM (by end users). This is a requirement of our security team.
If I deactivate the firewall at the cluster level, that disables this feature.
But if I deactivate the firewall on the node, is that the same? Or does it only disable firewalling of "host traffic"?

Many thanks!
 

Attachments

  • node1-firewall.txt
  • node1-networks-routes.txt
  • node2-firewall.txt
  • node2-networks-routes.txt
In fact, I want to be able to apply certain firewall settings directly on the hypervisor side, where they cannot be touched from within the VM (by end users). This is a requirement of our security team.
If I deactivate the firewall at the cluster level, that disables this feature.
But if I deactivate the firewall on the node, is that the same? Or does it only disable firewalling of "host traffic"?

Many thanks!
Mmm, disabling the firewall at host level indeed only removes the host rules. But conntrack is still there (because it's shared between all VMs && the host). (There is a default rule on top of all the others that looks up already-established connections in conntrack.)

-A PVEFW-FORWARD -m conntrack --ctstate INVALID -j DROP #this rule is removed with nf_conntrack_allow_invalid
-A PVEFW-FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

I'll add a note in the docs about nf_conntrack_allow_invalid. With asymmetric routing, it's quite possible that traffic goes out from one node and comes back through another node where the conntrack entry was never opened (so the INVALID rule drops the packet).
I don't think we can do anything about this (we don't have any conntrack sync between hosts).
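If you want to confirm it, you can compare the conntrack tables of the two nodes while reproducing the ping, something like this (sketch, needs the conntrack-tools package):
Code:
# Sketch: list conntrack entries for the test flow on both nodes while the VM pings 8.8.8.8
# On the egress node the ICMP entry exists; on the ingress-only node it doesn't,
# so the echo reply is classified INVALID and hits the DROP rule above.
conntrack -L -p icmp --dst 8.8.8.8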
 
