Hello,
I've been trying for weeks to set up a 3-node cluster with EVPN+VXLAN between the nodes, and a BGP controller that announces unicast prefixes to my network.
Each node has its own BGP controller and announces EVPN subnets. This now works fine, with a few filters at the entrance to my network.
On paper, outgoing traffic should leave through the nearest connection to the network, i.e. the local host, and incoming traffic from the internet should enter the cluster as close as possible from the network's point of view, i.e. through the node nearest to where the packet entered our network (even if it then has to travel through a VXLAN tunnel between two nodes).
The problem is that I've noticed some strange behavior with VMs in this EVPN. Some destinations were unreachable (for example, a Debian mirror).
And the unreachable destinations differed depending on which host the VM was running on. A destination that didn't work from one host could work once the VM was moved to another host.
The problem magically disappeared when I configured a node as a “Primary Exit Node”, no matter which node I selected.
I've checked all the uRPF settings as carefully as I could. uRPF is disabled both on the network side and on the Proxmox machines (I had missed this point at first; I find the note on this subject is not prominent enough, because it only appears in an example much further down in the documentation).
Code:
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
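For anyone who wants to double-check the same thing: as far as I understand the kernel documentation, the value actually applied on an interface is the maximum of `all` and that interface's own `rp_filter`, and `default` only affects interfaces created afterwards. Here is a minimal Python sketch to list the effective value per interface. The function names are mine (hypothetical helpers), and it only reads /proc/sys:

```python
from pathlib import Path

CONF_DIR = Path("/proc/sys/net/ipv4/conf")  # standard Linux location


def effective_rp_filter(raw):
    """Return {iface: max(all, iface)} for every real interface.

    `raw` maps each conf entry name (including "all" and "default")
    to its rp_filter value.
    """
    all_value = raw.get("all", 0)
    return {
        name: max(all_value, value)
        for name, value in raw.items()
        if name not in ("all", "default")
    }


def read_rp_filter(conf_dir=CONF_DIR):
    """Read rp_filter for every entry under the conf directory."""
    return {
        entry.name: int((entry / "rp_filter").read_text())
        for entry in conf_dir.iterdir()
        if (entry / "rp_filter").is_file()
    }


def print_report():
    """Print one line per interface with its effective rp_filter mode."""
    for iface, value in sorted(effective_rp_filter(read_rp_filter()).items()):
        status = "disabled" if value == 0 else f"ENABLED (mode {value})"
        print(f"{iface:20s} rp_filter {status}")
```

Calling `print_report()` on each node should show every interface (vmbr0.3, vrfbr_evpn, and so on) as disabled if the two sysctls above were applied before the interfaces were created.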
So it's not a VXLAN tunnel or EVPN announcement issue, since everything works, and through the right tunnels, when a “Primary Exit Node” is configured.
By the way, all BGP sessions are up, and VMs on all 3 nodes can ping each other without any problem.
So I continued debugging and realized that the problem only appeared if the routing was asymmetrical.
In other words, if the packet leaves and the reply comes back through the same node from the network's point of view, it works.
But as soon as the packet leaves through the local node (no Primary Exit Node) and the reply enters through another node because of BGP routing (and therefore has to pass through EVPN+VXLAN), it no longer works.
When egress and ingress go through the same node (note: this capture is from a CT, but it's the same from a VM):
Code:
EGRESS ICMP REQUEST :
01:47:49.054950 veth100i0 P IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.054975 fwln100i0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.054976 fwpr100p0 P IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.054976 evpn In IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.055003 vmbr0.3 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.055005 vmbr0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
01:47:49.055012 ens10f0np0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 40637, seq 1, length 64
INGRESS REPLY :
01:47:49.055497 ens10f0np0 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055497 vmbr0 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055497 vmbr0.3 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055525 evpn Out IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055530 fwpr100p0 Out IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055532 fwln100i0 P IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
01:47:49.055544 veth100i0 Out IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 40637, seq 1, length 64
When, due to network routing, egress and ingress do not go through the same node (note: this capture is from a VM, but it's the same from a CT):
Code:
Node 2 with VM, egress ICMP :
01:51:15.462128 tap105i0 P IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462140 fwln105i0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462144 fwpr105p0 P IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462147 evpn In IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462157 vmbr0.3 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462158 vmbr0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
01:51:15.462162 ens10f0np0 Out IP 213.152.X.X > 8.8.8.8: ICMP echo request, id 579, seq 1, length 64
Node 1 - ingress (due to BGP perspective) :
01:51:15.463390 ens10f0np0 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
01:51:15.463390 vmbr0 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
01:51:15.463390 vmbr0.3 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
We can see that, inexplicably, the packet comes in but stops at vmbr0.3 instead of being forwarded to vrfbr_evpn (the VRF bridge for EVPN+VXLAN).
However, the routing has not changed: the route is still the same and is present in the routing table:
213.152.X.X nhid 353 via 172.31.255.5 dev vrfbr_evpn proto bgp src 213.152.Y.Y metric 20 onlink
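To compare captures more quickly than by eye, here is a small Python sketch that extracts the list of interfaces a packet crossed from `tcpdump -i any` style output and reports which expected hops are missing. The helper names and the regular expression are my own assumptions based on the line format shown in the captures above. The expected hop list is taken from the working DNS capture on Node 1.

```python
import re

# Start of a capture line as it appears above:
# "<time> <interface> <direction> IP ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s+(\S+)\s+(In|Out|P)\s+IP\s")


def interface_path(lines):
    """Return the ordered list of (interface, direction) seen in a capture."""
    path = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # label lines such as "Node 1 - ingress ..." are skipped
            path.append((match.group(2), match.group(3)))
    return path


def missing_hops(path, expected):
    """Return the expected interfaces that never show up in the path."""
    seen = {iface for iface, _ in path}
    return [iface for iface in expected if iface not in seen]


# The failing ICMP reply as captured on Node 1
capture = """\
01:51:15.463390 ens10f0np0 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
01:51:15.463390 vmbr0 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
01:51:15.463390 vmbr0.3 In IP 8.8.8.8 > 213.152.X.X: ICMP echo reply, id 579, seq 1, length 64
""".splitlines()

path = interface_path(capture)
print("last interface seen:", path[-1][0])
print("missing hops:", missing_hops(path, ["vmbr0.3", "vrfbr_evpn", "vrfvx_evpn"]))
```

For the ICMP reply above, the last interface seen is vmbr0.3, and both vrfbr_evpn and vrfvx_evpn are reported as missing.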
Worse still, during my troubleshooting, I noticed behavior that didn't make sense at the time.
A ping to 8.8.8.8 didn't work, but a DNS query did!
Code:
Node 2 with VM, egress DNS request :
02:04:01.597952 tap105i0 P IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597964 fwln105i0 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597968 fwpr105p0 P IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597971 evpn In IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597979 vmbr0.3 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597980 vmbr0 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.597983 ens10f0np0 Out IP 213.152.X.X.42023 > 8.8.8.8.53: 11008+ [1au] A? google.Fr. (50)
02:04:01.603899 vrfvx_evpn In IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54) <- VXLAN from Node 1
02:04:01.603902 vrfbr_evpn In IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603907 evpn Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603909 fwpr105p0 Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603911 fwln105i0 P IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.603913 tap105i0 Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.608743 tap105i0 P IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608752 fwln105i0 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608754 fwpr105p0 P IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608756 evpn In IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608763 vmbr0.3 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608764 vmbr0 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.608767 ens10f0np0 Out IP 213.152.X.X.48363 > 8.8.8.8.53: 17495+ [1au] A? 8.8.8.8. (48)
02:04:01.613895 vrfvx_evpn In IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111) <- VXLAN from Node 1
02:04:01.613901 vrfbr_evpn In IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613915 evpn Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613921 fwpr105p0 Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613925 fwln105i0 P IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.613933 tap105i0 Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
Node 1 - ingress (due to BGP perspective) :
02:04:01.604374 ens10f0np0 In IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604374 vmbr0 In IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604374 vmbr0.3 In IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604407 vrfbr_evpn Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54)
02:04:01.604413 vrfvx_evpn Out IP 8.8.8.8.53 > 213.152.X.X.42023: 11008 1/0/1 A 142.250.179.99 (54) -> To VXLAN
02:04:01.614347 ens10f0np0 In IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614347 vmbr0 In IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614347 vmbr0.3 In IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614379 vrfbr_evpn Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111)
02:04:01.614384 vrfvx_evpn Out IP 8.8.8.8.53 > 213.152.X.X.48363: 17495 NXDomain$ 0/1/1 (111) -> To VXLAN
I racked my brains for a long time, then at some point I remembered that I had enabled the firewall on the cluster.
So I started looking at this, and there are a lot of basic rules on the host that I don't necessarily understand.
So I decided to simply turn off the firewall on the cluster and... see.
And like magic, it works!
<capture node1-2 firewall.txt attachment>
In other words, as soon as I enable the cluster's firewall, probably because of a rule or conntrack, it completely breaks my asymmetrical incoming traffic.
The firewall must be hitting a rule that drops the packet instead of forwarding it to vrfbr_evpn.
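To make the conntrack hypothesis more concrete: Node 1 never saw the outgoing packet, so it has no conntrack entry for the reply. My understanding (an assumption on my side, not something I have verified in the Proxmox rule set) is that Linux conntrack classifies such a first-seen packet differently per protocol. An ICMP echo reply with no matching request is INVALID, while a UDP packet with no entry simply starts a NEW flow. If the firewall drops INVALID packets, that would match what I see (ping fails, DNS works). Below is a deliberately simplified Python model of that reasoning. It is not the real kernel logic, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    proto: str   # "icmp", "udp" or "tcp"
    kind: str    # "echo-request", "echo-reply", "datagram", "syn", "syn-ack"
    flow: tuple  # simplified flow key, identical for both directions


def classify(packet, table):
    """Simplified conntrack state for a packet arriving on a node.

    `table` is the set of flow keys this node already tracks.
    Mid-stream TCP pickup and timeouts are ignored in this model.
    """
    if packet.flow in table:
        return "ESTABLISHED"
    if packet.proto == "udp":
        return "NEW"  # no handshake: any first packet opens a flow
    if packet.proto == "icmp":
        return "NEW" if packet.kind == "echo-request" else "INVALID"
    if packet.proto == "tcp":
        return "NEW" if packet.kind == "syn" else "INVALID"
    return "INVALID"


def verdict(packet, table, drop_invalid=True):
    """Assumed firewall policy: drop INVALID, accept everything else."""
    state = classify(packet, table)
    return "DROP" if (state == "INVALID" and drop_invalid) else "ACCEPT"


# Node 1 never saw the requests that left through Node 2
node1_table = set()
ping_reply = Packet("icmp", "echo-reply", ("213.152.X.X", "8.8.8.8", "icmp", 579))
dns_reply = Packet("udp", "datagram", ("213.152.X.X", "8.8.8.8", "udp", 42023, 53))

print("ping reply on Node 1:", verdict(ping_reply, node1_table))
print("DNS reply on Node 1: ", verdict(dns_reply, node1_table))
```

In this model the ping reply is dropped on Node 1 while the DNS reply is accepted, and a TCP SYN-ACK arriving without its SYN would be dropped too, which could explain the unreachable Debian mirror.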
Does anyone have a solution to this problem?
Johann