EVPN SDN issues after Upgrade Proxmox VE from 7 to 8

ddimarx

Hello all,

TL;DR: I upgraded my Proxmox VE cluster from 7.4-16 to 8.0.3 and my SDN/EVPN setup stopped working properly. I fixed some of the issues by removing the BGP-announced default route, unchecking "Disable arp-nd suppression" in the EVPN zone, and setting rp_filter to 0. But I still have random connectivity problems between VMs and LXCs on different nodes.

I'm running multiple 3-node clusters with Proxmox VE 7.4-16 installed on top of plain Debian Bullseye (without the Proxmox VE ISO), using the no-subscription repository. I recently and successfully upgraded one 3-node (lab/test) PVE cluster to 8.0.3 following this guide: https://pve.proxmox.com/wiki/Upgrade_from_7_to_8

I was already using SDN with EVPN in this cluster and it worked just fine prior to the upgrade: VMs and LXCs on different nodes, connected to the same EVPN zone and VNet, could reach/ping each other, and via the VNet gateway with SNAT enabled they could also reach the internet without any issues.

But after the upgrade to 8.0.3, VMs and LXCs connected to the same EVPN zone and VNet can no longer reach/ping each other across nodes; they can only ping guests on the same cluster node, and none of the VMs and LXCs can reach the internet.

I've noticed the following in the 8.0.3 routing table of each node:

Code:
root@labpve2:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
default nhid 46 proto bgp metric 20
        nexthop via 10.2.0.21 dev vrfbr_evpnzone weight 1 onlink
        nexthop via 10.2.0.23 dev vrfbr_evpnzone weight 1 onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.22
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.8
10.10.10.0/24 nhid 24 dev evpnet10 proto bgp metric 20
10.10.10.4 nhid 33 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.12 nhid 33 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.19 nhid 34 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.31 nhid 34 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.100 nhid 33 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.31

Code:
root@labpve3:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink 
default nhid 50 proto bgp metric 20 
        nexthop via 10.2.0.21 dev vrfbr_evpnzone weight 1 onlink 
        nexthop via 10.2.0.22 dev vrfbr_evpnzone weight 1 onlink 
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.23 
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.9 
10.10.10.0/24 nhid 24 dev evpnet10 proto bgp metric 20 
10.10.10.4 nhid 40 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.10 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.12 nhid 40 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.25 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.29 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.32 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.40 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.43 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink 
10.10.10.100 nhid 40 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink 
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.32


By comparison, the working 7.4-16 routing table had only one default gateway:
Code:
default via 10.2.0.1 dev vmbr0 proto kernel onlink

and no default gateway propagated through BGP.

So I removed this BGP default gateway by running the following command on each node:

Code:
vtysh -c "configure terminal" -c "router bgp 65000 vrf vrf_evpnzone" -c "address-family l2vpn evpn" -c "no default-originate ipv4" -c "no default-originate ipv6"

But every time a cluster node restarted or the SDN was reloaded I had to re-run it, so I made the change permanent by creating a frr.conf.local file on each node under /etc/frr:

Code:
root@labpve1:/etc/frr# cp frr.conf frr.conf.local

and then editing frr.conf.local so that, under "address-family l2vpn evpn" of the VRF BGP instance, "default-originate ipv4" and "default-originate ipv6" are prefixed with "no":

Code:
root@labpve1:/etc/frr# cat  frr.conf.local
frr version 8.5.1
frr defaults datacenter
hostname labpve1
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_evpnzone
 vni 10000
exit-vrf
!
router bgp 65000
 bgp router-id 10.2.0.21
 no bgp default ipv4-unicast
 coalesce-time 1000
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65000
 neighbor VTEP bfd
 neighbor 10.2.0.22 peer-group VTEP
 neighbor 10.2.0.23 peer-group VTEP
 !
 address-family ipv4 unicast
  import vrf vrf_evpnzone
 exit-address-family
 !
 address-family ipv6 unicast
  import vrf vrf_evpnzone
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  neighbor VTEP activate
  advertise-all-vni
 exit-address-family
exit
!
router bgp 65000 vrf vrf_evpnzone
 bgp router-id 10.2.0.21
 !
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
 !
 address-family ipv6 unicast
  redistribute connected
 exit-address-family
 !
 address-family l2vpn evpn
  no default-originate ipv4
  no default-originate ipv6
 exit-address-family
exit
!
route-map MAP_VTEP_IN deny 1
 match evpn route-type prefix
exit
!
route-map MAP_VTEP_IN permit 2
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
line vty
!
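
If I understand the SDN/frr integration correctly, frr.conf.local is merged into the generated frr.conf the next time the SDN configuration is applied, so after editing it the change can be rolled out from the GUI ("Apply" on the SDN panel) or from the CLI:

Code:
root@labpve1:~# pvesh set /cluster/sdn          # re-apply SDN config, regenerates /etc/frr/frr.conf
root@labpve1:~# systemctl restart frr.service   # only needed if frr does not pick up the new config on its own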

After this change, the routing table on the 8.0.3 cluster nodes looks like it did on 7.4-16 again:

Code:
root@labpve1:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.21
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.7
10.10.10.0/24 nhid 81 dev evpnet10 proto bgp metric 20
10.10.10.10 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.19 nhid 85 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.20 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.25 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.26 nhid 85 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.29 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.31 nhid 85 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.32 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.40 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.43 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.204 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.30

Code:
root@labpve2:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.22
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.8
10.10.10.0/24 nhid 24 dev evpnet10 proto bgp metric 20
10.10.10.4 nhid 32 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.12 nhid 32 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.19 nhid 33 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.26 nhid 33 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.31 nhid 33 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.100 nhid 32 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.31

Code:
root@labpve3:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.23
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.9
10.10.10.0/24 nhid 76 dev evpnet10 proto bgp metric 20
10.10.10.4 nhid 81 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.10 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.12 nhid 81 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.20 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.25 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.29 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.32 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.35 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.40 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.43 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.100 nhid 81 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.204 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.32

After these changes all VMs and LXCs were able to reach the internet again, but they still couldn't reach/ping each other across nodes; they could only ping guests on the same cluster node.
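
(For anyone debugging the same symptom: the EVPN BGP sessions and VNIs can be inspected per node with a couple of read-only vtysh commands, e.g.:)

Code:
root@labpve1:~# vtysh -c "show bgp l2vpn evpn summary"   # sessions to the other two VTEPs should be Established
root@labpve1:~# vtysh -c "show evpn vni"                 # both VNIs (10000 for the zone/VRF, 11000 for the VNet) should be listed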

Here is the network configuration of node 1:

Code:
root@labpve1:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enP57647s1 inet manual

iface eth0 inet manual

iface enP55469s1 inet manual

auto eth1
iface eth1 inet static
        address 10.2.3.7/24

auto vmbr0
iface vmbr0 inet static
        address 10.2.0.21/24
        gateway 10.2.0.1
        bridge-ports eth0
        bridge-stp off
        bridge-fd 0
        post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   echo 1 > /proc/sys/net/ipv4/conf/vmbr0/proxy_arp
        post-up iptables -t nat -A PREROUTING -i vmbr0 -p tcp -m multiport --dport 80,443,2222,3379,3389,5555,8007,5050 -j DNAT --to 10.10.10.10
        post-up iptables -t nat -A PREROUTING -i vmbr0 -p tcp -m multiport --dport 25,587,465,110,143,993 -j DNAT --to 10.10.10.10
        post-down iptables -t nat -D PREROUTING -i vmbr0 -p tcp -m multiport --dport 80,443,2222,3379,3389,5555,8007,5050 -j DNAT --to 10.10.10.10
        post-down iptables -t nat -D PREROUTING -i vmbr0 -p tcp -m multiport --dport 25,587,465,110,143,993 -j DNAT --to 10.10.10.10
source-directory /etc/network/interfaces.d
source-directory /run/network/interfaces.d
source /etc/network/interfaces.d/*
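
(Side note: the post-up sysctls and DNAT rules above can be quickly re-checked after a reboot or ifreload with:)

Code:
root@labpve1:~# sysctl net.ipv4.ip_forward net.ipv4.conf.vmbr0.proxy_arp
root@labpve1:~# iptables -t nat -S PREROUTING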


Here is the EVPN/SDN configuration:

Code:
root@labpve1:~# cat  /etc/network/interfaces.d/sdn
#version:229

auto evpnet10
iface evpnet10
        address 10.10.10.1/24
        post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
        post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
        post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
        post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
        hwaddress 32:04:D8:3A:7F:E9
        bridge_ports vxlan_evpnet10
        bridge_stp off
        bridge_fd 0
        mtu 1450
        ip-forward on
        arp-accept on
        vrf vrf_evpnzone

auto vrf_evpnzone
iface vrf_evpnzone
        vrf-table auto
        post-up ip route del vrf vrf_evpnzone unreachable default metric 4278198272

auto vrfbr_evpnzone
iface vrfbr_evpnzone
        bridge-ports vrfvx_evpnzone
        bridge_stp off
        bridge_fd 0
        mtu 1450
        vrf vrf_evpnzone

auto vrfvx_evpnzone
iface vrfvx_evpnzone
        vxlan-id 10000
        vxlan-local-tunnelip 10.2.0.21
        bridge-learning off
        mtu 1450

auto vxlan_evpnet10
iface vxlan_evpnet10
        vxlan-id 11000
        vxlan-local-tunnelip 10.2.0.21
        bridge-learning off
        mtu 1450

I've resolved this by unchecking "Disable arp-nd suppression" in the EVPN zone configuration.

[Screenshot: EVPN zone settings with "Disable arp-nd suppression" unchecked]

And this is the current EVPN/SDN configuration after the arp-nd change:

Code:
root@labpve1:/etc/network# cat interfaces.d/sdn
#version:231

auto evpnet10
iface evpnet10
        address 10.10.10.1/24
        post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
        post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
        post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
        post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
        hwaddress 32:04:D8:3A:7F:E9
        bridge_ports vxlan_evpnet10
        bridge_stp off
        bridge_fd 0
        mtu 1450
        ip-forward on
        arp-accept on
        vrf vrf_evpnzone

auto vrf_evpnzone
iface vrf_evpnzone
        vrf-table auto
        post-up ip route del vrf vrf_evpnzone unreachable default metric 4278198272

auto vrfbr_evpnzone
iface vrfbr_evpnzone
        bridge-ports vrfvx_evpnzone
        bridge_stp off
        bridge_fd 0
        mtu 1450
        vrf vrf_evpnzone

auto vrfvx_evpnzone
iface vrfvx_evpnzone
        vxlan-id 10000
        vxlan-local-tunnelip 10.2.0.21
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_evpnet10
iface vxlan_evpnet10
        vxlan-id 11000
        vxlan-local-tunnelip 10.2.0.21
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450
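
(After applying the zone change, the suppression flag can also be verified directly on the VXLAN bridge ports, e.g.:)

Code:
root@labpve1:~# bridge -d link show dev vxlan_evpnet10   # output should include "neigh_suppress on"
root@labpve1:~# bridge -d link show dev vrfvx_evpnzone   # same for the zone VXLAN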

I also set "net.ipv4.conf.default.rp_filter" and "net.ipv4.conf.all.rp_filter" to "0" on all 3 nodes, according to the SDN guide: https://pve.proxmox.com/pve-docs/chapter-pvesdn.html

Code:
root@labpve1:/etc/frr# sysctl -a | grep net.ipv4.conf.default.rp_filter
net.ipv4.conf.default.rp_filter = 0
root@labpve1:/etc/frr# sysctl -a | grep net.ipv4.conf.all.rp_filter
net.ipv4.conf.all.rp_filter = 0
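
(To keep these values across reboots, one option is a sysctl.d drop-in — the file name below is arbitrary:)

Code:
root@labpve1:~# cat > /etc/sysctl.d/99-sdn-rp-filter.conf << 'EOF'
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
EOF
root@labpve1:~# sysctl --system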

The odd issue that still persists: randomly, and without any obvious cause, some VMs or LXCs (not all of them) on one cluster node lose connectivity to some VMs or LXCs (not all of them) on another cluster node, and vice-versa:

Example
---------------
VM/LXC_A on Node1 can ping VM/LXC_B on Node2 and vice-versa
VM/LXC_A on Node1 can not ping VM/LXC_C on Node3 and vice-versa
VM/LXC_B on Node2 can ping VM/LXC_C on Node3 and vice-versa
VM/LXC_C on Node3 can ping VM/LXC_D on Node1 and vice-versa
VM/LXC_D on Node1 can ping VM/LXC_B on Node2 and vice-versa
---------------

The issue is sometimes resolved if I restart VM/LXC_C, disconnect/reconnect its network interface, or migrate it to another node and back to its original one.
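
(When it happens, comparing the EVPN tables on the two nodes involved might show whether a MAC/IP route is missing or stale; the interface/VNI names below are the ones from my SDN config:)

Code:
root@labpve1:~# vtysh -c "show evpn mac vni 11000"        # MACs known for the VNet VNI
root@labpve1:~# vtysh -c "show evpn arp-cache vni 11000"  # IP/MAC (Type-2) entries for the VNet VNI
root@labpve1:~# bridge fdb show dev vxlan_evpnet10        # kernel FDB: remote MACs should point to the other VTEP IPs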

Thank you for any suggestions that could help resolve this odd issue.
 
I'm currently on holiday with a limited connection, but regarding your first question about default-originate: it's done because you have defined the node as an exit node, so it announces a 0.0.0.0 default route.
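
For context, the exit-node setting ends up as something like this in the generated frr.conf (these are exactly the lines that the frr.conf.local above negates):

Code:
router bgp 65000 vrf vrf_evpnzone
 bgp router-id 10.2.0.21
 !
 address-family l2vpn evpn
  default-originate ipv4
  default-originate ipv6
 exit-address-family
exit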
 
A little late, but I have been plagued by these issues for weeks now (after upgrading to PVE 8). Well, beta feature ^^.


My problem is similar: hosts in the same subnet/VXLAN cannot ping hosts on other nodes.
It sometimes works and sometimes does not.


The reason for this is apparently a bug in FRR 8.5.1.


Disabling optimization for the two route maps Proxmox creates in frr.conf seems to be a workaround: https://github.com/FRRouting/frr/issues/13792

I cannot validate that yet, because I only applied the change a few minutes ago. We'll see how this turns out...


There is already a bug report for this: https://bugzilla.proxmox.com/show_bug.cgi?id=4810

The real solution seems to be an updated frr package shipped by the Proxmox repos.
 
FYI: I've tested the provided frr*.deb files and they resolved this issue.

Also, these packages have recently been officially updated:

Code:
frr (8.5.2-1+pve1) bookworm; urgency=medium

  * update upstream sources to current stable/8.5 (commit
    1622c2ece2f68e034b43fb037503514c2195aba5) fixing among other things:
    - critical EVPN bug with Type-3 EVPN route
    - problematic BGP session resets with corrupted tunnel encapsulation
      attributes, breaking RFC 7606

 -- Proxmox Support Team <support@proxmox.com>  Wed, 30 Aug 2023 16:58:08 +0200

frr (8.5.1-1+pve1) bookworm; urgency=medium
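
So with the fixed packages in the repos, upgrading frr on each node and restarting the daemon should be enough, roughly:

Code:
root@labpve1:~# apt update
root@labpve1:~# apt install --only-upgrade frr frr-pythontools   # frr-pythontools only if it is installed
root@labpve1:~# systemctl restart frr.service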
 