Hello all,
TL;DR: I upgraded my Proxmox VE cluster from 7.4-16 to 8.0.3 and my SDN and EVPN setup stopped working properly. I fixed some issues by removing a BGP default route, disabling arp-nd suppression, and setting rp_filter to 0. But I still have random connectivity problems between VMs and LXCs on different nodes.
I'm running multiple 3-node clusters with Proxmox VE 7.4-16 installed on top of plain Debian Bullseye (without using the Proxmox VE ISO), using the no-subscription repository. Recently I successfully upgraded one 3-node (lab/test) PVE cluster to 8.0.3 according to this guide: https://pve.proxmox.com/wiki/Upgrade_from_7_to_8
I was already using SDN with EVPN in the cluster, and it was working just fine prior to the upgrade: VMs and LXCs on different nodes of the cluster, connected to the same EVPN zone and VNet, were able to communicate and reach/ping each other, and at the same time, via the VNet gateway with SNAT enabled, they were able to reach the internet without any issues.
But after the upgrade to 8.0.3, VMs and LXCs connected to the same EVPN zone and VNet could no longer reach/ping each other across different nodes; they could only ping others on the same cluster node, and none of the VMs and LXCs were able to reach the internet.
I've noticed the following in the 8.0.3 routing tables of the nodes:
Code:
root@labpve2:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
default nhid 46 proto bgp metric 20
nexthop via 10.2.0.21 dev vrfbr_evpnzone weight 1 onlink
nexthop via 10.2.0.23 dev vrfbr_evpnzone weight 1 onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.22
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.8
10.10.10.0/24 nhid 24 dev evpnet10 proto bgp metric 20
10.10.10.4 nhid 33 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.12 nhid 33 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.19 nhid 34 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.31 nhid 34 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.100 nhid 33 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.31
Code:
root@labpve3:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
default nhid 50 proto bgp metric 20
nexthop via 10.2.0.21 dev vrfbr_evpnzone weight 1 onlink
nexthop via 10.2.0.22 dev vrfbr_evpnzone weight 1 onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.23
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.9
10.10.10.0/24 nhid 24 dev evpnet10 proto bgp metric 20
10.10.10.4 nhid 40 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.10 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.12 nhid 40 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.25 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.29 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.32 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.40 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.43 nhid 41 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.100 nhid 40 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.32
Compared with this, the working 7.4-16 routing table contained only one default gateway:
Code:
default via 10.2.0.1 dev vmbr0 proto kernel onlink
and not an additional default gateway propagated through BGP.
So I removed this BGP default gateway by running the following command on each node:
Code:
vtysh -c "configure terminal" -c "router bgp 65000 vrf vrf_evpnzone" -c "address-family l2vpn evpn" -c "no default-originate ipv4" -c "no default-originate ipv6"
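To confirm the change took effect, only the kernel default should remain; listing just the BGP-installed routes is a quick check, for example:
Code:
# list only routes installed by BGP; no "default" entry should show up anymore
ip route show proto bgp
# confirm FRR no longer has default-originate in its running config
vtysh -c "show running-config" | grep default-originate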
But every time a cluster node restarted or the SDN was reloaded, I had to re-run it, so I made the change permanent by creating an frr.conf.local file on each node under /etc/frr:
Code:
root@labpve1:/etc/frr# cp frr.conf frr.conf.local
and then editing frr.conf.local to prepend "no" to "default-originate ipv4" and "default-originate ipv6" under "address-family l2vpn evpn":
Code:
root@labpve1:/etc/frr# cat frr.conf.local
frr version 8.5.1
frr defaults datacenter
hostname labpve1
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_evpnzone
vni 10000
exit-vrf
!
router bgp 65000
bgp router-id 10.2.0.21
no bgp default ipv4-unicast
coalesce-time 1000
neighbor VTEP peer-group
neighbor VTEP remote-as 65000
neighbor VTEP bfd
neighbor 10.2.0.22 peer-group VTEP
neighbor 10.2.0.23 peer-group VTEP
!
address-family ipv4 unicast
import vrf vrf_evpnzone
exit-address-family
!
address-family ipv6 unicast
import vrf vrf_evpnzone
exit-address-family
!
address-family l2vpn evpn
neighbor VTEP route-map MAP_VTEP_IN in
neighbor VTEP route-map MAP_VTEP_OUT out
neighbor VTEP activate
advertise-all-vni
exit-address-family
exit
!
router bgp 65000 vrf vrf_evpnzone
bgp router-id 10.2.0.21
!
address-family ipv4 unicast
redistribute connected
exit-address-family
!
address-family ipv6 unicast
redistribute connected
exit-address-family
!
address-family l2vpn evpn
no default-originate ipv4
no default-originate ipv6
exit-address-family
exit
!
route-map MAP_VTEP_IN deny 1
match evpn route-type prefix
exit
!
route-map MAP_VTEP_IN permit 2
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
line vty
!
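To make the override take effect, FRR has to reload the merged configuration; restarting the service works, and re-applying the SDN configuration should regenerate frr.conf and merge in frr.conf.local (that merge behavior is my understanding of the mechanism, so treat it as an assumption):
Code:
# restart FRR so the local overrides from frr.conf.local are loaded
systemctl restart frr
# or re-apply the SDN configuration cluster-wide, which regenerates
# /etc/frr/frr.conf (assumption: frr.conf.local gets merged in)
pvesh set /cluster/sdn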
After this change, the routing table on the 8.0.3 cluster nodes looks the same as it did on 7.4-16:
Code:
root@labpve1:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.21
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.7
10.10.10.0/24 nhid 81 dev evpnet10 proto bgp metric 20
10.10.10.10 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.19 nhid 85 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.20 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.25 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.26 nhid 85 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.29 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.31 nhid 85 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.32 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.40 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.43 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.204 nhid 84 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.30
Code:
root@labpve2:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.22
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.8
10.10.10.0/24 nhid 24 dev evpnet10 proto bgp metric 20
10.10.10.4 nhid 32 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.12 nhid 32 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.19 nhid 33 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.26 nhid 33 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.31 nhid 33 via 10.2.0.23 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.100 nhid 32 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.31
Code:
root@labpve3:/etc/frr# ip ro sh
default via 10.2.0.1 dev vmbr0 proto kernel onlink
10.2.0.0/24 dev vmbr0 proto kernel scope link src 10.2.0.23
10.2.3.0/24 dev eth1 proto kernel scope link src 10.2.3.9
10.10.10.0/24 nhid 76 dev evpnet10 proto bgp metric 20
10.10.10.4 nhid 81 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.10 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.12 nhid 81 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.20 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.25 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.29 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.32 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.35 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.40 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.43 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.100 nhid 81 via 10.2.0.21 dev vrfbr_evpnzone proto bgp metric 20 onlink
10.10.10.204 nhid 80 via 10.2.0.22 dev vrfbr_evpnzone proto bgp metric 20 onlink
192.168.200.0/24 dev zt2k2mncp5 proto kernel scope link src 192.168.200.32
After these changes, all VMs and LXCs were able to reach the internet, but they still couldn't reach/ping each other across different nodes; they could only ping others on the same cluster node.
Here is the network configuration of node 1:
Code:
root@labpve1:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback
iface enP57647s1 inet manual
iface eth0 inet manual
iface enP55469s1 inet manual
auto eth1
iface eth1 inet static
address 10.2.3.7/24
auto vmbr0
iface vmbr0 inet static
address 10.2.0.21/24
gateway 10.2.0.1
bridge-ports eth0
bridge-stp off
bridge-fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up echo 1 > /proc/sys/net/ipv4/conf/vmbr0/proxy_arp
post-up iptables -t nat -A PREROUTING -i vmbr0 -p tcp -m multiport --dport 80,443,2222,3379,3389,5555,8007,5050 -j DNAT --to 10.10.10.10
post-up iptables -t nat -A PREROUTING -i vmbr0 -p tcp -m multiport --dport 25,587,465,110,143,993 -j DNAT --to 10.10.10.10
post-down iptables -t nat -D PREROUTING -i vmbr0 -p tcp -m multiport --dport 80,443,2222,3379,3389,5555,8007,5050 -j DNAT --to 10.10.10.10
post-down iptables -t nat -D PREROUTING -i vmbr0 -p tcp -m multiport --dport 25,587,465,110,143,993 -j DNAT --to 10.10.10.10
source-directory /etc/network/interfaces.d
source-directory /run/network/interfaces.d
source /etc/network/interfaces.d/*
Here is the EVPN/SDN configuration:
Code:
root@labpve1:~# cat /etc/network/interfaces.d/sdn
#version:229
auto evpnet10
iface evpnet10
address 10.10.10.1/24
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
hwaddress 32:04:D8:3A:7F:E9
bridge_ports vxlan_evpnet10
bridge_stp off
bridge_fd 0
mtu 1450
ip-forward on
arp-accept on
vrf vrf_evpnzone
auto vrf_evpnzone
iface vrf_evpnzone
vrf-table auto
post-up ip route del vrf vrf_evpnzone unreachable default metric 4278198272
auto vrfbr_evpnzone
iface vrfbr_evpnzone
bridge-ports vrfvx_evpnzone
bridge_stp off
bridge_fd 0
mtu 1450
vrf vrf_evpnzone
auto vrfvx_evpnzone
iface vrfvx_evpnzone
vxlan-id 10000
vxlan-local-tunnelip 10.2.0.21
bridge-learning off
mtu 1450
auto vxlan_evpnet10
iface vxlan_evpnet10
vxlan-id 11000
vxlan-local-tunnelip 10.2.0.21
bridge-learning off
mtu 1450
I've resolved this by unchecking "Disable arp-nd suppression" in the EVPN zone configuration.
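For reference, the same change should also be doable from the CLI via the SDN API; the option name below is taken from what I believe the zone schema calls it, so treat it as an assumption ("evpnzone" is my zone ID):
Code:
# clear the "Disable arp-nd suppression" flag on the zone (assumed option name)
pvesh set /cluster/sdn/zones/evpnzone --delete disable-arp-nd-suppression
# apply the pending SDN configuration
pvesh set /cluster/sdn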
And this is the current EVPN/SDN configuration after the arp-nd suppression change:
Code:
root@labpve1:/etc/network# cat interfaces.d/sdn
#version:231
auto evpnet10
iface evpnet10
address 10.10.10.1/24
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 10.2.0.21
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
hwaddress 32:04:D8:3A:7F:E9
bridge_ports vxlan_evpnet10
bridge_stp off
bridge_fd 0
mtu 1450
ip-forward on
arp-accept on
vrf vrf_evpnzone
auto vrf_evpnzone
iface vrf_evpnzone
vrf-table auto
post-up ip route del vrf vrf_evpnzone unreachable default metric 4278198272
auto vrfbr_evpnzone
iface vrfbr_evpnzone
bridge-ports vrfvx_evpnzone
bridge_stp off
bridge_fd 0
mtu 1450
vrf vrf_evpnzone
auto vrfvx_evpnzone
iface vrfvx_evpnzone
vxlan-id 10000
vxlan-local-tunnelip 10.2.0.21
bridge-learning off
bridge-arp-nd-suppress on
mtu 1450
auto vxlan_evpnet10
iface vxlan_evpnet10
vxlan-id 11000
vxlan-local-tunnelip 10.2.0.21
bridge-learning off
bridge-arp-nd-suppress on
mtu 1450
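To confirm that ARP/ND suppression is actually active on the VXLAN bridge ports (as far as I understand, "bridge-arp-nd-suppress on" maps to the kernel's neigh_suppress port flag), the detailed bridge link output can be checked:
Code:
# look for "neigh_suppress on" in the port attributes
bridge -d link show dev vxlan_evpnet10
bridge -d link show dev vrfvx_evpnzone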
I've also set "net.ipv4.conf.default.rp_filter" and "net.ipv4.conf.all.rp_filter" to "0" on all 3 nodes, according to the SDN guide: https://pve.proxmox.com/pve-docs/chapter-pvesdn.html
Code:
root@labpve1:/etc/frr# sysctl -a | grep net.ipv4.conf.default.rp_filter
net.ipv4.conf.default.rp_filter = 0
root@labpve1:/etc/frr# sysctl -a | grep net.ipv4.conf.all.rp_filter
net.ipv4.conf.all.rp_filter = 0
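To keep these settings across reboots, I also dropped them into a sysctl configuration file (the file name is just my choice):
Code:
cat > /etc/sysctl.d/98-sdn-rpfilter.conf <<'EOF'
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
EOF
# reload all sysctl configuration files
sysctl --system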
The odd issue that still persists: randomly, and without any obvious cause, some VMs or LXCs (not all of them) on one cluster node lose connection/communication with some VMs or LXCs (not all of them) on another cluster node, and vice versa:
Example
---------------
VM/LXC_A on Node1 can ping VM/LXC_B on Node2 and vice-versa
VM/LXC_A on Node1 can not ping VM/LXC_C on Node3 and vice-versa
VM/LXC_B on Node2 can ping VM/LXC_C on Node3 and vice-versa
VM/LXC_C on Node3 can ping VM/LXC_D on Node1 and vice-versa
VM/LXC_D on Node1 can ping VM/LXC_B on Node2 and vice-versa
---------------
The issue sometimes gets resolved if I restart VM/LXC_C, disconnect/reconnect its network interface, or migrate VM/LXC_C to another node and back to its original one.
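In case it helps, when the issue occurs I can compare the EVPN control-plane and data-plane state on the affected nodes with standard FRR/iproute2 commands like the following (just a starting point I'd use for debugging):
Code:
# EVPN MAC and ARP/ND entries known to FRR, per VNI
vtysh -c "show evpn mac vni all"
vtysh -c "show evpn arp-cache vni all"
# type-2 (MAC/IP) routes in the BGP EVPN table
vtysh -c "show bgp l2vpn evpn route type macip"
# kernel forwarding entries learned on the VNet's VXLAN device
bridge fdb show dev vxlan_evpnet10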
Thank you for any suggestions on how to resolve this odd issue.