SDN (EVPN) SNAT with two exit nodes not working

nikordev

Member
Mar 7, 2022
Hi all!

I have 4 nodes in a cluster

Code:
node1 (WAN: 1.1.1.2)
node2 (WAN: 1.1.1.3) --> exit
node3 (WAN: 1.1.1.4)
node4 (WAN: 1.1.1.5) --> exit

All nodes have WAN interface eth0

I have VM1 (10.10.10.2/30) on node4, and I want to SNAT its traffic out of eth0 on the exit nodes:

For node2: iptables -t nat -A POSTROUTING -s 10.10.10.0/30 -o eth0 -j SNAT --to-source 1.1.1.3
For node4: iptables -t nat -A POSTROUTING -s 10.10.10.0/30 -o eth0 -j SNAT --to-source 1.1.1.5
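
A minimal sketch of how the two rules above could be generated so that re-running them does not add duplicates. The helper name snat_cmd is hypothetical; it only prints the commands, using iptables -C to check for an existing rule before appending, and it assumes the subnet, interface and WAN addresses listed in this post.

```shell
#!/bin/sh
# snat_cmd (hypothetical helper): prints a check-then-append SNAT command.
#   $1 = source subnet, $2 = outgoing interface, $3 = this node's WAN address
# Nothing is executed here; the output is meant to be reviewed and then run
# as root on the matching exit node.
snat_cmd() {
    rule="POSTROUTING -s $1 -o $2 -j SNAT --to-source $3"
    printf 'iptables -t nat -C %s 2>/dev/null || iptables -t nat -A %s\n' "$rule" "$rule"
}

snat_cmd 10.10.10.0/30 eth0 1.1.1.3   # for node2
snat_cmd 10.10.10.0/30 eth0 1.1.1.5   # for node4
```

Each node should only get the line carrying its own WAN address.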

This does not work: VM1 cannot reach the outside world.

But with 1 exit node everything works fine:

Code:
node1 (WAN: 1.1.1.2)
node2 (WAN: 1.1.1.3) --> exit
node3 (WAN: 1.1.1.4)
node4 (WAN: 1.1.1.5)
OR
Code:
node1 (WAN: 1.1.1.2)
node2 (WAN: 1.1.1.3)
node3 (WAN: 1.1.1.4)
node4 (WAN: 1.1.1.5) --> exit

With a single exit node, VM1 can reach the outside world.
 
Hi,
I have the exact same problem using the EVPN controller.
NAT only works when one exit node is selected.

My setup:
3 nodes, PVE 7.1-10
Route reflector: VyOS 1.4 rolling release
 
Hi,
I have added a new option in the latest SDN package (libpve-network-perl 0.7.0) to run the exit nodes in active-passive mode.
This is needed for NAT, because return packets have to come back through the same node.

I think it's not yet available in the GUI.

You can add it in /etc/pve/sdn/zones.cfg:

exitnodes-primary <yourprimarynode>

For example:
Code:
evpn: myzon
    controller evpnctl
    vrf-vxlan 10000
    advertise-subnets 1
    exitnodes node1,node2
    exitnodes-primary node1
    mtu 1500
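
Since the option has to be added by hand, a quick sanity check may help. This is only a sketch: check_primary is a hypothetical helper that reads zones.cfg-style text on stdin and reports, for every zone that sets exitnodes-primary, whether that node is also listed in exitnodes. It assumes the stanza layout shown above (a "type: name" header followed by indented "key value" lines).

```shell
#!/bin/sh
# check_primary (hypothetical helper): verify that exitnodes-primary names
# one of the nodes listed in exitnodes, per zone stanza.
check_primary() {
    awk '
        function report(   i, n, ok, list) {
            if (zone != "" && primary != "") {
                ok = 0
                n = split(nodes, list, ",")
                for (i = 1; i <= n; i++) if (list[i] == primary) ok = 1
                printf "%s: primary %s %s\n", zone, primary, (ok ? "ok" : "not in exitnodes")
            }
        }
        /^[a-z]+:/                { report(); zone = $2; nodes = ""; primary = "" }
        $1 == "exitnodes"         { nodes = $2 }
        $1 == "exitnodes-primary" { primary = $2 }
        END                       { report() }
    '
}

# Demo with the example stanza from this post.
check_primary <<'EOF'
evpn: myzon
    controller evpnctl
    vrf-vxlan 10000
    advertise-subnets 1
    exitnodes node1,node2
    exitnodes-primary node1
    mtu 1500
EOF
# prints: myzon: primary node1 ok
```

On a real node the same function could be fed with: check_primary < /etc/pve/sdn/zones.cfg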
 
Hi, I tried it and it doesn't work for me:

Code:
evpn: myzon
    controller evpnctl
    vrf-vxlan 10000
    advertise-subnets 1
    exitnodes node1,node2
    exitnodes-primary node1
    mtu 1500

With two nodes in exitnodes and one of them set as exitnodes-primary, the VM/CT cannot reach the outside world.

P.S.

There is no exitnodes-primary in the GUI of 7.1-11, but I can see exitnodes-local-routing.
I tried exitnodes-local-routing 1 with both exit nodes and it didn't work either:

Code:
evpn: myzon
    controller evpnctl
    vrf-vxlan 10000
    advertise-subnets 1
    exitnodes node1,node2
    exitnodes-local-routing 1
    mtu 1500
 
exitnodes-local-routing is not related (it is only there so that VMs can be reached from the exit node itself), so remove it.


I'll check the GUI, maybe some parts are still missing.

Could you send me the output of:

# vtysh -c "sh bgp l2vpn evpn"

Mainly the part that looks like this:
Code:
Route Distinguisher: x.x.x.x
*> [5]:[0]:[0]:[0.0.0.0]
                  ip ....

You should see the IPs of both exit nodes there, each with a different "Weight" value.
(This should force the traffic to the primary node.)
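
To cut the output down to just that part, something like the following could be used. It is a rough sketch, not a tested tool: default_type5 is a hypothetical helper, and it assumes that each route entry starts with a line containing a "[type]:[" prefix, that continuation lines follow directly, and that every block starts with a "Route Distinguisher:" line.

```shell
#!/bin/sh
# default_type5 (hypothetical helper): reads "sh bgp l2vpn evpn" output on
# stdin and prints only the type-5 default-route entries (0.0.0.0), each
# preceded by the Route Distinguisher line it belongs to.
default_type5() {
    awk '
        /^Route Distinguisher:/ { rd = $0; show = 0; next }   # remember the RD
        /^[ \t]*$/              { show = 0; next }            # blank line ends an entry
        /\[[0-9]+\]:\[/ {
            # a new route entry starts here; keep it only if it is the default route
            show = (index($0, "[5]:[0]:[0]:[0.0.0.0]") > 0)
            if (show) print rd
        }
        show { print }
    '
}

# Usage on an exit node:
#   vtysh -c "sh bgp l2vpn evpn" | default_type5
```

The lines printed under each entry are where the next hop and the weight should show up.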
 
Hi,
I have found a bug with multiple exit nodes.

Can you test this fixed package on the exit nodes?

Code:
wget https://mutulin1.odiso.net/libpve-network-perl_0.7.0_all.deb
dpkg -i libpve-network-perl_0.7.0_all.deb

Then regenerate the SDN configuration with the Apply button.
 
@spirit
On my system this fix works perfectly.

But if I use exitnodes-local-routing, some things are broken:
  • From the local Proxmox server as source, I cannot reach CTs on the other peer. Local containers are reachable.
  • Masquerading for external traffic works, but the responses do not reach the CT.

Code:
evpn: hv-evp
    asn 4287755283
    peers 10.0.1.10,10.0.1.20

subnet: evpnzone-10.0.70.0-24
    vnet vnet70
    gateway 10.0.70.1

vnet: vnet70
    zone evpnzone
    tag 17000

evpn: evpnzone
    controller hv-evp
    vrf-vxlan 10000
    advertise-subnets 1
    exitnodes hv01,hv02
    exitnodes-primary hv01
    ipam pve
    mac B2:xxx
    mtu 1370

iptables -t nat -A POSTROUTING -s '10.0.0.0/16' -o enp195s0 -j MASQUERADE

Some other helpful output:
Ping from the CT with IP 10.0.70.20 on hv02 to a Google IP:
Code:
root@hv02 ~ # tcpdump -nni any host 10.0.70.20 and icmp
veth123i0 P   IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 64342, seq 39, length 64
fwln123i0 Out IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 64342, seq 39, length 64
fwpr123p0 P   IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 64342, seq 39, length 64
vnet70 In  IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 64342, seq 39, length 64
vrfbr_evpnzone Out IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 64342, seq 39, length 64
vrfvx_evpnzone Out IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 64342, seq 39, length 64
Code:
root@hv01 ~ # tcpdump -nni any host 10.0.70.20 and icmp
vrfvx_evpnzone P   IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 3514, seq 39, length 64
vrfbr_evpnzone In  IP 10.0.70.20 > 172.217.18.99: ICMP echo request, id 3514, seq 39, length 64
xvrf_evpnzone Out IP 172.217.18.99 > 10.0.70.20: ICMP echo reply, id 3514, seq 39, length 64
xvrfp_evpnzone In  IP 172.217.18.99 > 10.0.70.20: ICMP echo reply, id 3514, seq 39, length 64
Code:
root@hv01 ~ # nft list ruleset
table ip filter {
    chain trace_chain {
        type filter hook prerouting priority raw - 1; policy accept;
        ip protocol icmp ip saddr 172.217.18.99 meta nftrace set 1
    }
}
root@hv01 ~ #  nft monitor trace
trace id 3ab382d4 ip filter trace_chain packet: iif "enp195s0" ether saddr ec:xx:ac ether daddr f0:xx:a8 ip saddr 172.217.18.99 ip daddr xx.xx.xx.xx ip dscp cs0 ip ecn not-ect ip ttl 60 ip id 0 ip length 84 icmp code net-unreachable icmp id 24445 icmp sequence 20 @th,64,96 44692464034559792918019769344
trace id 3ab382d4 ip filter trace_chain rule ip protocol icmp ip saddr 172.217.18.99 meta nftrace set 1 (verdict continue)
trace id 3ab382d4 ip filter trace_chain verdict continue
trace id 3ab382d4 ip filter trace_chain policy accept
trace id 21a5e786 ip filter trace_chain packet: iif "xvrfp_evpnzone" ether saddr 22:xx:0a ether daddr 8e:xx:8c ip saddr 172.217.18.99 ip daddr 10.0.70.20 ip dscp cs0 ip ecn not-ect ip ttl 59 ip id 0 ip length 84 icmp code net-unreachable icmp id 24445 icmp sequence 20 @th,64,96 44692464034559792918019769344
trace id 21a5e786 ip filter trace_chain rule ip protocol icmp ip saddr 172.217.18.99 meta nftrace set 1 (verdict continue)
trace id 21a5e786 ip filter trace_chain verdict continue
trace id 21a5e786 ip filter trace_chain policy accept
trace id 35505186 ip filter trace_chain packet: iif "vrf_evpnzone" ether saddr 22:xx:0a ether daddr 8e:xx:8c ip saddr 172.217.18.99 ip daddr 10.0.70.20 ip dscp cs0 ip ecn not-ect ip ttl 59 ip id 0 ip length 84 icmp code net-unreachable icmp id 24445 icmp sequence 20 @th,64,96 44692464034559792918019769344
trace id 35505186 ip filter trace_chain rule ip protocol icmp ip saddr 172.217.18.99 meta nftrace set 1 (verdict continue)
trace id 35505186 ip filter trace_chain verdict continue
trace id 35505186 ip filter trace_chain policy accept
Code:
root@hv01 ~ # ip r
default via xx.xx.xx.xx dev enp195s0 proto kernel onlink
10.0.70.0/24 nhid 42 via 10.255.255.2 dev xvrf_evpnzone proto static metric 20
10.255.255.0/30 dev xvrf_evpnzone proto kernel scope link src 10.255.255.1

root@hv01 ~ # ip r sh table 1001
broadcast 10.0.70.0 dev vnet70 proto kernel scope link src 10.0.70.1
10.0.70.0/24 dev vnet70 proto kernel scope link src 10.0.70.1
local 10.0.70.1 dev vnet70 proto kernel scope host src 10.0.70.1
10.0.70.20 nhid 44 via 10.0.1.20 dev vrfbr_evpnzone proto bgp metric 20 onlink
broadcast 10.0.70.255 dev vnet70 proto kernel scope link src 10.0.70.1
broadcast 10.255.255.0 dev xvrfp_evpnzone proto kernel scope link src 10.255.255.2
10.255.255.0/30 dev xvrfp_evpnzone proto kernel scope link src 10.255.255.2
local 10.255.255.2 dev xvrfp_evpnzone proto kernel scope host src 10.255.255.2
broadcast 10.255.255.3 dev xvrfp_evpnzone proto kernel scope link src 10.255.255.2

Footnote: If you disable SNAT in the subnet, the SNAT rule is not removed. I had to drop it manually with:
iptables -t nat -D POSTROUTING -s 10.0.70.0/24 -o enp195s0 -j SNAT --to-source xx.xx.xx.xx
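
For anyone hitting the same leftover rule, here is a small sketch that lists the matching delete commands instead of typing them by hand. stale_snat_deletes is a hypothetical helper; it reads the output of iptables -t nat -S POSTROUTING on stdin and only prints commands, so nothing is removed until you run them yourself.

```shell
#!/bin/sh
# stale_snat_deletes SUBNET (hypothetical helper): for every POSTROUTING rule
# that has "-s SUBNET" and "-j SNAT", print the equivalent "-D" command.
stale_snat_deletes() {
    awk -v subnet="$1" '
        $1 == "-A" && $2 == "POSTROUTING" {
            src = 0; snat = 0
            for (i = 3; i < NF; i++) {
                if ($i == "-s" && $(i + 1) == subnet) src = 1
                if ($i == "-j" && $(i + 1) == "SNAT") snat = 1
            }
            # turn the listed "-A" rule into a "-D" command
            if (src && snat) { $1 = "-D"; print "iptables -t nat " $0 }
        }
    '
}

# Usage on the exit node (as root):
#   iptables -t nat -S POSTROUTING | stale_snat_deletes 10.0.70.0/24
```

Manually added rules such as the MASQUERADE one above are left alone, because only SNAT targets are matched.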
 
I think this bug is still present in Proxmox. I'm testing on Proxmox VE 8.0.3 and have the same issue with any number of exit nodes other than 1. Setting a primary exit node seems to do nothing.
 
Hmm, that's strange. A primary exit node is really the only way to get it to work (as packets need to come back to the same exit node).
It should have been fixed a long time ago (and I haven't had any bug report about it).


Can you send the result of
vtysh -c "sh bgp l2vpn evpn"
with and without the exit node?
 
working1exit: the SDN working, with only one exit node and no primary.
notworking2exit1primary: 2 exit nodes, 1 of them primary. All VMs lose network connection.
 

Attachments

  • notworking2exit1primary.txt
    7.6 KB · Views: 5
  • working1exit.txt
    7.4 KB · Views: 5
I'm also having some problems with exit nodes; I think I can only get it to work with one exit node. I'm using BGP peering from an upstream VyOS router, with eBGP multipath (multipath relax) and equal cost for the vnets learned via each Proxmox node. I'm not using any NAT; the networks are fully routed.
 
This also happened to me on another deployment; I can still only use one exit node.
 
I don't see any potential problem without SNAT.

Have you checked whether rp_filter is disabled?

sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0

If it is not, asymmetric routing will be dropped.
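
One thing worth checking here: the kernel uses the higher of the "all" value and the per-interface value, and "default" only applies to interfaces created afterwards, so interfaces that already exist can still be filtering after the two commands above. A minimal sketch to list them (strict_rp_filter is a hypothetical helper that just filters sysctl output):

```shell
#!/bin/sh
# strict_rp_filter (hypothetical helper): reads sysctl lines of the form
# "net.ipv4.conf.<iface>.rp_filter = N" on stdin and prints every key whose
# value is not 0. Keys such as arp_filter are not matched.
strict_rp_filter() {
    awk -F' *= *' '/\.rp_filter *=/ && $2 != 0 { print $1 " = " $2 }'
}

# Usage on each exit node:
#   sysctl -a 2>/dev/null | strict_rp_filter
```

No output means rp_filter is off everywhere; any line printed is an interface that may still drop asymmetric traffic.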
 
I'll check again, but I'm pretty sure I set those rp_filter sysctl params early on.
 