EVPN/VXLAN between nodes on different public networks

JiLPi

Jan 23, 2024
Hi,

We're trying to create a cluster of Proxmox nodes directly connected to the Internet (public IPs, inexpensive servers provided by OVH). No LAN.
We have 2 nodes atm, one in France (n1) and one in Germany (n2).

We're inexperienced with SDN.

Our objectives are:
- put all VMs on the same VNet / Subnet on any node
- VMs should be able to reach all other VMs on the same VNet regardless of which node they are actually located.
- VMs should be able to access the Internet, and be reachable from the Internet if we set the right forwarding rules.
- Encrypted traffic between nodes

We have disabled all firewalls, at both the datacenter and node levels.
Ports 179/TCP and 4789/UDP can be reached from n1 to n2 and vice versa. We can talk to bgpd on the remote node with netcat.
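
A minimal sketch of the same netcat check in Python, in case it helps to script it. The function name `tcp_port_open` is made up for this example. It only covers 179/TCP: a successful connect shows the port is reachable, not that a BGP session will establish, and 4789/UDP cannot be verified this way.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage (replace the placeholder with the other node's public IP):
#   tcp_port_open("<peer public IP>", 179)
```

Run it on each node against the other node's address, since reachability can differ per direction.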


1/ VXLAN
We have already successfully established a VXLAN that spans across both nodes. VMs can all ping each other, great. VXLAN traffic between nodes is encrypted using strongswan (https://pve.proxmox.com/pve-docs/chapter-pvesdn.html#_vxlan_ipsec_encryption)

But we have failed to let them reach other networks / Internet. We could not figure out how to define a gateway and let anything exit or enter the VXLAN... Could someone point us to some helpful resources?


2/ EVPN
Being able to manage several VXLANs sounds interesting, so we also tried EVPN.
It seems to be very straightforward in a "layer 2" environment. But we can't get it to work in our case (through Internet / public networks).

Good:
- VMs can reach other VMs on the same node
- VMs can reach the internet

Issues:
- VMs on one node cannot reach VMs on the other node
- It seems that no traffic transits between nodes
- We don't understand how BGP works in this context, and can't see anything happening ("show bgp summary" in vtysh shows that the BGP connections remain in the "Active" state, with no information being exchanged between nodes)


Configuration on both nodes follows.

Thanks in advance.
 
Here's the configuration on node-1 (145.239.xxx.xxx)

Bash:
dpkg -l|grep frr
ii  frr                                  8.5.2-1+pve1                        amd64        FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
ii  frr-pythontools                      8.5.2-1+pve1                        all          FRRouting suite - Python tools

Code:
cat /etc/frr/daemons |grep bgpd
bgpd=yes
bgpd_options="   -A 127.0.0.1"
# bgpd_wrap="/usr/bin/daemonize /usr/bin/mywrapper"

Code:
cat /etc/pve/sdn/*.cfg
evpn: evpnctr1
        asn 65432
        peers 145.239.yyy.yyy,145.239.xxx.xxx

bgp: bgpnode-1
        asn 65001
        node node-1
        peers 145.239.yyy.yyy
        bgp-multipath-as-path-relax 0
        ebgp 1
        ebgp-multihop 20

bgp: bgpnode-2
        asn 65002
        node node-2
        peers 145.239.xxx.xxx
        bgp-multipath-as-path-relax 0
        ebgp 1
        ebgp-multihop 20

subnet: znVXL01-10.0.3.0-24
        vnet vnVXL01
        gateway 10.0.3.1
        snat 1

subnet: znEVPN01-10.0.10.0-24
        vnet vnEVPN01
        gateway 10.0.10.1
        snat 1

vnet: vnVXL01
        zone znVXL01
        alias VNet VXLAN 01
        tag 200

vnet: vnEVPN01
        zone znEVPN01
        alias VNet EVPN 01
        tag 11000

vxlan: znVXL01
        peers 145.239.xxx.xxx,145.239.yyy.yyy
        ipam pve
        mtu 1370

evpn: znEVPN01
        controller evpnctr1
        vrf-vxlan 10000
        exitnodes node-2,node-1
        exitnodes-local-routing 1
        exitnodes-primary node-1
        ipam pve
        mac BC:24:11:24:54:14

Code:
cat /etc/frr/frr.conf
frr version 8.5.1
frr defaults datacenter
hostname node-1
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_znEVPN01
 vni 10000
 ip route 10.0.3.0/24 null0
exit-vrf
!
router bgp 65001
 bgp router-id 145.239.xxx.xxx
 no bgp default ipv4-unicast
 coalesce-time 1000
 neighbor BGP peer-group
 neighbor BGP remote-as external
 neighbor BGP bfd
 neighbor BGP ebgp-multihop 20
 neighbor 145.239.yyy.yyy peer-group BGP
 neighbor VTEP peer-group
 neighbor VTEP remote-as external
 neighbor VTEP bfd
 neighbor 145.239.yyy.yyy peer-group VTEP
 !
 address-family ipv4 unicast
  neighbor BGP activate
  neighbor BGP soft-reconfiguration inbound
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  neighbor VTEP activate
  advertise-all-vni
  autort as 65432
 exit-address-family
exit
!
router bgp 65001 vrf vrf_znEVPN01
 bgp router-id 145.239.xxx.xxx
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 !
 address-family l2vpn evpn
  route-target import 65432:10000
  route-target export 65432:10000
  default-originate ipv4
  default-originate ipv6
 exit-address-family
exit
!
route-map MAP_VTEP_IN deny 1
 match evpn vni 10000
 match evpn route-type prefix
exit
!
route-map MAP_VTEP_IN permit 2
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
ip route 10.0.10.0/24 10.255.255.2 xvrf_znEVPN01
!
line vty

Code:
cat /etc/network/interfaces.d/sdn
#version:33

auto vnEVPN01
iface vnEVPN01
        address 10.0.10.1/24
        post-up iptables -t nat -A POSTROUTING -s '10.0.10.0/24' -o vmbr0 -j SNAT --to-source 145.239.xxx.xxx
        post-down iptables -t nat -D POSTROUTING -s '10.0.10.0/24' -o vmbr0 -j SNAT --to-source 145.239.xxx.xxx
        post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
        post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
        hwaddress BC:24:11:24:54:14
        bridge_ports vxlan_vnEVPN01
        bridge_stp off
        bridge_fd 0
        mtu 1450
        alias VNet EVPN 01
        ip-forward on
        arp-accept on
        vrf vrf_znEVPN01

auto vnVXL01
iface vnVXL01
        bridge_ports vxlan_vnVXL01
        bridge_stp off
        bridge_fd 0
        mtu 1370
        alias VNet VXLAN 01

auto vrf_znEVPN01
iface vrf_znEVPN01
        vrf-table auto
        post-up ip route del vrf vrf_znEVPN01 unreachable default metric 4278198272

auto vrfbr_znEVPN01
iface vrfbr_znEVPN01
        bridge-ports vrfvx_znEVPN01
        bridge_stp off
        bridge_fd 0
        mtu 1450
        vrf vrf_znEVPN01

auto vrfvx_znEVPN01
iface vrfvx_znEVPN01
        vxlan-id 10000
        vxlan-local-tunnelip 145.239.xxx.xxx
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_vnEVPN01
iface vxlan_vnEVPN01
        vxlan-id 11000
        vxlan-local-tunnelip 145.239.xxx.xxx
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_vnVXL01
iface vxlan_vnVXL01
        vxlan-id 200
        vxlan_remoteip 145.239.yyy.yyy
        mtu 1370

auto xvrf_znEVPN01
iface xvrf_znEVPN01
        link-type veth
        address 10.255.255.1/30
        veth-peer-name xvrfp_znEVPN01
        mtu 1500

auto xvrfp_znEVPN01
iface xvrfp_znEVPN01
        link-type veth
        address 10.255.255.2/30
        veth-peer-name xvrf_znEVPN01
        vrf vrf_znEVPN01
        mtu 1500


Code:
cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 145.239.xxx.xxx/24
        gateway 145.239.xxx.254
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        hwaddress A4:BF:01:2B:33:E9

iface vmbr0 inet6 static
        address 2001:41d0:XXX:XXXX::/64
        gateway 2001:41d0:XXX:XXff:ff:ff:ff:ff


source /etc/network/interfaces.d/*
 
Configuration on node-2 (145.239.yyy.yyy):

Code:
dpkg -l|grep frr
ii  frr                                  8.5.2-1+pve1                        amd64        FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
ii  frr-pythontools                      8.5.2-1+pve1                        all          FRRouting suite - Python tools

Code:
cat /etc/frr/daemons |grep bgpd
bgpd=yes
bgpd_options="   -A 127.0.0.1"
# bgpd_wrap="/usr/bin/daemonize /usr/bin/mywrapper"

Code:
cat /etc/pve/sdn/*.cfg
evpn: evpnctr1
        asn 65432
        peers 145.239.yyy.yyy,145.239.xxx.xxx

bgp: bgpnode-1
        asn 65001
        node node-1
        peers 145.239.yyy.yyy
        bgp-multipath-as-path-relax 0
        ebgp 1
        ebgp-multihop 20

bgp: bgpnode-2
        asn 65002
        node node-2
        peers 145.239.xxx.xxx
        bgp-multipath-as-path-relax 0
        ebgp 1
        ebgp-multihop 20

subnet: znVXL01-10.0.3.0-24
        vnet vnVXL01
        gateway 10.0.3.1
        snat 1

subnet: znEVPN01-10.0.10.0-24
        vnet vnEVPN01
        gateway 10.0.10.1
        snat 1

vnet: vnVXL01
        zone znVXL01
        alias VNet VXLAN 01
        tag 200

vnet: vnEVPN01
        zone znEVPN01
        alias VNet EVPN 01
        tag 11000

vxlan: znVXL01
        peers 145.239.xxx.xxx,145.239.yyy.yyy
        ipam pve
        mtu 1370

evpn: znEVPN01
        controller evpnctr1
        vrf-vxlan 10000
        exitnodes node-2,node-1
        exitnodes-local-routing 1
        exitnodes-primary node-1
        ipam pve
        mac BC:24:11:24:54:14

Code:
cat /etc/frr/frr.conf
frr version 8.5.1
frr defaults datacenter
hostname node-2
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_znEVPN01
 vni 10000
 ip route 10.0.3.0/24 null0
exit-vrf
!
router bgp 65002
 bgp router-id 145.239.yyy.yyy
 no bgp default ipv4-unicast
 coalesce-time 1000
 neighbor BGP peer-group
 neighbor BGP remote-as external
 neighbor BGP bfd
 neighbor BGP ebgp-multihop 20
 neighbor 145.239.xxx.xxx peer-group BGP
 neighbor VTEP peer-group
 neighbor VTEP remote-as external
 neighbor VTEP bfd
 neighbor 145.239.xxx.xxx peer-group VTEP
 !
 address-family ipv4 unicast
  neighbor BGP activate
  neighbor BGP soft-reconfiguration inbound
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  neighbor VTEP activate
  advertise-all-vni
  autort as 65432
 exit-address-family
exit
!
router bgp 65002 vrf vrf_znEVPN01
 bgp router-id 145.239.yyy.yyy
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 !
 address-family l2vpn evpn
  route-target import 65432:10000
  route-target export 65432:10000
  default-originate ipv4
  default-originate ipv6
 exit-address-family
exit
!
route-map MAP_VTEP_IN permit 1
exit
!
route-map MAP_VTEP_OUT permit 1
 match evpn vni 10000
 match evpn route-type prefix
 set metric 200
exit
!
route-map MAP_VTEP_OUT permit 2
exit
!
ip route 10.0.10.0/24 10.255.255.2 xvrf_znEVPN01
!
line vty
!


Code:
cat /etc/network/interfaces.d/sdn
#version:33

auto vnEVPN01
iface vnEVPN01
        address 10.0.10.1/24
        post-up iptables -t nat -A POSTROUTING -s '10.0.10.0/24' -o vmbr0 -j SNAT --to-source 145.239.xxx.xxx
        post-down iptables -t nat -D POSTROUTING -s '10.0.10.0/24' -o vmbr0 -j SNAT --to-source 145.239.xxx.xxx
        post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
        post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
        hwaddress BC:24:11:24:54:14
        bridge_ports vxlan_vnEVPN01
        bridge_stp off
        bridge_fd 0
        mtu 1450
        alias VNet EVPN 01
        ip-forward on
        arp-accept on
        vrf vrf_znEVPN01

auto vnVXL01
iface vnVXL01
        bridge_ports vxlan_vnVXL01
        bridge_stp off
        bridge_fd 0
        mtu 1370
        alias VNet VXLAN 01

auto vrf_znEVPN01
iface vrf_znEVPN01
        vrf-table auto
        post-up ip route del vrf vrf_znEVPN01 unreachable default metric 4278198272

auto vrfbr_znEVPN01
iface vrfbr_znEVPN01
        bridge-ports vrfvx_znEVPN01
        bridge_stp off
        bridge_fd 0
        mtu 1450
        vrf vrf_znEVPN01

auto vrfvx_znEVPN01
iface vrfvx_znEVPN01
        vxlan-id 10000
        vxlan-local-tunnelip 145.239.xxx.xxx
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_vnEVPN01
iface vxlan_vnEVPN01
        vxlan-id 11000
        vxlan-local-tunnelip 145.239.xxx.xxx
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_vnVXL01
iface vxlan_vnVXL01
        vxlan-id 200
        vxlan_remoteip 145.239.yyy.yyy
        mtu 1370

auto xvrf_znEVPN01
iface xvrf_znEVPN01
        link-type veth
        address 10.255.255.1/30
        veth-peer-name xvrfp_znEVPN01
        mtu 1500

auto xvrfp_znEVPN01
iface xvrfp_znEVPN01
        link-type veth
        address 10.255.255.2/30
        veth-peer-name xvrf_znEVPN01
        vrf vrf_znEVPN01
        mtu 1500


Code:
cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 145.239.yyy.yyy/24
        gateway 145.239.yyy.254
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        hwaddress A4:BF:01:1F:46:3A

iface vmbr0 inet6 static
        address 2001:41d0:YYYY:YYYY::/64
        gateway 2001:41d0:YYYY:YYff:ff:ff:ff:ff


source /etc/network/interfaces.d/*
 
Other debug information

BGP (similar output on node 1 and node 2)

Code:
node-2# show bgp summary

IPv4 Unicast Summary (VRF default):
BGP router identifier 145.239.yyy.yyy, local AS number 65002 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 725 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
145.239.xxx.xxx   4          0         0         0        0    0    0    never       Active        0 N/A

Total number of neighbors 1

Code:
vtysh -c "sh ip bgp l2vpn evpn"
BGP table version is 11, local router ID is 145.239.yyy.yyy
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 145.239.yyy.yyy:2
 *> [3]:[0]:[32]:[145.239.yyy.yyy]
                    145.239.yyy.yyy(node-2)
                                                       32768 i
                    ET:8 RT:65432:200
Route Distinguisher: 145.239.yyy.yyy:3
 *> [3]:[0]:[32]:[145.239.yyy.yyy]
                    145.239.yyy.yyy(node-2)
                                                       32768 i
                    ET:8 RT:65432:11000
Route Distinguisher: 145.239.yyy.yyy:4
 *> [5]:[0]:[0]:[0.0.0.0]
                    145.239.yyy.yyy(node-2)
                                                       32768 i
                    ET:8 RT:65432:10000 Rmac:8e:6d:94:6e:3e:fd
 *> [5]:[0]:[0]:[::] 145.239.yyy.yyy(node-2)
                                                       32768 i
                    ET:8 RT:65432:10000 Rmac:8e:6d:94:6e:3e:fd

Displayed 4 out of 4 total prefixes


iptables (node-1: VXLAN input, node-2 : VXLAN output)

Code:
node-2#
root@node-2:~# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A OUTPUT -s 145.239.yyy.yyy/32 -d 145.239.xxx.xxx/32 -o vmbr0 -p udp -m policy --dir out --pol ipsec --reqid 1 --proto esp -m udp --dport 4789 -j ACCEPT
 
Hi,
Start with something simple: the same ASN on all nodes and no exit nodes.

Code:
evpn: evpnctr1
        asn 65432
        peers 145.239.yyy.yyy,145.239.xxx.xxx

evpn: znEVPN01
        controller evpnctr1
        vrf-vxlan 10000
        ipam pve
        mac BC:24:11:24:54:14

If your nodes are able to communicate on the BGP port and the VXLAN port, that should be enough to get EVPN working within a single ASN (iBGP).

You can later add an extra BGP controller for each node if you want to do eBGP (a different ASN per node), or if you want to peer with other routers over classic BGP. (I'm not sure that is useful in your case, in the OVH context.)


You should then see the connection established in "show bgp summary", and then EVPN routes in "sh ip bgp l2vpn evpn".
If the VXLAN port is correctly opened in the firewall, the VMs should be able to communicate over the private network.
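
A rough sketch for checking whether a node has learned any remote EVPN routes, assuming the table layout shown in the "sh ip bgp l2vpn evpn" output earlier in this thread. It collects the originator IPs of the type-3 routes (each VTEP normally advertises one per VNI) and tells you whether any of them is not the local node. The function names are made up for this example.

```python
import re

# Matches a type-3 route entry such as " *> [3]:[0]:[32]:[192.0.2.1]"
# and captures the originator IP (the last bracketed field).
TYPE3_RE = re.compile(r"\[3\]:\[[^\]]*\]:\[[^\]]*\]:\[([^\]]+)\]")


def type3_originators(evpn_table: str) -> set[str]:
    """Return the originator IPs of all type-3 routes in the table text."""
    originators = set()
    for line in evpn_table.splitlines():
        if line.lstrip().startswith("EVPN type-"):
            continue  # skip the legend lines that describe the prefix formats
        match = TYPE3_RE.search(line)
        if match:
            originators.add(match.group(1))
    return originators


def sees_remote_vtep(evpn_table: str, local_ip: str) -> bool:
    """True if at least one type-3 route comes from another VTEP."""
    return bool(type3_originators(evpn_table) - {local_ip})
```

Feed it the text printed by vtysh on one node together with that node's own public IP. If it returns False, only locally originated routes are present, which is what the output posted above shows.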
 
Hello,
Thanks a lot for your help.
I'm working with JiLPi, and we made it as simple as possible:

Code:
 cat /etc/pve/sdn/*
evpn: evpnctr1
        asn 65432
        peers 145.239.yyy.yyy,145.239.xxx.xxx

subnet: znEVPN01-10.0.10.0-24
        vnet vnEVPN01
        gateway 10.0.10.1
        snat 1

vnet: vnEVPN01
        zone znEVPN01
        alias VNet EVPN 01
        tag 11000

evpn: znEVPN01
        controller evpnctr1
        vrf-vxlan 10000
        ipam pve
        mac BC:24:11:24:54:14

CT 103 and 105 are on node 1
CT 104 and 106 are on node 2

We verified that we can reach the BGP port from each node:
Bash:
nc 145.239.yyy.yyy -t bgp
b       EFFAEF

Both nodes show that we can establish a TCP connection to the BGP port on the remote node.


We then tried to ping between two CTs on the SAME node (CT 105 to CT 103):
Bash:
ping 10.0.10.103
PING 10.0.10.103 (10.0.10.103) 56(84) bytes of data.
64 bytes from 10.0.10.103: icmp_seq=1 ttl=64 time=0.054 ms
64 bytes from 10.0.10.103: icmp_seq=2 ttl=64 time=0.045 ms

It works as expected.
We can also ping the gateway (10.0.10.1) on both nodes from their local CTs.


However, when we try to ping from a CT to a CT on the other node (CT 105 to CT 104 for example), it doesn't work:
Bash:
ping 10.0.10.104
PING 10.0.10.104 (10.0.10.104) 56(84) bytes of data.
From 10.0.10.105 icmp_seq=1 Destination Host Unreachable
From 10.0.10.105 icmp_seq=2 Destination Host Unreachable
From 10.0.10.105 icmp_seq=3 Destination Host Unreachable


Here is also the output of the vtysh -c "show bgp summary" command:
Bash:
vtysh -c "show bgp summary"

L2VPN EVPN Summary (VRF default):
BGP router identifier 145.239.yyy.yyy, local AS number 65432 vrf-id 0
BGP table version 0
RIB entries 5, using 960 bytes of memory
Peers 1, using 725 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor          V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
145.239.xxx.xxx   4      65432         0         0        0    0    0    never       Active        0 N/A

Total number of neighbors 1


Similarly to @virgil246 in this thread, we don't see anything happening on the bgp port using tcpdump (except when we open a connection ourselves with nc).
https://forum.proxmox.com/threads/p...ination-unreachable.118153/page-2#post-627507


It seems that BGP won't even try to communicate with other peers.
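
To make this check repeatable, here is a minimal sketch that reads the text of "show bgp summary" and lists the peers that are not established. It assumes the column layout shown above, where the State/PfxRcd column holds a number once the session is up and a state name such as "Active" otherwise. The function name is made up for this example.

```python
def unestablished_peers(summary: str) -> dict[str, str]:
    """Map each neighbor that is not established to its reported state."""
    peers = {}
    in_table = False
    for line in summary.splitlines():
        fields = line.split()
        if not fields:
            in_table = False      # a blank line ends the neighbor table
            continue
        if fields[0] == "Neighbor":
            in_table = True       # header row; neighbor rows follow
            continue
        if in_table and len(fields) >= 10:
            state = fields[9]     # State/PfxRcd column
            if not state.isdigit():
                peers[fields[0]] = state
    return peers


sample = """
Neighbor          V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
145.239.xxx.xxx   4      65432         0         0        0    0    0    never       Active        0 N/A

Total number of neighbors 1
"""
print(unestablished_peers(sample))  # {'145.239.xxx.xxx': 'Active'}
```

An empty result would mean every listed neighbor reports a prefix count, i.e. the sessions are up.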

Could the fact that the peers are on another network have any influence here?
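
For what it's worth, a small sketch of how one could check whether a peer is on the same subnet as the local interface. As far as I know, FRR expects eBGP peers to be directly connected unless "ebgp-multihop" is set (in the generated frr.conf from the first post it is set on the BGP peer group but not on VTEP), while iBGP sessions have no such restriction, so in the single-ASN setup this should not matter. The addresses below are documentation placeholders, not our real ones, and the function name is made up.

```python
import ipaddress


def peer_is_on_link(local_cidr: str, peer_ip: str) -> bool:
    """True if peer_ip falls inside the subnet of the local interface."""
    network = ipaddress.ip_interface(local_cidr).network
    return ipaddress.ip_address(peer_ip) in network


# Placeholder addresses (RFC 5737 documentation ranges):
print(peer_is_on_link("192.0.2.10/24", "192.0.2.20"))     # same /24
print(peer_is_on_link("192.0.2.10/24", "198.51.100.20"))  # different network
```

The first call prints True and the second False. Our two nodes are in the second situation: each sits in its own /24 behind its own gateway.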
 
I did it slightly differently:
VXLAN on the VM side (so that I can move the VMs around). The VXLAN is connected to an OPNsense firewall.
 
Our setup is totally different.
The nodes are directly connected to the Internet and have a public IP.
No OPNsense firewall, switch, etc.
100% IP / layer 3.
 
