Node to Node communication not working with EVPN

bilby91

Jan 8, 2024
Hello,

I've been reading all the troubleshooting posts about EVPN trying to figure out a solution to the problem I'm having but no luck yet.

I'm experimenting in a 2-node Proxmox cluster:

vmbr0 interfaces:

Node 1: 10.0.1.131/24
Node 2: 10.0.1.132/24

The Proxmox firewall is disabled for the moment; I plan to use it in the future.

I'm following this tutorial https://pve.proxmox.com/pve-docs/chapter-pvesdn.html#pvesdn_setup_example_evpn

I'm trying to get the VM on Node 1 (vnet1) to ping the VM on Node 2 (vnet2), but I'm getting `Host Unreachable`.

VNET 1 has the 10.0.5.0/24 network, gateway 10.0.5.1. VNET 2 has 10.0.6.0/24 network, gateway 10.0.6.1.


My configurations look like this:

controllers.cfg
Code:
evpn: myevpncl
        asn 65000
        peers 10.0.1.131,10.0.1.132

zones.cfg

Code:
evpn: myevpnzn
        controller myevpncl
        vrf-vxlan 10000
        exitnodes-primary pve
        ipam pve
        mac BC:24:11:2A:62:B7
        mtu 1450
        nodes pve,pve-home-2

vnets.cfg
Code:
vnet: myvnet1
        zone myevpnzn
        tag 11000

vnet: myvnet2
        zone myevpnzn
        tag 12000

subnets.cfg
Code:
evpn: myevpnzn
        controller myevpncl
        vrf-vxlan 10000
        exitnodes-primary pve
        ipam pve
        mac BC:24:11:2A:62:B7
        mtu 1450
        nodes pve,pve-home-2

Node specific configurations:

Node 1:

/etc/network/interfaces.d/sdn
Code:
#version:47

auto myvnet1
iface myvnet1
        address 10.0.5.1/24
        hwaddress BC:24:11:2A:62:B7
        bridge_ports vxlan_myvnet1
        bridge_stp off
        bridge_fd 0
        mtu 1450
        ip-forward on
        arp-accept on
        vrf vrf_myevpnzn

auto myvnet2
iface myvnet2
        address 10.0.6.1/24
        hwaddress BC:24:11:2A:62:B7
        bridge_ports vxlan_myvnet2
        bridge_stp off
        bridge_fd 0
        mtu 1450
        ip-forward on
        arp-accept on
        vrf vrf_myevpnzn

auto vrf_myevpnzn
iface vrf_myevpnzn
        vrf-table auto
        post-up ip route add vrf vrf_myevpnzn unreachable default metric 4278198272

auto vrfbr_myevpnzn
iface vrfbr_myevpnzn
        bridge-ports vrfvx_myevpnzn
        bridge_stp off
        bridge_fd 0
        mtu 1450
        vrf vrf_myevpnzn

auto vrfvx_myevpnzn
iface vrfvx_myevpnzn
        vxlan-id 10000
        vxlan-local-tunnelip 10.0.1.131
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_myvnet1
iface vxlan_myvnet1
        vxlan-id 11000
        vxlan-local-tunnelip 10.0.1.131
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_myvnet2
iface vxlan_myvnet2
        vxlan-id 12000
        vxlan-local-tunnelip 10.0.1.131
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

/etc/frr/frr.conf

Code:
frr version 8.5.1
frr defaults datacenter
hostname pve
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_myevpnzn
 vni 10000
exit-vrf
!
router bgp 65000
 bgp router-id 10.0.1.131
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 no bgp default ipv4-unicast
 coalesce-time 1000
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65000
 neighbor VTEP bfd
 neighbor 10.0.1.132 peer-group VTEP
 !
 address-family l2vpn evpn
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  neighbor VTEP activate
  advertise-all-vni
 exit-address-family
exit
!
router bgp 65000 vrf vrf_myevpnzn
 bgp router-id 10.0.1.131
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
exit
!
route-map MAP_VTEP_IN permit 1
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
line vty

vtysh -c 'show bgp summary'
Code:
L2VPN EVPN Summary (VRF default):
BGP router identifier 10.0.1.131, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 7, using 1344 bytes of memory
Peers 1, using 724 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor               V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
pve-home-2(10.0.1.132) 4      65000        80        79        0    0    0 00:03:22            6        5 N/A

Total number of neighbors 1

Node 2:

/etc/network/interfaces.d/sdn
Code:
#version:47

auto myvnet1
iface myvnet1
        address 10.0.5.1/24
        hwaddress BC:24:11:2A:62:B7
        bridge_ports vxlan_myvnet1
        bridge_stp off
        bridge_fd 0
        mtu 1450
        ip-forward on
        arp-accept on
        vrf vrf_myevpnzn

auto myvnet2
iface myvnet2
        address 10.0.6.1/24
        hwaddress BC:24:11:2A:62:B7
        bridge_ports vxlan_myvnet2
        bridge_stp off
        bridge_fd 0
        mtu 1450
        ip-forward on
        arp-accept on
        vrf vrf_myevpnzn

auto vrf_myevpnzn
iface vrf_myevpnzn
        vrf-table auto
        post-up ip route add vrf vrf_myevpnzn unreachable default metric 4278198272

auto vrfbr_myevpnzn
iface vrfbr_myevpnzn
        bridge-ports vrfvx_myevpnzn
        bridge_stp off
        bridge_fd 0
        mtu 1450
        vrf vrf_myevpnzn

auto vrfvx_myevpnzn
iface vrfvx_myevpnzn
        vxlan-id 10000
        vxlan-local-tunnelip 10.0.1.132
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_myvnet1
iface vxlan_myvnet1
        vxlan-id 11000
        vxlan-local-tunnelip 10.0.1.132
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

auto vxlan_myvnet2
iface vxlan_myvnet2
        vxlan-id 12000
        vxlan-local-tunnelip 10.0.1.132
        bridge-learning off
        bridge-arp-nd-suppress on
        mtu 1450

/etc/frr/frr.conf

Code:
frr version 8.5.1
frr defaults datacenter
hostname pve-home-2
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_myevpnzn
 vni 10000
exit-vrf
!
router bgp 65000
 bgp router-id 10.0.1.132
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 no bgp default ipv4-unicast
 coalesce-time 1000
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65000
 neighbor VTEP bfd
 neighbor 10.0.1.131 peer-group VTEP
 !
 address-family l2vpn evpn
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  neighbor VTEP activate
  advertise-all-vni
 exit-address-family
exit
!
router bgp 65000 vrf vrf_myevpnzn
 bgp router-id 10.0.1.132
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
exit
!
route-map MAP_VTEP_IN permit 1
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
line vty

vtysh -c 'show bgp summary'
Code:
L2VPN EVPN Summary (VRF default):
BGP router identifier 10.0.1.132, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 7, using 1344 bytes of memory
Peers 1, using 724 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
pve(10.0.1.131) 4      65000        88        89        0    0    0 00:03:50            5        6 N/A

Total number of neighbors 1

Any guidance or suggestions would be more than welcome!
 
Hi,
Everything seems to be fine.


Have you applied the latest PVE upgrade?

# dpkg -l | grep frr

(It should be 8.5.2, because 8.5.1 was buggy.)


What is the result of `vtysh -c 'show bgp l2vpn evpn'` on both nodes?
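
For anyone wanting to script the version check above, here is a minimal sketch. The function name `frr_version_ok` is made up for illustration, the 8.5.2 threshold is simply the one mentioned in this post, and the comparison only looks at the dotted upstream part of the version (it ignores the Debian revision and does not handle epochs).

```shell
# Hypothetical helper: succeed if the installed version ($1) is at least
# the minimum upstream version ($2). Only the part before the first "-"
# is compared, field by field, as numbers.
frr_version_ok() {
    awk -v have="${1%%-*}" -v want="$2" 'BEGIN {
        n = split(have, h, /\./)
        m = split(want, w, /\./)
        for (i = 1; i <= (n > m ? n : m); i++) {
            if ((h[i] + 0) > (w[i] + 0)) exit 0   # newer than required
            if ((h[i] + 0) < (w[i] + 0)) exit 1   # older than required
        }
        exit 0                                    # identical versions
    }'
}

# Usage on a node (dpkg-query prints the installed package version):
#   frr_version_ok "$(dpkg-query -W -f='${Version}' frr)" 8.5.2 \
#       && echo "frr is recent enough" || echo "frr should be upgraded"
```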
 
I didn't have the Proxmox package repositories configured correctly, so I was running an old version. I have now switched to the Proxmox repository and installed the latest version on both nodes.

Code:
root@pve-home-2:~# dpkg -l|grep frr
ii  frr                                  8.5.2-1+pve1                        amd64        FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
ii  frr-pythontools                      8.5.2-1+pve1                        all          FRRouting suite - Python tools

I still can't get my two nodes to communicate with each other using `ping`. One thing that caught my attention: if I'm actively pinging from vm1 -> vm2 and I restart the `frr` service, a few packets get through, but then it stops working again.

Code:
ubuntu@evpn2:~$ ping 10.0.5.2
PING 10.0.5.2 (10.0.5.2) 56(84) bytes of data.
From 10.0.6.1 icmp_seq=52 Destination Host Unreachable
From 10.0.6.1 icmp_seq=53 Destination Host Unreachable
From 10.0.6.1 icmp_seq=54 Destination Host Unreachable
64 bytes from 10.0.5.2: icmp_seq=55 ttl=63 time=3024 ms
64 bytes from 10.0.5.2: icmp_seq=56 ttl=63 time=2024 ms
64 bytes from 10.0.5.2: icmp_seq=57 ttl=63 time=1000 ms
64 bytes from 10.0.5.2: icmp_seq=58 ttl=63 time=0.636 ms

Output of `vtysh -c 'show bgp l2vpn evpn'`

Node 1:

Code:
root@pve:~# vtysh -c 'show bgp l2vpn evpn'
BGP table version is 1, local router ID is 10.0.1.131
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 10.0.1.131:2
 *> [3]:[0]:[32]:[10.0.1.131]
                    10.0.1.131(pve)                    32768 i
                    ET:8 RT:65001:11000
Route Distinguisher: 10.0.1.131:3
 *> [2]:[0]:[48]:[bc:24:11:79:b6:a5]
                    10.0.1.131(pve)                    32768 i
                    ET:8 RT:65001:12000
 *> [2]:[0]:[48]:[bc:24:11:79:b6:a5]:[32]:[10.0.6.2]
                    10.0.1.131(pve)                    32768 i
                    ET:8 RT:65001:12000 RT:65001:10000 Rmac:7e:32:f0:96:9d:ce
 *> [2]:[0]:[48]:[bc:24:11:79:b6:a5]:[128]:[fe80::be24:11ff:fe79:b6a5]
                    10.0.1.131(pve)                    32768 i
                    ET:8 RT:65001:12000
 *> [3]:[0]:[32]:[10.0.1.131]
                    10.0.1.131(pve)                    32768 i
                    ET:8 RT:65001:12000
Route Distinguisher: 10.0.1.132:2
 *>i[2]:[0]:[48]:[bc:24:11:35:d5:b9]
                    10.0.1.132(pve-home-2)
                                                  100      0 i
                    RT:65001:11000 ET:8
 *>i[2]:[0]:[48]:[bc:24:11:35:d5:b9]:[32]:[10.0.5.2]
                    10.0.1.132(pve-home-2)
                                                  100      0 i
                    RT:65001:10000 RT:65001:11000 ET:8 Rmac:de:d0:33:be:aa:2a
 *>i[2]:[0]:[48]:[bc:24:11:35:d5:b9]:[32]:[10.0.6.2]
                    10.0.1.132(pve-home-2)
                                                  100      0 i
                    RT:65001:10000 RT:65001:11000 ET:8 Rmac:de:d0:33:be:aa:2a
 *>i[2]:[0]:[48]:[bc:24:11:35:d5:b9]:[128]:[fe80::be24:11ff:fe35:d5b9]
                    10.0.1.132(pve-home-2)
                                                  100      0 i
                    RT:65001:11000 ET:8
 *>i[3]:[0]:[32]:[10.0.1.132]
                    10.0.1.132(pve-home-2)
                                                  100      0 i
                    RT:65001:11000 ET:8
Route Distinguisher: 10.0.1.132:3
 *>i[3]:[0]:[32]:[10.0.1.132]
                    10.0.1.132(pve-home-2)
                                                  100      0 i
                    RT:65001:12000 ET:8

Displayed 11 out of 11 total prefixes

Node 2:

Code:
root@pve-home-2:~# vtysh -c 'show bgp l2vpn evpn'
BGP table version is 1, local router ID is 10.0.1.132
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 10.0.1.131:2
 *>i[3]:[0]:[32]:[10.0.1.131]
                    10.0.1.131(pve)               100      0 i
                    RT:65001:11000 ET:8
Route Distinguisher: 10.0.1.131:3
 *>i[2]:[0]:[48]:[bc:24:11:79:b6:a5]
                    10.0.1.131(pve)               100      0 i
                    RT:65001:12000 ET:8
 *>i[2]:[0]:[48]:[bc:24:11:79:b6:a5]:[32]:[10.0.6.2]
                    10.0.1.131(pve)               100      0 i
                    RT:65001:10000 RT:65001:12000 ET:8 Rmac:7e:32:f0:96:9d:ce
 *>i[2]:[0]:[48]:[bc:24:11:79:b6:a5]:[128]:[fe80::be24:11ff:fe79:b6a5]
                    10.0.1.131(pve)               100      0 i
                    RT:65001:12000 ET:8
 *>i[3]:[0]:[32]:[10.0.1.131]
                    10.0.1.131(pve)               100      0 i
                    RT:65001:12000 ET:8
Route Distinguisher: 10.0.1.132:2
 *> [2]:[0]:[48]:[bc:24:11:35:d5:b9]
                    10.0.1.132(pve-home-2)
                                                       32768 i
                    ET:8 RT:65001:11000
 *> [2]:[0]:[48]:[bc:24:11:35:d5:b9]:[32]:[10.0.5.2]
                    10.0.1.132(pve-home-2)
                                                       32768 i
                    ET:8 RT:65001:11000 RT:65001:10000 Rmac:de:d0:33:be:aa:2a
 *> [2]:[0]:[48]:[bc:24:11:35:d5:b9]:[32]:[10.0.6.2]
                    10.0.1.132(pve-home-2)
                                                       32768 i
                    ET:8 RT:65001:11000 RT:65001:10000 Rmac:de:d0:33:be:aa:2a
 *> [2]:[0]:[48]:[bc:24:11:35:d5:b9]:[128]:[fe80::be24:11ff:fe35:d5b9]
                    10.0.1.132(pve-home-2)
                                                       32768 i
                    ET:8 RT:65001:11000
 *> [3]:[0]:[32]:[10.0.1.132]
                    10.0.1.132(pve-home-2)
                                                       32768 i
                    ET:8 RT:65001:11000
Route Distinguisher: 10.0.1.132:3
 *> [3]:[0]:[32]:[10.0.1.132]
                    10.0.1.132(pve-home-2)
                                                       32768 i
                    ET:8 RT:65001:12000

Displayed 11 out of 11 total prefixes
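
The two tables above are easier to compare when reduced to one line per MAC/IP route. The following is only a sketch: `evpn_macip` is a made-up helper name, not an FRR command, and it assumes the text layout shown above (a type-2 route line followed by a line that starts with the next hop).

```shell
# Hypothetical helper (not part of FRR): read `show bgp l2vpn evpn` text on
# stdin and print one "IP MAC NEXTHOP" line per type-2 route that carries an IP.
evpn_macip() {
    awk '
        # Route lines start with status codes (such as "*>" or "*>i") and "[2]:[".
        /^ *[*>sdhi]* *\[2\]:\[/ {
            line = substr($0, index($0, "["))   # drop the status codes
            n = split(line, f, /\]:\[/)         # [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
            want = (n == 6)                     # MAC-only routes have just 4 fields
            if (want) { mac = f[4]; ip = f[6]; sub(/\]$/, "", ip) }
            next
        }
        # The line after a route carries the next hop, e.g. "10.0.1.131(pve)".
        want { nh = $1; sub(/\(.*/, "", nh); print ip, mac, nh; want = 0 }
    '
}

# Usage, on each node:
#   vtysh -c "show bgp l2vpn evpn" | evpn_macip | sort
```

On the output pasted above, 10.0.6.2 shows up twice, once behind each node and with a different MAC each time, so that entry may be worth double-checking.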
 
I tried with "disabled arp suppression" checked, but the problem persists. Restarting frr lets a few packets go through.
 
Hmm, that's really strange.
I do see the routes for the target VM (MAC and IP) in `sh bgp l2vpn evpn`, so it should work.
(And I really don't know why it works right after an frr restart.)

What is your kernel version?

I'll try to reproduce it on my side.

(I'm running EVPN in production with the latest package versions without problems.)
 
Thanks for looking into this, spirit. I appreciate it.

I will have access to the Proxmox nodes tonight and can check the kernel versions. I installed Proxmox fairly recently (weeks ago), so I suspect I'm running pretty new stuff.

The one thing I haven't tried yet is re-creating the zones/vnets/subnets with the updated frr. When I created all those resources I was running frr 8.4.X. I'm not sure whether the version change could affect the bootstrapping of the configuration in a way that wouldn't be fixed automatically by applying the SDN changes.

In the environment I'm testing, all nodes are connected to home consumer routers under the same LAN and they can ping each other just fine.
 
Are you sure you don't have any firewall or port filtering between the nodes? You also need the VXLAN port (udp/4789) open.

The config generation shouldn't be different. Rebooting the nodes with the latest packages, just to be sure, should be enough.
 
