[SOLVED] Simple EVPN does not allow ping between two VMs

RLL (New Member, joined Feb 4, 2025, Spain — www.milethos.com)
I have a cluster of 3 nodes, but only 2 of them participate in a simple EVPN zone. I want a VM on one node to be able to ping a VM on the other Proxmox node. I am not using SNAT or anything else on top.

I am on Proxmox 8.4 with the latest version of the FRR package.

These error messages appear continuously in journalctl:


Code:
Feb 11 11:45:12 proxmox2 bgpd[2055883]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Feb 11 11:45:22 proxmox2 bgpd[2055883]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Feb 11 11:45:22 proxmox2 bgpd[2055883]: [QYZDQ-4PHG5][EC 100663316] Attempting to process an I/O event but for fd: 27(8) no thread to handle this!
Feb 11 11:45:32 proxmox2 bgpd[2055883]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected


Code:
Feb 11 11:54:02 proxmox1 bgpd[2090459]: [H4B4J-DCW2R][EC 33554455] 192.168.8.14 [Error] bgp_read_packet error: Connection reset by peer
Feb 11 11:54:12 proxmox1 bgpd[2090459]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Feb 11 11:54:12 proxmox1 bgpd[2090459]: [H4B4J-DCW2R][EC 33554455] 192.168.8.14 [Error] bgp_read_packet error: Connection reset by peer
Feb 11 11:54:22 proxmox1 bgpd[2090459]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Feb 11 11:54:22 proxmox1 bgpd[2090459]: [H4B4J-DCW2R][EC 33554455] 192.168.8.14 [Error] bgp_read_packet error: Connection reset by peer
Feb 11 11:54:32 proxmox1 bgpd[2090459]: [TXY0T-CYY6F][EC 100663299] Can't get remote address and port: Transport endpoint is not connected
 
192.168.8.14 [Error] bgp_read_packet error: Connection reset by peer

That usually indicates a mismatch in the capabilities (or similar session parameters) advertised in the BGP OPEN message.


Could you post the output of:
Code:
vtysh -c 'show bgp neighbor 192.168.8.14'

On proxmox1.
 
Too fast with sending the post - sorry.

Additionally, could you post your SDN config:

Code:
cat /etc/pve/sdn/controllers.cfg
cat /etc/pve/sdn/zones.cfg

Plus the generated FRR config on proxmox1 / proxmox2:

Code:
cat /etc/frr/frr.conf
 
Code:
root@proxmox1:/etc/frr# vtysh -c 'show bgp neighbor 192.168.8.14'
BGP neighbor is 192.168.8.14, remote AS 65000, local AS 65000, internal link
  Local Role: undefined
  Remote Role: undefined
 Member of peer-group VTEP for session parameters
  BGP version 4, remote router ID 0.0.0.0, local router ID 192.168.8.9
  BGP state = Active
  Last read 00:05:08, Last write 00:00:04
  Hold time is 9 seconds, keepalive interval is 3 seconds
  Configured hold time is 9 seconds, keepalive interval is 3 seconds
  Configured tcp-mss is 0, synced tcp-mss is 0
  Configured conditional advertisements interval is 60 seconds
  Graceful restart information:
    Local GR Mode: Helper*
    Remote GR Mode: NotApplicable
    R bit: False
    N bit: False
    Timers:
      Configured Restart Time(sec): 120
      Received Restart Time(sec): 0
      Configured LLGR Stale Path Time(sec): 0
  Message statistics:
    Inq depth is 0
    Outq depth is 0
                         Sent       Rcvd
    Opens:                 31          0
    Notifications:          0          0
    Updates:                0          0
    Keepalives:             0          0
    Route Refresh:          0          0
    Capability:             0          0
    Total:                 31          0
  Minimum time between advertisement runs is 0 seconds

 For address family: L2VPN EVPN
  VTEP peer-group member
  Not part of any update group
  NEXT_HOP is propagated unchanged to this neighbor
  Community attribute sent to this neighbor(all)
  advertise-all-vni
  Inbound path policy configured
  Outbound path policy configured
  Route map for incoming advertisements is *MAP_VTEP_IN
  Route map for outgoing advertisements is *MAP_VTEP_OUT
  0 accepted prefixes

  Connections established 0; dropped 0
  Last reset 00:05:08,  Waiting for peer OPEN (n/a)
  Internal BGP neighbor may be up to 255 hops away.
Local host: 192.168.8.10, Local port: 58960
Foreign host: 192.168.8.14, Foreign port: 179
Nexthop: 192.168.8.10
Nexthop global: fe80::52eb:f6ff:fe52:1602
Nexthop local: fe80::52eb:f6ff:fe52:1602
BGP connection: shared network
BGP Connect Retry Timer in Seconds: 10
Next connect timer due in 5 seconds
Read thread: off  Write thread: off  FD used: -1

  BFD: Type: single hop
  Detect Multiplier: 3, Min Rx interval: 300, Min Tx interval: 300
  Status: Down, Last update: 0:00:05:04
 

Code:
root@proxmox2:/etc/frr# cat frr.conf
frr version 8.5.2
frr defaults datacenter
hostname proxmox2
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_evpn1
 vni 10000
exit-vrf
!
router bgp 65000
 bgp router-id 192.168.8.14
 no bgp hard-administrative-reset
 no bgp default ipv4-unicast
 coalesce-time 1000
 no bgp graceful-restart notification
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65000
 neighbor VTEP bfd
 neighbor 192.168.8.9 peer-group VTEP
 !
 address-family l2vpn evpn
  neighbor VTEP activate
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  advertise-all-vni
 exit-address-family
exit
!
router bgp 65000 vrf vrf_evpn1
 bgp router-id 192.168.8.14
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
exit
!
route-map MAP_VTEP_IN permit 1
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
line vty
!

ip nht resolve-via-default

Code:
root@proxmox1:/etc/frr# cat /etc/pve/sdn/controllers.cfg
evpn: ctrl1
    asn 65000
    peers 192.168.8.9,192.168.8.14

root@proxmox1:/etc/frr# cat /etc/pve/sdn/zones.cfg
evpn: evpn1
    controller ctrl1
    vrf-vxlan 10000
    ipam pve
    mac BC:24:11:36:54:AB
    mtu 1450
    nodes proxmox1,proxmox2

root@proxmox1:/etc/frr# cat /etc/frr/frr.conf
frr version 8.5.2
frr defaults datacenter
hostname proxmox1
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_evpn1
 vni 10000
exit-vrf
!
router bgp 65000
 bgp router-id 192.168.8.9
 no bgp hard-administrative-reset
 no bgp default ipv4-unicast
 coalesce-time 1000
 no bgp graceful-restart notification
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65000
 neighbor VTEP bfd
 neighbor 192.168.8.14 peer-group VTEP
 !
 address-family l2vpn evpn
  neighbor VTEP activate
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  advertise-all-vni
 exit-address-family
exit
!
router bgp 65000 vrf vrf_evpn1
 bgp router-id 192.168.8.9
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
exit
!
route-map MAP_VTEP_IN permit 1
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
line vty
!

ip nht resolve-via-default
 
Did you check your IP configuration? You have 192.168.8.9 in your peer list, but it seems like 192.168.8.10 is actually configured on proxmox1, judging from the neighbor output on proxmox1:

Code:
Local host: 192.168.8.10, Local port: 58960

What does the network config on proxmox1 look like?

Code:
ip a
 

On each Proxmox node I have a physical interface dedicated to each function, four in total:

- First interface: Proxmox web management.
- Second interface: virtual machines.
- Third interface: EVPN.
- Fourth interface: Corosync.

On the proxmox1 node, the vmbr0 (eno1) interface with IP 192.168.8.10 is for web management. The enp1s0f1 interface with IP 192.168.8.9 is the EVPN peer address.

On the proxmox2 node I have a physical interface for EVPN with IP 192.168.8.14; its management interface has IP 192.168.8.12.

Code:
root@proxmox1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:49:37:05:d5:b2 brd ff:ff:ff:ff:ff:ff
    inet 10.20.20.1/24 scope global enp1s0f0
       valid_lft forever preferred_lft forever
    inet6 fe80::3e49:37ff:fe05:d5b2/64 scope link
       valid_lft forever preferred_lft forever
3: enp1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:49:37:05:d5:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.8.9/24 scope global enp1s0f1
       valid_lft forever preferred_lft forever
    inet6 fe80::3e49:37ff:fe05:d5b3/64 scope link
       valid_lft forever preferred_lft forever
4: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 50:eb:f6:52:16:02 brd ff:ff:ff:ff:ff:ff
    altname enp5s0
5: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr1 state UP group default qlen 1000
    link/ether 50:eb:f6:52:16:03 brd ff:ff:ff:ff:ff:ff
    altname enp6s0
6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 50:eb:f6:52:16:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.8.10/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::52eb:f6ff:fe52:1602/64 scope link
       valid_lft forever preferred_lft forever
7: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 50:eb:f6:52:16:03 brd ff:ff:ff:ff:ff:ff
    inet 192.168.8.11/24 scope global vmbr1
       valid_lft forever preferred_lft forever
    inet6 fe80::52eb:f6ff:fe52:1603/64 scope link
       valid_lft forever preferred_lft forever
39: vxlan_vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master vnet1 state UNKNOWN group default qlen 1000
    link/ether f6:f8:fa:01:d5:f0 brd ff:ff:ff:ff:ff:ff
40: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master vrf_evpn1 state UP group default qlen 1000
    link/ether bc:24:11:36:54:ab brd ff:ff:ff:ff:ff:ff
    inet6 fe80::be24:11ff:fe36:54ab/64 scope link
       valid_lft forever preferred_lft forever
41: vrf_evpn1: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP group default qlen 1000
    link/ether 72:46:03:97:6e:7b brd ff:ff:ff:ff:ff:ff
42: vrfvx_evpn1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master vrfbr_evpn1 state UNKNOWN group default qlen 1000
    link/ether 72:ff:68:90:55:31 brd ff:ff:ff:ff:ff:ff
43: vrfbr_evpn1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master vrf_evpn1 state UP group default qlen 1000
    link/ether 72:ff:68:90:55:31 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::70ff:68ff:fe90:5531/64 scope link
       valid_lft forever preferred_lft forever
 
The issue is very likely that the same subnet is configured on three different network interfaces, which leads to wrong source-IP selection (and potentially even selection of the wrong interface, since three routes are created for the 192.168.8.0/24 subnet). You should never configure one subnet on more than one network interface. Give each interface its own separate subnet and the EVPN setup should work.
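To see which source address the kernel actually picks for a given peer, a connected UDP socket can be used: it sends no packets, it only runs the kernel's route lookup and source-address selection. This is a generic sketch, not part of the thread; the peer address 192.168.8.14 is taken from the posts above.

```python
import socket

def source_ip_for(peer: str, port: int = 179) -> str:
    """Return the local IP the kernel would use to reach peer:port.

    Connecting a UDP socket transmits nothing; it only triggers the
    kernel's route lookup and source-address selection.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer, port))
        return s.getsockname()[0]

# Loopback always maps to itself, so this is a safe demonstration:
print(source_ip_for("127.0.0.1"))  # → 127.0.0.1

# On proxmox1 you would call source_ip_for("192.168.8.14"); with three
# overlapping /24s it may return 192.168.8.10 (vmbr0) instead of the
# intended 192.168.8.9 (enp1s0f1), matching the "Local host" line in
# the vtysh output above.
```

If the returned address is not the one configured as the BGP peer, the session will be set up from the wrong interface, which matches the symptoms in this thread.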
 
You're right. I have now configured the three interfaces as an active-backup bond with a vmbr0 bridge on top of it. With the EVPN-type SDN defined, I can now ping between the two VMs located on the two Proxmox nodes.

Thank you so much.
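For reference, the active-backup bond described in the last post could look roughly like this in /etc/network/interfaces. The interface names and the 192.168.8.10/24 address come from the thread; the gateway and the exact set of bonded ports are assumptions for illustration:

```
# /etc/network/interfaces fragment (sketch; gateway is an assumption)
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2 enp1s0f1
    bond-mode active-backup
    bond-primary eno1

auto vmbr0
iface vmbr0 inet static
    address 192.168.8.10/24
    gateway 192.168.8.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

With a single address on a single bridge, only one route exists for 192.168.8.0/24, so source-address selection is unambiguous.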