Implementations of SDN Networking

I'm currently on holiday. Maybe try enabling the "exit node local routing" option (it's available in the GUI). It allows the EVPN network to be reached from the main VRF IPs on the exit node itself. (I'll do more tests when I'm back.)
 
Hi,
I'm back from holiday.

I have found a bug when 2 exit-nodes are used: the default route 0.0.0.0 is announced between the exit-nodes, so an exit-node never forwards packets outside, and they loop between the nodes instead. I'll try to send a patch this week; if you have time to test it, that would be great.
 
Hi,

Code:
wget https://mutulin1.odiso.net/libpve-network-perl_0.7.0_all.deb
dpkg -i libpve-network-perl_0.7.0_all.deb

Simply define multiple exit-nodes and, if needed, also define a primary-exit-node.
 
I fully updated my cluster and installed your libpve-network-perl over the existing one.

vtysh -c "sh bgp l2vpn evpn"
shows only the node I'm running the command from. It seems my 3-node cluster can't see the other nodes over EVPN.

From a client on that network I can ping itself and the gateway (exit node), but not the other test client on the same network.
After migrating the clients to the same host, ping works.

Another thing I noticed is that I can't advertise via BGP to one of my routers any more. The connection is up but nothing is passing through.

I have not had time to dig further, but it seems like EVPN is not propagating and the subnet is not advertised :(
 
From the external router I'm getting this:
[Error] bgp_read_packet error: Connection reset by peer
 
Did you regenerate the frr config (Apply button on the SDN panel) after installing the libpve-network package?

What you describe sounds like the BGP sessions are not even established.

Just to be sure, what does this show?

vtysh -c "sh bgp summary"

My patch is really small: it only filters [5]:[0]:[0]:[0.0.0.0] routes on exit-nodes coming from other exit-nodes.

So, with vtysh -c "sh bgp l2vpn evpn", you should at minimum see the other nodes, like:
Code:
Route Distinguisher: x.x.x.x
*>i[3]:[0]:[32]:[x.x.x.x]


Have you tried to stop/start the frr service?
 
I've rebooted the machines and reinstalled the frr service.

I noticed now that if the BGP controllers are totally removed and I then hit "apply" on the SDN panel, then add the controllers again but with a new ASN for example, or change a peer (whatever, just make changes to /etc/pve/sdn/controllers.cfg from the GUI) and hit "apply" again, things are restarting according to the GUI... but!

Surprise: the old config is still running with the old ASN. The same goes if all controllers are removed totally!
Things are not restarted with the config that's set from the pve dir...
 
I did an apt reinstall of libpve-network-perl and ifupdown2,
and an apt purge and install of frr and frr-pythontools.

No luck, the frr config is not merged or updated.
 
This is so squeezy. I did a setup for the EVPN controller.
Things started to "work" as badly as before :)
I also installed your fix again.


L2VPN EVPN Summary:
BGP router identifier x.x.x.x, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 7, using 1288 bytes of memory
Peers 2, using 1446 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor     V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down  State/PfxRcd  PfxSnt  Desc
x.x.x.x      4  65000        0        0       0    0     0    never        Active       0  N/A
x.x.x.x      4  65000        0        0       0    0     0    never        Active       0  N/A

Total number of neighbors 2

So I guess things are not happening over BGP as they should.
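
For anyone following along, here is a rough sketch of how dead sessions could be picked out of that summary without reading the table by eye. It assumes the column layout shown above, where State/PfxRcd is the 10th field and holds a prefix count only once the session is Established. The function name down_peers is made up for this example.

```shell
# List neighbors whose session is not Established, reading
# `vtysh -c "sh bgp summary"` output on stdin.
# Assumption: State/PfxRcd is the 10th column and is purely numeric
# (a prefix count) only for Established sessions.
down_peers() {
    awk '
        $1 == "Neighbor" { in_table = 1; next }   # header row opens a table
        NF == 0          { in_table = 0; next }   # blank line closes it
        in_table && $10 !~ /^[0-9]+$/ { print $1, $10 }
    '
}

# Usage sketch (as root on a node):
#   vtysh -c "sh bgp summary" | down_peers
```

With the summary posted above, both neighbors would be listed with the state Active, which matches the "never" in the Up/Down column.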
 
Hmm, /etc/frr/frr.conf is only generated if a controller and a zone exist (regarding "same goes if all controllers are removed totally!"), but it is not removed on apply. (Maybe I should improve this.)

But if you make a change (ASN change, peer change), it should be updated.

(Note that you can also use the apply SDN button without making any change; it'll simply regenerate the current config.)
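
One way to check whether an apply really took effect is to compare the ASN in the generated file with the ASN the daemons are running. The following is only a sketch: bgp_asns is a hypothetical helper name, and it assumes the usual "router bgp <ASN>" stanza lines in the FRR configuration.

```shell
# Print the distinct ASNs declared by "router bgp <ASN> ..." lines on stdin.
# Assumption: each BGP instance starts with a "router bgp <ASN>" line.
bgp_asns() {
    awk '$1 == "router" && $2 == "bgp" { print $3 }' | sort -u
}

# Usage sketch (as root on a node):
#   bgp_asns < /etc/frr/frr.conf                # what the SDN apply generated
#   vtysh -c "show running-config" | bgp_asns   # what FRR is actually running
```

If the two commands print different ASNs, the generated file has not been loaded by the running daemons.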
 
It seems that the BGP session is not up.
(It doesn't seem to be related to my patch; you also had the problem before installing my patch, right?)

Is the frr version the same on all peers?

Do you have some info in /var/log/frr/frr.log?


You can also enable debug logs with:

# vtysh

then

Code:
conf t
log syslog debug
debug bgp keepalives
debug bgp neighbor-events
debug bgp update-groups
debug bgp updates in
debug bgp updates out
debug bgp zebra
exit
exit
 
Attachments: frr.txt (7.4 KB)
Same version on every host.
It worked the last time we were testing things. After that I just put everything on hold and removed the config on the Proxmox side.
What's new this time is that I've patched the servers to the latest versions and restarted.
 
Yes, I did try to change the ASN and now it changes accordingly.
 
I've tried a new "external" BGP router, same issue.
Can't establish a connection.


This is the error I get on the first "external" BGP router:
2022-04-22T10:08:36 Error bgpd [EC 33554454] 10.0.10.200 [Error] bgp_read_packet error: Connection reset by peer
2022-04-22T10:08:36 Error bgpd [EC 33554454] 10.0.10.199 [Error] bgp_read_packet error: Connection reset by peer


And from one of the Proxmox hosts:
Apr 22 11:01:49 parker bgpd[1632074]: [T91AW-FGMHW] bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0
Apr 22 11:01:49 parker bgpd[1632074]: [ZQHFG-DQGX1] 10.0.10.4 went from Idle to Active
Apr 22 11:01:49 parker bgpd[1632074]: [ZWCSR-M7FG9] 10.0.10.4 [FSM] TCP_connection_open (Active->OpenSent), fd 26
Apr 22 11:01:49 parker bgpd[1632074]: [VX6SM-8YE5W][EC 33554460] 10.0.10.4: nexthop_set failed, resetting connection - intf 0x0
Apr 22 11:01:49 parker bgpd[1632074]: [NQGZV-Y3W62][EC 100663299] bgp_connect_success: bgp_getsockname(): failed for peer 10.0.10.4, fd 26
Apr 22 11:01:49 parker bgpd[1632074]: [V1CHF-JSGRR] %NOTIFICATION: sent to neighbor 10.0.10.4 5/0 (Neighbor Events Error/Unspecific) 0 bytes
Apr 22 11:01:49 parker bgpd[1632074]: [ZWCSR-M7FG9] 10.0.10.4 [FSM] BGP_Stop (Active->Idle), fd 26
Apr 22 11:01:49 parker bgpd[1632074]: [T91AW-FGMHW] bgp_fsm_change_status : vrf default(0), Status: Deleted established_peers 0
Apr 22 11:01:49 parker bgpd[1632074]: [ZQHFG-DQGX1] 10.0.10.4 went from Active to Deleted
Apr 22 11:01:52 parker bgpd[1632074]: [JSYCM-MV07M] 0:Recv MACIP Del f 0x0 MAC e2:b0:a9:67:a6:9c IP VNI 10002 seq 0 state 1 ESI 00:00:00:00:00:00:00:00:00:00
Apr 22 11:01:58 parker bgpd[1632074]: [JFMSW-YMBC7] 10.0.10.4 [FSM] Timer (connect timer expire)
Apr 22 11:01:58 parker bgpd[1632074]: [ZWCSR-M7FG9] 10.0.10.4 [FSM] ConnectRetry_timer_expired (Active->Connect), fd -1
Apr 22 11:01:58 parker bgpd[1632074]: [T72VK-55DVG] 10.0.10.4 [FSM] Waiting for NHT
Apr 22 11:01:58 parker bgpd[1632074]: [T91AW-FGMHW] bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0
Apr 22 11:01:58 parker bgpd[1632074]: [ZQHFG-DQGX1] 10.0.10.4 went from Active to Connect
Apr 22 11:01:58 parker bgpd[1632074]: [JFMSW-YMBC7] 10.0.10.197 [FSM] Timer (connect timer expire)
Apr 22 11:01:58 parker bgpd[1632074]: [ZWCSR-M7FG9] 10.0.10.197 [FSM] ConnectRetry_timer_expired (Active->Connect), fd -1
Apr 22 11:01:58 parker bgpd[1632074]: [T72VK-55DVG] 10.0.10.197 [FSM] Waiting for NHT
Apr 22 11:01:58 parker bgpd[1632074]: [T91AW-FGMHW] bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0
Apr 22 11:01:58 parker bgpd[1632074]: [ZQHFG-DQGX1] 10.0.10.197 went from Active to Connect
Apr 22 11:01:58 parker bgpd[1632074]: [JFMSW-YMBC7] 10.0.10.200 [FSM] Timer (connect timer expire)
Apr 22 11:01:58 parker bgpd[1632074]: [ZWCSR-M7FG9] 10.0.10.200 [FSM] ConnectRetry_timer_expired (Active->Connect), fd -1
Apr 22 11:01:58 parker bgpd[1632074]: [T72VK-55DVG] 10.0.10.200 [FSM] Waiting for NHT
Apr 22 11:01:58 parker bgpd[1632074]: [T91AW-FGMHW] bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0
Apr 22 11:01:58 parker bgpd[1632074]: [ZQHFG-DQGX1] 10.0.10.200 went from Active to Connect
Apr 22 11:01:58 parker bgpd[1632074]: [ZWCSR-M7FG9] 10.0.10.4 [FSM] TCP_connection_open_failed (Connect->Active), fd -1
Apr 22 11:01:58 parker bgpd[1632074]: [T91AW-FGMHW] bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0






I can't find anything regarding BGP between the Proxmox hosts:
L2VPN EVPN Summary:
BGP router identifier 10.0.10.197, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 7, using 1288 bytes of memory
Peers 2, using 1446 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor     V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down  State/PfxRcd  PfxSnt  Desc
10.0.10.199  4  65000        0        0       0    0     0    never        Active       0  N/A
10.0.10.200  4  65000        0        0       0    0     0    never        Active       0  N/A




Hello, this is FRRouting (version 8.0.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

parker# show ip route
parker# show ip nht
10.0.10.4
unresolved
Client list: bgp(fd 16)
10.0.10.197
unresolved
Client list: bgp(fd 16)
10.0.10.200
unresolved
Client list: bgp(fd 16)
parker# show bgp nexthop
Current BGP nexthop cache:
10.0.10.4 invalid, #paths 0, peer 10.0.10.4
Last update: Fri Apr 22 10:13:18 2022

10.0.10.197 invalid, #paths 0, peer 10.0.10.197
Last update: Fri Apr 22 10:13:18 2022

10.0.10.200 invalid, #paths 0, peer 10.0.10.200
Last update: Fri Apr 22 10:13:18 2022
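
The "show ip nht" output above lists every peer as unresolved. As a sketch, the same check could be scripted so it can be repeated after each change. The helper name unresolved_nexthops is made up, and the parsing assumes the layout shown above (an address line followed by an "unresolved" line); other FRR versions may print this differently.

```shell
# Print every tracked nexthop that is reported as "unresolved",
# reading `vtysh -c "show ip nht"` output on stdin.
# Assumption: each entry starts with a line beginning with an IPv4
# address, and its status follows on a later line.
unresolved_nexthops() {
    awk '
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ { addr = $1; next }  # remember entry
        $1 == "unresolved" && addr != "" { print addr; addr = "" }  # report it once
    '
}

# Usage sketch (as root on a node):
#   vtysh -c "show ip nht" | unresolved_nexthops
```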
 
It looks like it isn't even able to connect to the remote peers.

Is there no Proxmox firewall blocking the BGP port?

Can you try "telnet ip_of_peer 179"?
telnet 10.0.10.200 179
Trying 10.0.10.200...
Connected to 10.0.10.200.
Escape character is '^]'.
ÿÿÿÿÿÿÿÿConnection closed by foreign host.



If I telnet to the external BGP routers, it's a clean escape and close, without all the "ÿ".

Perhaps there is something up with that, generating the "bgp_read_packet error: Connection reset by peer".
 
OK, telnet confirms that the connection can be established.

Looking at your logs, I found 2 suspicious lines:


Code:
Apr 22 09:06:55 parker zebra[1597466]: [WVJCK-PPMGD][EC 4043309093] netlink-cmd (NS 0) error: Device or resource busy, type=RTM_GETROUTE(26), seq=5, pid=2594392672


Apr 22 11:01:49 parker bgpd[1632074]: [VX6SM-8YE5W][EC 33554460] 10.0.10.4: nexthop_set failed, resetting connection - intf 0x0

It seems that frr is breaking on a specific local network device (it could be physical, but also a tap|veth VM|CT interface).
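
Lines like the two above can be pulled out of a long log by looking for the "[EC <number>]" tag. This is just a sketch under a few assumptions: the log uses the syslog layout seen in this thread (daemon name in the 5th field), and frr_error_counts is a hypothetical helper name.

```shell
# Count tagged errors per daemon and error code, most frequent first.
# Reads an FRR log on stdin and prints "<count> <daemon> <code>".
# Assumption: syslog layout "Mon DD HH:MM:SS host daemon[pid]: ...".
frr_error_counts() {
    awk '
        match($0, /\[EC [0-9]+\]/) {
            code = substr($0, RSTART + 4, RLENGTH - 5)  # digits inside "[EC ...]"
            daemon = $5
            sub(/\[.*/, "", daemon)                      # "bgpd[1632074]:" -> "bgpd"
            count[daemon " " code]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Usage sketch:
#   frr_error_counts < /var/log/frr/frr.log
```

Grouping by daemon and code makes it easier to see whether one error (for example the nexthop_set failure from bgpd) repeats on every connection attempt.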

Maybe this frr bug is related:
https://github.com/FRRouting/frr/issues/10404

Can you send the result of vtysh -c "sh ip route"?

On your Proxmox host, do you use other VMs|CTs with a NIC on a VLAN-aware bridge (in parallel to the EVPN VMs)?

I'll try to see if I can backport the frr patch.
 
I have built a frr 8.2.2 version + 3 patches from
https://github.com/FRRouting/frr/pull/10482

Can you try it?

Code:
wget https://mutulin1.odiso.net/frr_8.2.2-1+pve1_amd64.deb
dpkg -i frr_8.2.2-1+pve1_amd64.deb
systemctl restart frr
 
