Proxmox 7.0 SDN beta test

spirit

Famous Member
Apr 2, 2010
5,153
484
103
www.odiso.com
Hello,

I am back at it again with another issue.

Yesterday my PVE host was unable to reach an OpenID server inside one of its SDNs. I realized that the host could not SSH to hosts inside an SDN network, although they could SSH to it. (It seems the PVE packets never reached the vrf_evpnzone interface.) Before posting in the forums, I saw that there were package updates available for my host, so I applied them.

I applied the updates, and now my networks are unable to communicate outside of the host again. Specifically, the VMs in the 10.2.0.0/24 network can reach out and the packets leave the PVE host, but when the PVE host receives the reply packets on its interface, it does not forward them to the SDN network, which leaves the VMs with no outside connectivity. Please let me know what else is needed to help troubleshoot this.

dpkg -l|grep frr:
Code:
ii  frr                                  7.5.1-1.1                      amd64        FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
ii  frr-pythontools                      7.5.1-1.1                      all          FRRouting suite - Python tools
Hi,
I have detected a bug in EVPN in frr 7.5.1, where the vnet is losing ARP entries. I have sent a patch to the pve-devel mailing list, but it's not yet applied.
https://lists.proxmox.com/pipermail/pve-devel/2021-August/049688.html
Can you try these packages:

https://mutulin1.odiso.net/frr_7.5.1-2+pve_amd64.deb
https://mutulin1.odiso.net/frr-pythontools_7.5.1-2+pve_all.deb

There is also another bug, not yet fixed upstream:
https://lists.proxmox.com/pipermail/pve-devel/2021-July/049481.html

On the exit node, there is currently a special route added in /etc/network/interfaces.d/sdn:
"post-up ip route add vrf <yourvrf> unreachable default metric 4278198272"

It should be removed on the exit node, as it blocks forwarding between the EVPN VRF and the real network.
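Until a fixed package lands, the route has to be removed by hand. A minimal sketch, assuming the generated file is /etc/network/interfaces.d/sdn and the route carries exactly the metric quoted above; `strip_unreachable_default` is a hypothetical helper name. The SDN apply step regenerates this file, so the edit has to be repeated after each reapply.

```shell
#!/bin/sh
# Hypothetical helper: delete the "post-up ip route add vrf ... unreachable default"
# line from a generated SDN interfaces file (GNU sed -i edits in place).
strip_unreachable_default() {
    file="${1:-/etc/network/interfaces.d/sdn}"
    sed -i '/unreachable default metric 4278198272/d' "$file"
}
```

To drop the live route without reloading the network, the matching runtime command is `ip route del vrf vrf_<zone> unreachable default metric 4278198272`.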
 

frybin

New Member
Jul 18, 2021
18
2
3
22
Hi @spirit,

After applying the packages, I have internet connectivity again, as I did before updating the system, so thank you for that assistance. However, the original problem I was having is still there.
Here is a more concrete example of the problem below:
My PVE node, which is also my exit node, is 10.10.0.3, and I have a VM inside an EVPN subnet at 10.2.0.12. From the VM I am able to ping and SSH into 10.10.0.3, but when I try to SSH into the VM from the PVE node, the traffic doesn't even seem to reach the vrf_evpnzone interface of the node, at least according to tcpdump. Yet I am able to ping the VM from the PVE node just fine. SSHing into the VM from my laptop also works, so that's not the issue. The main reason this is a problem is that I am unable to have Proxmox use any of the services hosted inside the SDN network, such as SSO and DNS.
 

spirit

Can you try "sysctl -w net.ipv4.tcp_l3mdev_accept=1" on the exit node?

I know it allows VMs to connect to the exit node; I'm not sure about the reverse direction.
 

frybin

Running that command did not allow the exit node to connect to the vm.
 

frybin

Are you able to ping your VM IP from the exit node?

What is the result of "ip route" on your exit node?
I am able to successfully ping the VM IP from the exit node.

Results of ip route

Code:
root@pve3:~# ip route
default via 10.10.0.254 dev vmbr0 proto kernel onlink
10.2.0.0/24 nhid 9 dev myvnet1 proto bgp metric 20
10.3.0.0/24 nhid 4 dev myvnet2 proto bgp metric 20
10.10.0.0/24 dev vmbr0 proto kernel scope link src 10.10.0.3
10.100.0.0/24 dev eno1 proto kernel scope link src 10.100.0.3 linkdown
10.100.100.0/24 dev eno2 proto kernel scope link src 10.100.100.3 linkdown
 

spirit

I'll try to reproduce it on my side; maybe there are some sysctls to tune. I'll get back to you tomorrow.
 

spirit

OK, I found how to do it.

"ip vrf exec vrf_<zonename> ssh root@<vmip>"

This launches the ssh command in the VRF context.

This works for VMs located on the same node, because it uses the anycast vnet address as the source.

If you want to reach a VM located on another node from a node, you need to add an IP on the node in

/etc/network/interfaces:
Code:
auto vrfbr_<yourzonename>
iface vrfbr_<yourzonename>
        address <ip>
where ip can be any IP different from your subnets;
for simplicity, you can reuse your main host IP (no conflict, as it's in a different VRF).


This can be done from any host of the cluster.


If it works fine, I could add an option to add this IP address when the config is generated.


Edit:
I have found another bug in frr 7.5: when restarting, frr loses MAC-IP EVPN routes :( (difficult to reproduce, seems to be a race).
For now, I think it's better to use the frr 7.4 package for Proxmox 6 (the deb is compatible).
I'll see about forcing a rollback on Proxmox 7 until it's fixed.
 

frybin

That SSH command in the VRF context worked, but how would I make that the default? I want to use the DNS and OpenID servers that exist in the SDN subnets for which that node is the exit node. From my understanding, adding to /etc/network/interfaces won't work for the same exit node, right? Also, how would I downgrade frr to 7.4?
 

spirit

So that SSH command in the VRF context worked, but how would I make that the default? I want to use the DNS and OpenID servers that exist in the SDN subnets for which that node is the exit node.
Mmm, OK, that's a little bit more complex ;).
I need to find a way to reach them from the default VRF.

(In the future, I'm looking to add support for exit gateway VMs outside the Proxmox host to avoid this.)


From my understanding, adding to /etc/network/interfaces won't work for the same exit node, right?
No, it's not a problem: ifupdown2 merges the config if an interface is defined twice.

Also, how would I downgrade frr to 7.4?
You can use the deb from Proxmox 6; it works fine on Proxmox 7 too:
http://download.proxmox.com/debian/...cription/binary-amd64/frr_7.4-1+pve_amd64.deb
(I'm currently working with the Proxmox devs on rolling it back upstream.)
 

spirit

OK, I have found a way with a veth pair. I need to test it a little bit more, but here is a spoiler:

On the exit node:
Code:
 # veth pair: xvrf1 stays in the default VRF, xvrf2 is enslaved to the zone VRF
 ip link add xvrf1 type veth peer name xvrf2
 ip link set xvrf1 up
 ip addr add dev xvrf1 10.200.3.1/30   # (could be any IP)
 ip link set xvrf2 master <vrf_zone> up
 ip addr add dev xvrf2 10.200.3.2/30   # (could be any IP)

 # inside the zone VRF: reach the exit node's main IP through the veth
 ip route add vrf <vrf_zone> <yourexitnodemainip>/32 via 10.200.3.1 dev xvrf2
 # in the default VRF: reach the EVPN subnet through the veth
 ip route add <yourevpnsubnet>/24 via 10.200.3.2 dev xvrf1

This works for me, from the exit node to a local VM. (I need to do more tests with VMs on other nodes.)
 

frybin

First of all, your spoiler ended up working for me, which is really exciting.
In addition, when trying to install the older version of frr, I get the following issue:
Code:
The following packages have unmet dependencies:
 frr : Depends: libjson-c3 (>= 0.11) but it is not installable
       Depends: libreadline7 (>= 6.0) but it is not installable
       Depends: libyang0.16 (>= 0.16.74) but it is not installable
It seems these packages don't exist in the Debian bullseye repos, and I am not sure whether I should install them from the buster repo.
 

spirit

OK,
here is a 7.4 build (packaged with a 7.5 version number so it is not overridden by the Debian packages):

https://mutulin1.odiso.net/frr_7.5.1-99+pve~really7.4_amd64.deb
https://mutulin1.odiso.net/frr-pythontools_7.5.1-99+pve~really7.4_all.deb


About the exit-node config, can you try:

1)
Remove the exit-node option from the zone config and reapply.
(Don't forget to remove the unreachable route again: "ip route del vrf vrf_<yourzonename> unreachable default metric 4278198272")

Edit:
After reapplying the config, edit /etc/frr/frr.conf on the exit node,
then add at the end:
Code:
router bgp <bgpasn> vrf vrf_<zone>
 !
 address-family l2vpn evpn
  default-originate ipv4
  default-originate ipv6
 exit-address-family
!

Then restart the frr service.

2)
Add to /etc/network/interfaces:

Code:
auto xvrf1
iface xvrf1
        link-type veth
        address 10.255.255.1/30
        veth-peer-name xvrf2


auto xvrf2
iface xvrf2
        address 10.255.255.2/30
        link-type veth
        veth-peer-name xvrf1
        vrf vrf_<yourzonename>
        post-up ip ro add 10.2.0.0/24 via 10.255.255.2 dev xvrf1
        post-up ip ro add 10.3.0.0/24 via 10.255.255.2 dev xvrf1

Then run "ifreload -a".

You should be able to ping and SSH to any VM (on any node) from the exit node itself.

If it works for you, I can prepare a clean patch with an option to add in zones.cfg.
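The post-up routes in step 2) have to list every EVPN subnet of the zone, so the stanza changes with each setup. A minimal sketch of a hypothetical generator (`gen_xvrf_stanza` is not part of Proxmox) that prints the two stanzas above for a given zone name and subnet list, reusing the 10.255.255.0/30 addressing from the example; review the output before appending it to /etc/network/interfaces.

```shell
#!/bin/sh
# Hypothetical helper: print the xvrf1/xvrf2 veth stanzas for one zone,
# with one post-up route per EVPN subnet passed as extra arguments.
gen_xvrf_stanza() {
    zone="$1"; shift    # zone name without the vrf_ prefix
    cat <<EOF
auto xvrf1
iface xvrf1
        link-type veth
        address 10.255.255.1/30
        veth-peer-name xvrf2

auto xvrf2
iface xvrf2
        address 10.255.255.2/30
        link-type veth
        veth-peer-name xvrf1
        vrf vrf_${zone}
EOF
    # route each EVPN subnet from the default VRF into the veth
    for subnet in "$@"; do
        printf '        post-up ip ro add %s via 10.255.255.2 dev xvrf1\n' "$subnet"
    done
}
```

For the subnets in this thread: `gen_xvrf_stanza <yourzonename> 10.2.0.0/24 10.3.0.0/24`.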
 

spirit

@frybin

I have built a patched pve-network deb for easier testing:

https://mutulin1.odiso.net/libpve-network-perl_0.6.1_all.deb

You just need to install it, then manually edit /etc/pve/sdn/zones.cfg,
and in your EVPN zone config add "exitnodes-local-routing 1" (also keep the "exitnodes" value).

Example:

/etc/pve/sdn/zones.cfg
Code:
evpn: evpnz
        controller evpn
        vrf-vxlan 1000
        exitnodes pve3
        exitnodes-local-routing 1
        ipam pve
        mac 02:02:3A:AE:F4:E4

Then use the Apply button in the GUI.
 

frybin

Hi @spirit,

First of all, thank you for all the help. I was able to install the frr 7.4 builds and the patched pve-network without an issue.
Second, the patched pve-network seems to work just fine.
If I encounter any more issues, I will be sure to post about them.
 

frybin

So, I have another issue now, but I am not 100% sure it is caused by the SDN.
I am trying to set up an OpenShift cluster by following this guide.
The strange behavior is this: when I set up the httpd server and a master or worker requests its .ign file, the HTTP request arrives at the httpd server and the server sends out a reply, but the reply never reaches the client. However, as long as the client is in another subnet, even another SDN subnet, everything seems to work.
Example: 10.2.0.200 (server), 10.2.0.201 (client1), 10.3.0.201 (client2)
If client1 does
Code:
curl http://10.2.0.200:8080/okd4/master.ign
then the server receives the request and generates a response, but the response never reaches the client, and you get:
Code:
curl: (56) Recv failure: Connection reset by peer
If client2 does
Code:
curl http://10.2.0.200:8080/okd4/master.ign
then the server receives the request, generates a response, and the response reaches the client and you get the file contents.
Please let me know what I can do to help with this issue.

Edit: This issue existed before I updated any packages; it was the main reason I updated things to begin with.
 

spirit

Are the 2 clients (on the 2 different subnets) on 2 different hypervisors?
Is the problem with HTTP or HTTPS? (With HTTPS/SSH it could be a fragmentation problem if the MTU is too high.)

Can you try "ping -M do -s <mtusize> <otherserver>", where mtusize = (MTU in the VM - 30)? (The IPv4 and ICMP headers take 28 bytes, so subtracting 30 leaves a small margin.)

Generally, if you have kept MTU 1500 on your hypervisor's physical interfaces, you should configure MTU 1450 inside your VM guests (because of the VXLAN encapsulation headers), and the ping test should work with a size of 1420.
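The numbers above can be checked with shell arithmetic. A minimal sketch, assuming VXLAN over IPv4 with no VLAN tag in the inner frame: the encapsulation adds 50 bytes (outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14), and a ping payload has to leave room for the IPv4 (20) and ICMP (8) headers. Both function names are hypothetical.

```shell
#!/bin/sh
# Guest MTU that fits inside a VXLAN tunnel over a physical link of MTU $1.
# Overhead: outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14 = 50 bytes.
vxlan_guest_mtu() {
    echo $(( $1 - 50 ))
}

# Largest "ping -M do -s N" payload for an interface of MTU $1:
# subtract the IPv4 header (20) and the ICMP header (8).
max_ping_payload() {
    echo $(( $1 - 28 ))
}
```

`vxlan_guest_mtu 1500` gives 1450 and `max_ping_payload 1450` gives 1422, so the `-s 1420` suggested above sits just under the limit.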
 

frybin

It seems the MTU was the issue. For the zone I had the MTU set to 1450, but I had not configured my DHCP server to set the MTU on the VMs, so they were defaulting to 1500. After setting the MTU to 1450 on the VMs, everything worked.
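To spot a guest that is still at 1500, here is a minimal sketch that reads the standard sysfs attribute /sys/class/net/<iface>/mtu and compares it with the 1450 limit discussed above. `check_guest_mtu` is a hypothetical name, and `SYSFS_NET` is a hypothetical override so the function can be tried outside a VM.

```shell
#!/bin/sh
# Hypothetical check: warn when an interface MTU exceeds what the VXLAN
# overlay can carry (default limit 1450 = 1500 physical - 50 overhead).
check_guest_mtu() {
    iface="$1"
    limit="${2:-1450}"
    mtu=$(cat "${SYSFS_NET:-/sys/class/net}/$iface/mtu") || return 2
    if [ "$mtu" -gt "$limit" ]; then
        echo "$iface: MTU $mtu > $limit, lower it (e.g. ip link set dev $iface mtu $limit)"
        return 1
    fi
    echo "$iface: MTU $mtu OK"
}
```

Run it inside the guest, e.g. `check_guest_mtu eth0` (the interface name is an assumption). `ip link set` only changes the running value; the persistent fix is the one used here, having the DHCP server hand out the interface MTU (DHCP option 26).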
 

tisc0

Member
Jul 17, 2017
16
0
21
45
Hi guys,
I am having some network trouble in a VM: pings from it to anywhere regularly succeed and then fail.

After re-checking all my setup and parameters, I dug a bit further on the node and found this:


Sep 2 15:49:26 node_name kernel: [1393992.639299] vxlan_temp1: xx:xx:xx:83:0f:93 migrated from 172.15.0.3 to 172.15.0.1
Sep 2 15:49:29 node_name kernel: [1393992.xxxxxx] vxlan_temp1: xx:xx:xx:83:0f:93 migrated from 172.15.0.1 to 172.15.0.3

172.15.x.x is my SDN peers network.
The node showing those logs has 172.15.0.2, and no other node in the cluster (.0.1 or .0.3) shows those log lines.
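To see how often each MAC bounces between VTEPs, a minimal sketch that counts these "migrated from A to B" kernel messages per MAC. It assumes the message format shown above and reads log text on stdin; `count_mac_migrations` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: for each MAC, print the number of
# "<mac> migrated from A to B" messages and the sequence of moves.
count_mac_migrations() {
    grep -o '[0-9a-fA-Fx:]\{17\} migrated from [0-9.]* to [0-9.]*' |
        awk '{ n[$1]++
               path[$1] = (path[$1] == "" ? "" : path[$1] " ") $4 "->" $6 }
             END { for (m in n) print m, n[m], path[m] }'
}
```

For example, `journalctl -k --since today | count_mac_migrations` on the node that logs them; a MAC alternating between the same two peers points at something answering with that address behind both of them.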
I tried `ifreload -a` on the node, nothing changed.
I also couldn't find the MAC address, neither in the `ip a` output nor in any VM in the cluster (checked with a hair-pulling script using pvesh).
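A lighter alternative to walking pvesh output is to grep the guest configs directly. A minimal sketch, assuming the usual clustered layout /etc/pve/nodes/<node>/qemu-server/<vmid>.conf and /etc/pve/nodes/<node>/lxc/<vmid>.conf, where QEMU NICs carry the MAC as `virtio=...` and containers as `hwaddr=...`. `find_guest_by_mac` and the `PVE_ROOT` override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list guest config files (QEMU and LXC, all nodes)
# that mention a given MAC. -i because MACs are often stored upper-case.
find_guest_by_mac() {
    mac="$1"
    root="${PVE_ROOT:-/etc/pve/nodes}"
    grep -ril -- "$mac" "$root"/*/qemu-server "$root"/*/lxc 2>/dev/null
}
```

An empty result for the flapping MAC suggests it belongs to nothing Proxmox manages directly, for example a device inside a guest or something on the underlay.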

Any idea?
Thanks !

proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-network-perl: 0.6.0
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1
 
