Proxmox 7.0 SDN beta test

Sounds a bit more like a workaround than a proper solution. In GCP they use the link-local IP 169.254.169.254 for that, which presumably forwards packets to a DNS service in their network.
Yes, but their DNS service is also in the VXLAN/overlay, so it's basic routing.
You mean this?

View attachment 45164

Exit Nodes local routing box?
yep.

It'll route traffic to the outside through the exit node(s). (Then you need a route on the reverse side to be able to reach your EVPN network. Or you can enable SNAT on the subnet, so traffic is NATed to the exit node IP and no reverse route is needed.)

You need 2 exit nodes for redundancy, and a primary exit node needs to be defined for SNAT (it's active/backup).



Note that if you have physical switches/routers supporting EVPN, you can use them as exit nodes instead of Proxmox nodes.

On my network, I have a pair of Arista switches used as exit nodes for EVPN and as the inter-VLAN gateway for the legacy VLAN network.
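
As a rough sketch (names and values here are only illustrative, and option names can differ slightly between versions), the zone and subnet config for exit nodes + SNAT looks something like this:

zones.cfg:
Code:
evpn: myzone
        controller evpn
        vrf-vxlan 10000
        exitnodes node1,node2
        exitnodes-primary node1

subnets.cfg:
Code:
subnet: myzone-10.0.101.0-24
        vnet myvnet
        gateway 10.0.101.1
        snat 1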
 
OK, I'm confused. But why do I need to route anything when EVPN with BGP on the MikroTik already works? So I already have routing: I can already reach my personal computer in my LAN from a VM, and vice versa.
 
Well, that's strange if you can already access the EVPN network from the LAN ^_^.

Is your MikroTik doing EVPN? Maybe it's acting as an exit node and forwarding traffic between your EVPN network and the LAN?

Basically, an exit node announces special routes in EVPN (type-5 routes), for example 0.0.0.0/0, which say: if you want to reach this external subnet, forward the traffic to the exit node.

You can check the type-5 routes on your Proxmox node with:

Code:
vtysh -c "sh ip bgp l2vpn evpn" |grep "\[5"
 
Well, that's strange if you can already access the EVPN network from the LAN ^_^.
Why? I mean, wasn't that the whole purpose of setting up EVPN with my MikroTik using the additional BGP controller, from Proxmox to MikroTik? If not, then what were we doing here the whole time? :D
You remember the chart?

1666898334487-png.42662

It says there that my IP 192.168.1.1 needs to access a custom network created in SDN using EVPN and the MikroTik (RouterOS 6, without EVPN support, we talked about that too), and for that I needed the extra BGP controller.

So answering your next question
Is your mikrotik doing evpn ? Maybe it's acting as exit node, and forward traffic between your evpn network and lan ?

No, because it can't (RouterOS 6, remember?), hence the additional config layer.

1672859381586.png

10.0.1.30 is the MikroTik
10.0.1.1 is the Proxmox host
and the Proxmox host is set up as a peer on the EVPN controller

1672859429992.png

so basically the type-5 route is announced by Proxmox
Code:
root@pve:~# vtysh -c "sh ip bgp l2vpn evpn" |grep "\[5"
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
*> [5]:[0]:[0]:[0.0.0.0]
*> [5]:[0]:[0]:[::] 10.0.1.1(pve)                      32768 i

And as you can see, it is.
I think DNS should work. As proof, take a look:

A ping from the VM on the overlay network (10.0.101.1 => 10.0.1.1) works:

Code:
ansible@srv-app-1:~$ ping -c 4 10.0.1.1
PING 10.0.1.1 (10.0.1.1) 56(84) bytes of data.
64 bytes from 10.0.1.1: icmp_seq=1 ttl=64 time=0.146 ms
64 bytes from 10.0.1.1: icmp_seq=2 ttl=64 time=0.224 ms
64 bytes from 10.0.1.1: icmp_seq=3 ttl=64 time=0.135 ms
64 bytes from 10.0.1.1: icmp_seq=4 ttl=64 time=0.174 ms

--- 10.0.1.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3049ms
rtt min/avg/max/mdev = 0.135/0.169/0.224/0.034 ms

But there is some issue with services: when I try to check them by sending something to a specific port on Proxmox, it says the connection is refused.
For example, from my laptop to the DNS server that runs on the Proxmox host:

Code:
❯ host unifi.sonic 10.0.1.1
Using domain server:
Name: 10.0.1.1
Address: 10.0.1.1#53
Aliases:

unifi.sonic is an alias for rasp-poz-1.hw.sonic.
rasp-poz-1.hw.sonic has address 10.255.0.20

So there is a proper answer.

Now from the VM on the 10.0.101.0/24 network, which is in the SDN EVPN zone:
Code:
ansible@srv-app-1:~$ host unifi.sonic 10.0.1.1
;; connection timed out; no servers could be reached

SSH to Proxmox from the VM also doesn't work:

Code:
ansible@srv-app-1:~$ ssh 10.0.1.1
ssh: connect to host 10.0.1.1 port 22: Connection refused

I mean, it looks like Proxmox isn't allowing the VM to use its services.

But, for example, the VM can reach the UniFi controller that runs on a different VLAN, on my Raspberry Pi in 10.255.0.0/24:

Code:
ansible@srv-app-1:~$ curl -kIs https://10.255.0.20:8443
HTTP/1.1 302
Location: /manage
Transfer-Encoding: chunked
Date: Wed, 04 Jan 2023 19:25:19 GMT

There are a lot of entries in iptables -S,
and a specific chain called PVEFW-Reject:

Code:
Chain PVEFW-Reject (0 references)
target     prot opt source               destination
PVEFW-DropBroadcast  all  --  anywhere             anywhere
ACCEPT     icmp --  anywhere             anywhere             icmp fragmentation-needed
ACCEPT     icmp --  anywhere             anywhere             icmp time-exceeded
DROP       all  --  anywhere             anywhere             ctstate INVALID
PVEFW-reject  udp  --  anywhere             anywhere             multiport dports 135,445
PVEFW-reject  udp  --  anywhere             anywhere             udp dpts:netbios-ns:139
PVEFW-reject  udp  --  anywhere             anywhere             udp spt:netbios-ns dpts:1024:65535
PVEFW-reject  tcp  --  anywhere             anywhere             multiport dports epmap,netbios-ssn,microsoft-ds
DROP       udp  --  anywhere             anywhere             udp dpt:1900
DROP       tcp  --  anywhere             anywhere             tcp flags:!FIN,SYN,RST,ACK/SYN
DROP       udp  --  anywhere             anywhere             udp spt:domain
           all  --  anywhere             anywhere             /* PVESIG:h3DyALVslgH5hutETfixGP08w7c */

There is a DROP for UDP with source port 53 (domain).
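
(For reference, the host firewall state can also be inspected with the standard pve-firewall subcommands, e.g.:)

Code:
# show whether the PVE firewall is enabled/running on this host
pve-firewall status
# dump the ruleset pve-firewall would generate
pve-firewall compile | less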

And btw, there is an issue with the SNAT option.

After checking the SNAT option here
1672864018553.png
and then disabling it (applying a network reload in between, of course), the SNAT is still active even though it's no longer checked:

Code:
root@pve:~# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
SNAT       all  --  10.0.101.0/24        anywhere             to:10.0.1.1
root@pve:~#
 
Why? I mean.. wasn't the the whole purpose of setting EVPN with my MikroTik using additional BGP controller from proxmox to mikrotik? If not than.. what we were doing the whole time here? :D
You remember the chart?

1666898334487-png.42662

it says there my ip 192.168.1.1 needs to access a custom network created on SDN using EVPN and Mikrotik (RouterOS 6 without evpn support, we talked about that too) and for that I needed extra BGP Controller.
Sorry, I don't remember everybody's setup ^_^ (I think I have helped with 20-30 different EVPN setups ^_^).


So answering your next question


No, because it can't (RouterOS 6 remember?) so additional config layer

View attachment 45229

10.0.1.30 is mikrotik
10.0.1.1 is Proxmox host
and proxmox it's on EVPN controller as peer

View attachment 45230

so basically type5 is announced by Proxmox
Code:
root@pve:~# vtysh -c "sh ip bgp l2vpn evpn" |grep "\[5"
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
*> [5]:[0]:[0]:[0.0.0.0]
*> [5]:[0]:[0]:[::] 10.0.1.1(pve)                      32768 i
Something is strange: the type-5 route is only announced if you have enabled an exit node on the zone.


and as you see it is.
I think that DNS should work. As prove take a look

ping from VM on overlay network 10.0.101.1 => 10.0.1.1
works

Code:
ansible@srv-app-1:~$ ping -c 4 10.0.1.1
PING 10.0.1.1 (10.0.1.1) 56(84) bytes of data.
64 bytes from 10.0.1.1: icmp_seq=1 ttl=64 time=0.146 ms
64 bytes from 10.0.1.1: icmp_seq=2 ttl=64 time=0.224 ms
64 bytes from 10.0.1.1: icmp_seq=3 ttl=64 time=0.135 ms
64 bytes from 10.0.1.1: icmp_seq=4 ttl=64 time=0.174 ms

--- 10.0.1.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3049ms
rtt min/avg/max/mdev = 0.135/0.169/0.224/0.034 ms
OK, so routing && forwarding are fine.
Are you sure you don't have an MTU problem?
Can you try

ping -Mdo -s 1472 10.0.1.1

(1472 bytes of payload + 28 bytes of IP/ICMP headers = 1500, and -M do forbids fragmentation)?

but there is some issue with services when I want to check them by sending something on specific port on proxmox it's says that connection is refused.
For example, my laptop to DNS server that stands on Proxmox Host

Code:
❯ host unifi.sonic 10.0.1.1
Using domain server:
Name: 10.0.1.1
Address: 10.0.1.1#53
Aliases:

unifi.sonic is an alias for rasp-poz-1.hw.sonic.
rasp-poz-1.hw.sonic has address 10.255.0.20

There is a nice answer.

Now from VM on 10.0.101.0/24 network that is in the SDN type EVPN Network
Code:
ansible@srv-app-1:~$ host unifi.sonic 10.0.1.1
;; connection timed out; no servers could be reached

If the previous ping test with -Mdo is working, then the only possibility is a firewall rule somewhere; I really don't see any other possibility.
SSH to proxmox from VM also doesn't work
Code:
ansible@srv-app-1:~$ ssh 10.0.1.1
ssh: connect to host 10.0.1.1 port 22: Connection refused

I mean it looks like Proxmox isn't allowing VM to use their services.
This is expected, they are in different VRFs (the Proxmox SSH daemon listens in the default VRF by default).
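
As a quick illustration (the VRF name below is just an example, not taken from your setup), you can see the zone's VRF on the host and run a command inside it with iproute2:

Code:
# list VRF devices on the host
ip -br link show type vrf
# run a command inside a given VRF, e.g. check listening sockets from the overlay's point of view
ip vrf exec vrf_myzone ss -ltn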

But for example VM can reach unifi controller that stands on different VLAN on my Raspberry Pie in 10.255.0.0/24

Code:
ansible@srv-app-1:~$ curl -kIs https://10.255.0.20:8443
HTTP/1.1 302
Location: /manage
Transfer-Encoding: chunked
Date: Wed, 04 Jan 2023 19:25:19 GMT

There is a lot of entries in iptables -S
and Specific Chain called PVESIG-Reject

Code:
Chain PVEFW-Reject (0 references)
target     prot opt source               destination
PVEFW-DropBroadcast  all  --  anywhere             anywhere
ACCEPT     icmp --  anywhere             anywhere             icmp fragmentation-needed
ACCEPT     icmp --  anywhere             anywhere             icmp time-exceeded
DROP       all  --  anywhere             anywhere             ctstate INVALID
PVEFW-reject  udp  --  anywhere             anywhere             multiport dports 135,445
PVEFW-reject  udp  --  anywhere             anywhere             udp dpts:netbios-ns:139
PVEFW-reject  udp  --  anywhere             anywhere             udp spt:netbios-ns dpts:1024:65535
PVEFW-reject  tcp  --  anywhere             anywhere             multiport dports epmap,netbios-ssn,microsoft-ds
DROP       udp  --  anywhere             anywhere             udp dpt:1900
DROP       tcp  --  anywhere             anywhere             tcp flags:!FIN,SYN,RST,ACK/SYN
DROP       udp  --  anywhere             anywhere             udp spt:domain
           all  --  anywhere             anywhere             /* PVESIG:h3DyALVslgH5hutETfixGP08w7c */

There is DROP on 53(domain) at UDP protocol.

and btw

There is some issue with SNAT options

by checking SNAT option here
View attachment 45234
and disabling it after and of course applying between that process a network restart, the SNAT although it's not checked it's still active

Code:
root@pve:~# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
SNAT       all  --  10.0.101.0/24        anywhere             to:10.0.1.1
root@pve:~#
SNAT is applied only on the exit node. I'm not sure, but can you verify whether the exit node is enabled or not?
The SNAT iptables rules should be added in /etc/network/interfaces.d/sdn.
 
You don't have any errors when you apply the SDN config, in the network reload tasks?
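
For reference, the generated SNAT entries in /etc/network/interfaces.d/sdn look roughly like this (a sketch based on the rule visible in your iptables output; the exact generated lines may differ):

Code:
post-up iptables -t nat -A POSTROUTING -s '10.0.101.0/24' -j SNAT --to-source 10.0.1.1
post-down iptables -t nat -D POSTROUTING -s '10.0.101.0/24' -j SNAT --to-source 10.0.1.1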
 
Something is strange, the type5 route is only announced if you have enabled exit-node on the zone

And I have set the exit node on the zone.

Are you sure that you don't have an mtu problem ?
Can't say. The MTU is 8000; isn't that too much, considering VXLAN is also involved here?

ping -Mdo -s 1472 10.0.1.1

Works


Snat is apply only on the exit node, not sure, but can you very than exit-node is enable or not ?
the s-nat iptables should be add in /etc/network/interfaces.d/sdn

You mean this?

1667129899907-png.42740

I already showed that it is set.

Sorry, the issues we have are:
1. The VM accessing the Proxmox DNS service (for now I made a workaround by setting the MikroTik, 10.0.1.30, as a DNS resolver which forwards all requests from the 10.0.101.0/24 VM network to Proxmox at 10.0.1.1).
2. After moving the VLAN from vmbr to the bond and rebooting the server, there is an issue with dhclient on bond0.11: it gets an IP address but only works after some time (about ~1h), and then ping to the SDN network doesn't work. I need to manually run ifdown bond0.11 && ifup bond0.11.
 
Sorry we have the issue with
1. Accessing VM to proxmox DNS service. (which now I made a workaround by setting a mikrotik - 10.0.1.30 dns resolver which forward all request from 10.101.0/24 VM network to Proxmox 10.0.1.1)
Really, I don't know. If "ping -Mdo" is working, you don't have an MTU problem, and I don't see why DNS is not working.
Maybe tcpdump will be needed, to see whether it's blocking on the DNS request or on the DNS response, and where.
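For example, something along these lines on the Proxmox host would show whether the query arrives from the overlay and whether a reply goes back (addresses as in your tests):

Code:
# watch DNS traffic between the VM and the host resolver
tcpdump -ni any port 53 and host 10.0.101.1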

2. After changing VLAN from vmbr to bond after rebooting server there is issue with dhclient on bond0.11 , it can get IP address but works after some time (about ~1h) but then ping to SDN network doesn't work. I need to make manually if down bond0.11 && ifup bond0.11
Where do you have a dhclient?

Code:
auto bond0.11
iface bond0.11 inet static
    mtu 8000
    address 10.0.1.1/30
    gateway 10.0.1.30
 
So I have a question that my Google-fu and forum/thread searches have not been able to answer. When I try to create a VNet for a zone of type VLAN, I am unable to create an untagged VNet. I'm assuming this is by design, but I want to confirm before I reconfigure my trunk ports between Proxmox and my physical switches. I have spent a couple of hours searching, trying CLI commands, and trying to modify the configs manually, but I cannot seem to do this.

Is it expected that a VNet will always have a tag, and if not, how do I create one untagged VNet in my VLAN zone?? Thanks!!
 
Yes, you always need a tag (it's mandatory in the GUI && API, and the code doesn't support untagged VNets).
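
For reference, a VNet in a VLAN zone always carries its tag in /etc/pve/sdn/vnets.cfg, e.g. (names here are only illustrative):

Code:
vnet: lan10
        zone myvlanzone
        tag 10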
 
Hi, I'm currently experimenting in a test environment.
I have managed to set up VXLAN to connect the VMs across hosts.
But how can I add the hosts themselves to the overlay network?

I only have a single IP for the host and want to forward incoming traffic to the routing-vm no matter on which host it is running...

I tried adding an IP to the vxlan_vnet0 and vnet0 interfaces, but this does not work.
Or do I need a different approach altogether?
 
I'm not sure about a simple VXLAN zone, but maybe you could try bridging your physical interface into vnet0.

For example, you can try adding this in /etc/network/interfaces:

Code:
auto vnet0
iface vnet0 
    address  .....
    bridge-ports  eth0

It'll be merged with the SDN configuration.
 
Bridging did not work.
But after starting from scratch, adding the IP to vnet0 did work - I probably had some unclean state from previous tries.

And the tip with merging made things a lot easier :)
Thanks
 
where do you have a dhclient ?
@spirit
On Proxmox, on bond0.11:

Code:
auto bond0.11
iface bond0.11 inet static
    mtu 8000
    address 10.0.1.1/30
    gateway 10.0.1.30

Yeah, that setup changed over time. Right now I need to use a static address.

Btw, I have another issue to resolve. It's not even an issue, more a problem I want to deal with. You probably know something about K3s and MetalLB.
In short, what I want to achieve is having a LoadBalancer service on Kubernetes (MetalLB) accessible from outside the cluster, meaning from the LAN network (192.168.x.x).
I already have one K3s instance on a Raspberry Pi with exactly that setup.
The Pi is in VLAN 11, connected to a switch which passes through the VLAN that comes from the MikroTik (rt-poz-1 - 10.0.1.30) and reaches the Pi with the address 10.0.1.20.
K3s on the Pi is a bare-metal setup, so basically when installing MetalLB on it the speaker uses the node IP 10.0.1.20 and the IPAddressPool is set up with the 10.0.100.1/24 network. Advertisement is done via BGP with the MikroTik and works like a charm.

But things get more complicated when the node is a VM with K3s, so there is another layer of abstraction. Plus that VM is on Proxmox, which has SDN defined (EVPN+BGP) that works, as we already discussed here and you helped me with. Thank you for that again.

So: K3s on a VM in the subnet 10.0.10.0/24, cluster CIDR 10.0.21.0/24 with service CIDR 10.0.20.0/24, and MetalLB on the other hand has 10.0.50.0/24.
The issue is with that LB network (10.0.50.0/24). When I put an Nginx ingress on that cluster, it is of course available from the VM where K3s runs when doing a curl:

Bash:
root@srv-app-1:/var/lib/rancher/k3s/server/manifests# curl -Is 10.0.50.1
HTTP/1.1 404 Not Found
Server: nginx/1.23.3
Date: Mon, 27 Feb 2023 08:45:47 GMT
Content-Type: text/html
Content-Length: 153
Connection: keep-alive

where 10.0.50.1 is the Nginx ingress SVC:
Code:
root@srv-app-1:/var/lib/rancher/k3s/server/manifests# kubectl get svc -o wide
NAME                    TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
ingress-nginx-ingress   LoadBalancer   10.0.20.17   10.0.50.1     80:31373/TCP,443:32574/TCP   10h   app=ingress-nginx-ingress

From the Proxmox host this IP is unreachable, and obviously from my LAN network as well.
I believe it's because of the Proxmox SDN setup; something there must be causing the connection problem.
How did I advertise 10.0.50.0/24? I just added a BGPPeer object in K3s and connected it to the MikroTik to spread information about that network. That works too.
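
For reference, the MetalLB side described above looks roughly like this (only a sketch; check the CRD apiVersions of your MetalLB release, values taken from the setup above):

Code:
# pool handed out to LoadBalancer services
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lb-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.50.0/24
---
# BGP session from the k3s node to the MikroTik
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: mikrotik
  namespace: metallb-system
spec:
  myASN: 65000
  peerASN: 65000
  peerAddress: 10.0.1.30
---
# advertise the pool over BGP
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: lb-pool-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - lb-pool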

Code:
struct { Version uint8; ASN16 uint16; HoldTime uint16; RouterID uint32; OptsLen uint8 }{Version:0x4, ASN16:0xfde8, HoldTime:0xb4, RouterID:0xa00011e, OptsLen:0x16}
{"caller":"native.go:98","event":"sessionUp","level":"info","localASN":65000,"msg":"BGP session established","peer":"10.0.1.30:179","peerASN":65000,"ts":"2023-02-27T08:52:12Z"}

Code:
/routing bgp peer
add address-families=ip,l2vpn in-filter=pve-in name=peer1 out-filter=pve-out remote-address=10.0.1.1 remote-as=65000 ttl=default
add address-families=ip,l2vpn in-filter=pve-in name=metallb_rasp-poz-1 out-filter=pve-out remote-address=10.0.1.20 remote-as=65000 ttl=default
add address-families=ip,l2vpn in-filter=pve-in multihop=yes name=metallb_srv-app-1 out-filter=pve-out remote-address=10.0.10.1 remote-as=65000 ttl=default

[recover@Mike[1]] /routing bgp peer> /ip route pr
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 ADS  0.0.0.0/0                          109.173.xxx.xxx           1
 1 X S  ;;; WAN1
        0.0.0.0/0                          85.221.xxx.xxx            1
 2 A S  10.0.0.0/24                        10.255.253.2              1
 3 ADC  10.0.1.0/27        10.0.1.30       SRV_11                    0
 4 ADb  10.0.10.0/24                       10.0.1.1                200
 5 ADb  10.0.50.1/32                       10.0.10.1               200 <= This
 6 ADb  10.0.100.1/32                      10.0.1.20               200
 7 ADb  10.0.100.2/32                      10.0.1.20               200
 8 ADC  10.255.0.0/27      10.255.0.30     MGMT_10                   0
 9 ADC  10.255.253.0/24    10.255.253.1    InterVLAN                 0
10 ADC  10.255.255.0/24    10.255.255.1    IPMI                      0
11 ADC  109.173.xxx.xxx/25 109.173.xxx.xxx ether1                    0
12 A S  192.168.0.0/24                     10.255.253.2              1
13 A S  192.168.1.0/24                     10.255.253.2              1
14 ADC  192.168.14.0/28    192.168.14.14   GUEST                     0

To show it precisely, I prepared a chart:

proxmox_k3s-SDN-K8S.drawio.png
 
Can I use different VLANs/subnets for EVPN and eBGP? How would such a config look?
For reference:
Two hosts, directly connected with 10 GbE, where I want EVPN/VXLAN on VLAN 1612 / subnet 172.16.12.0/24.
On each host, two VLANs 2711 and 2712 connected via 1 GbE to two upstream routers on 172.27.11.0/24 and 172.27.12.0/24.
Do I need to set up the same or different AS numbers on the EVPN and BGP controllers, and which IPs do I add as EVPN/BGP neighbors?

I tried setting it up as I thought it should be correct (only one upstream router connected):
controllers.cfg
Code:
bgp: bgppve01
        asn 65002
        node pve01
        peers 172.27.11.1
        bgp-multipath-as-path-relax 0
        ebgp 1
        ebgp-multihop 2


bgp: bgppve02
        asn 65002
        node pve02
        peers 172.27.11.1
        bgp-multipath-as-path-relax 0
        ebgp 1
        ebgp-multihop 2

evpn: evpn
        asn 65002
        peers 172.16.12.103,172.16.12.104

zones.cfg:
Code:
evpn: vm
        controller evpn
        vrf-vxlan 201
        exitnodes pve01,pve02
        ipam pve
        mac DA:D1:F1:9C:22:59

vnets.cfg:
Code:
vnet: test
        zone vm
        tag 202

subnets.cfg:
Code:
subnet: vm-192.168.123.0-24
        vnet test
        gateway 192.168.123.1

The traffic between VMs works.
The traffic from a VM to upstream works too, and upstream sends return traffic, but then the return traffic gets lost on the destination node, like this (ping from 192.168.123.100 / test container to 1.1.1.1):
21:53:17.631229 veth100i0 P IP (tos 0x0, ttl 64, id 30545, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.123.100 > one.one.one.one: ICMP echo request, id 10721, seq 338, length 64
21:53:17.631241 fwln100i0 Out IP (tos 0x0, ttl 64, id 30545, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.123.100 > one.one.one.one: ICMP echo request, id 10721, seq 338, length 64
21:53:17.631242 fwpr100p0 P IP (tos 0x0, ttl 64, id 30545, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.123.100 > one.one.one.one: ICMP echo request, id 10721, seq 338, length 64
21:53:17.631242 test In IP (tos 0x0, ttl 64, id 30545, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.123.100 > one.one.one.one: ICMP echo request, id 10721, seq 338, length 64
21:53:17.631257 eno1.2711 Out IP (tos 0x0, ttl 63, id 30545, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.123.100 > one.one.one.one: ICMP echo request, id 10721, seq 338, length 64
21:53:17.631259 eno1 Out IP10 (invalid)

21:53:17.639803 vrfvx_vm P IP (tos 0x0, ttl 56, id 42731, offset 0, flags [none], proto ICMP (1), length 84)
one.one.one.one > 192.168.123.100: ICMP echo reply, id 10721, seq 338, length 64
21:53:17.639803 vrfbr_vm In IP (tos 0x0, ttl 56, id 42731, offset 0, flags [none], proto ICMP (1), length 84)
one.one.one.one > 192.168.123.100: ICMP echo reply, id 10721, seq 338, length 64


I can get it to work when I move EVPN to the same 1 GbE interfaces where the upstream is connected and use the same IP for it, but then I lose the 10 GbE path between guests.
 
Hi, I'm currently on holiday with a limited connection; I'll be back next week. But yes, you can use the same ASN for BGP and EVPN. About your problem, maybe it is related to asymmetric routing? Maybe try to disable the rp_filter sysctl.
 
Hi, i m currently on holiday with lumtee connection, comeback next week. But yes, you Can use same asn for Bgp and evpn. About your problem, maybe IS it related to asymetric routing ? Maybe try to disable rp_filter sysctl.
Hi spirit, thank you for the suggestion.

I checked the counter using nstat -rsz | grep IPReversePathFilter and indeed it was increasing.
sysctl -w net.ipv4.conf.all.rp_filter=0 made the packets flow; the default is 2 (loose mode).
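(If that turns out to be the fix, it can be persisted with a sysctl.d snippet, for example:)

Code:
# /etc/sysctl.d/99-rpfilter.conf (file name is just an example)
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0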

However I don't understand yet why it does this.
ip r says:
Code:
default nhid 480 via 172.27.11.1 dev eno1.2711 proto bgp metric 20
172.16.11.0/24 dev vmbr0 proto kernel scope link src 172.16.11.104 # management IP
172.16.12.0/24 dev bond0.1612 proto kernel scope link src 172.16.12.104  # vxlan/cluster IP
172.27.11.0/24 dev eno1.2711 proto kernel scope link src 172.27.11.5 # upstream 1
172.27.12.0/24 dev eno2.2712 proto kernel scope link src 172.27.12.5 # upstream 2
192.168.123.0/24 nhid 472 dev test proto bgp metric 20 # test vnet/subnet
192.168.123.100 nhid 488 via 172.16.12.103 dev vrfbr_vm proto bgp metric 20 onlink # test container
192.168.123.102 nhid 488 via 172.16.12.103 dev vrfbr_vm proto bgp metric 20 onlink # test vm

Edit:
OK, I see: packets leave on pve01 and the answer arrives on pve02, where it is routed to pve01 via vrfvx_vm, which does not have any routes.
So I would have to announce the BGP routes from pve02 on pve01 via the EVPN interface, and vice versa, if I wanted working rp-filtering (and a stateful firewall).
Is there anything I can configure to allow this?
 
That"s really strange indeed.



I don't known if it's a problem.. (maybe they "intercept" vxlan frame ??) . do you have tried with a simple switch ? (or a cross-cable between 2 servers).

For testing, maybe can you try to enable ipsec tunnel, to hide vxlan to your switches ?
https://pve.proxmox.com/pve-docs/chapter-pvesdn.html#_vxlan_ipsec_encryption

I would like to look into this once more. Is there any health check between nodes possible? Right now I see 0 packets on port 4789 between the nodes. Also, the virtual network works on a single node; once I move a VM to the second node, connectivity is lost :/
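
For a basic check, something like this on each node (the interface name is only an example; the VXLAN device name is the one mentioned earlier in the thread) shows whether VXLAN traffic is actually leaving and arriving, and whether the peer VTEPs are known:

Code:
# any VXLAN traffic on the underlay?
tcpdump -ni eno1 udp port 4789
# does the VXLAN device know the remote VTEPs?
bridge fdb show dev vxlan_vnet0 | grep dst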
 
Would like into this once more. Is there any healthcheck between nodes possible?. Right now I see 0 packets for port 4789 between nodes. Also, the virtual network works on a single node, once I move a VM to the second node, the connectivity is lost :/
Hello! Using VXLAN, should I see a process listening on *:4789? I don't see any on my nodes.
 
