BGP-EVPN SDN and DNAT across multiple hypervisor hosts

alchemydc

Aug 5, 2022
Greetings,

I have set up a test environment for BGP EVPN SDN as follows:

* 3 hypervisor hosts running pve-manager/7.2-7/d0dd0e85.
* Each hypervisor has a public IP and is set up as an exit node with SNAT.
* Each hypervisor has a private IP that is used to create the BGP EVPN peering.
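For context, the per-node SNAT on each exit node amounts to a masquerade-style rule along these lines (a sketch, not our literal config: the 10.10.50.0/24 subnet is taken from the DNAT rules further down, and vmbr0 as the public uplink name is an assumption):

```shell
# Illustrative SNAT rule on an exit node:
# 10.10.50.0/24 is the EVPN vnet subnet; vmbr0 (assumed name) carries the public IP.
iptables -t nat -A POSTROUTING -s 10.10.50.0/24 -o vmbr0 -j MASQUERADE
```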

The guests are able to reach each other via their vnet1 addresses. They are also able to reach the Internet via the SNAT provided by each hypervisor.

There is a problem, though: guests running services that we want to expose to the Internet via DNAT (e.g. tcp/80, tcp/443) are only reachable via the public IP of the hypervisor node on which the guest is currently running.

For example, if guest-0 is running on host1, which has a DNAT rule as follows:
Code:
Chain PREROUTING (policy ACCEPT 886K packets, 59M bytes)
 pkts bytes target     prot opt in     out     source               destination
35032 1824K DNAT       tcp  --  *      *       0.0.0.0/0            a.b.c.d        tcp dpt:80 to:10.10.50.100:80
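That counter listing corresponds to a rule added roughly like this (reconstructed from the `iptables -t nat -L` output above; a.b.c.d stands for host1's public IP):

```shell
# DNAT inbound tcp/80 traffic hitting host1's public IP to the guest's EVPN address
iptables -t nat -A PREROUTING -p tcp -d a.b.c.d --dport 80 \
    -j DNAT --to-destination 10.10.50.100:80
```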

The service is reachable (as expected). However, if we migrate the guest-0 workload to host2, which has a DNAT rule as follows, the service is not reachable.
Code:
Chain PREROUTING (policy ACCEPT 191K packets, 12M bytes)
 pkts bytes target     prot opt in     out     source               destination
36371 1877K DNAT       tcp  --  *      *       0.0.0.0/0            e.f.g.h       tcp dpt:80 to:10.10.50.100:80

We do see the TCP SYN packets making it as far as the vrfbr_evpnzone interface on host2, but the traffic never appears on the vnet1 interface as it does on host1.
Code:
$ host2> sudo tcpdump -n -i vrfbr_evpnzone port 80
17:24:10.694293 IP e.f.g.h.63662 > 10.10.50.100.80: Flags [S], seq 3934090983, win 65535, options [mss 1452,nop,wscale 6,nop,nop,TS val 1920979099 ecr 0,sackOK,eol], length 0
17:24:11.695297 IP e.f.g.h.63662 > 10.10.50.100.80: Flags [S], seq 3934090983, win 65535, options [mss 1452,nop,wscale 6,nop,nop,TS val 1920980100 ecr 0,sackOK,eol], length 0
17:24:12.695780 IP e.f.g.h.63662 > 10.10.50.100.80: Flags [S], seq 3934090983, win 65535, options [mss 1452,nop,wscale 6,nop,nop,TS val 1920981101 ecr 0,sackOK,eol], length 0
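Since the DNATed SYNs enter the zone's VRF on host2 but never reach vnet1, a plausible next step is to check what route the VRF actually has for the guest's address on the node that is not hosting it. A diagnostic sketch (the VRF name vrf_evpnzone is an assumption, inferred from the vrfbr_evpnzone bridge above; adjust to your zone name):

```shell
# Does host2's VRF have a route for 10.10.50.100, and does it point at vnet1
# or at the VXLAN tunnel towards the node that previously hosted the guest?
ip route show vrf vrf_evpnzone

# Which policy rules steer the DNATed traffic into the VRF at all?
ip rule show

# Has the EVPN control plane learned the guest's MAC/IP behind this node's vnet?
bridge fdb show | grep -i vnet1
ip neigh show dev vnet1
```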

If we migrate guest-0 back to host1, the service is reachable as expected, and the traffic is visible on the vnet1 interface:
Code:
$ host1> sudo tcpdump -n -i vnet1 port 80
17:31:54.832733 IP client_public_ip.63750 > 10.10.50.100.80: Flags [S], seq 625391463, win 65535, options [mss 1452,nop,wscale 6,nop,nop,TS val 2493782605 ecr 0,sackOK,eol], length 0
17:31:54.832822 IP 10.10.50.100.80 > client_public_ip.63750: Flags [S.], seq 1408185633, ack 625391464, win 65160, options [mss 1460,sackOK,TS val 2377761216 ecr 2493782605,nop,wscale 7], length 0
17:31:54.992287 IP client_public_ip.63750 > 10.10.50.100.80: Flags [.], ack 1, win 2070, options [nop,nop,TS val 2493782764 ecr 2377761216], length 0

TL;DR: the BGP-EVPN SDN setup is mostly working, but we would like to be able to DNAT incoming connections to a guest via any hypervisor's public IP, not just that of the hypervisor the guest is currently running on. Is this possible?


Diagram inline, relevant configs attached.
(Attachment: sdn_testnet_diagram.png)
 

Mmm, I have never tested DNAT with the gateway node.

The longer-term plan for 1:1 NAT and 1:1 DNAT (or other central services) is to use some kind of VM inside the EVPN network directly.

It may be possible to get it working from the hypervisor directly, but I need to look at how it works between the VRF and the routing.
I'll try to test on my side, maybe next week.
 
Hi!
I would like to build the same architecture with DNAT, so I would be interested to know whether a solution to this problem exists.
Thank you all!
 
No, AFAIK there is not yet any solution to make such an architecture work using DNAT straight to the workload. Keen to know if @spirit ever had time to look into it.

We worked around the constraint in our environment by exposing load balancers on each PVE node that route traffic to the SDN private IP of the workload, which is consistent and routable regardless of which PVE node the workload happens to be running on at any particular time.
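For anyone considering the same workaround: the idea is that the workload's SDN private IP stays stable across migrations, so an L4 proxy on every node can simply forward to it. A minimal sketch with HAProxy (all names, addresses, and ports are illustrative, not our actual config):

```
# /etc/haproxy/haproxy.cfg fragment (illustrative)
frontend http_in
    bind :80
    mode tcp
    default_backend guest_web

backend guest_web
    mode tcp
    # SDN/EVPN address of the workload; routable from every PVE node
    server guest-0 10.10.50.100:80 check
```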
 
Yes, sorry, I really didn't have time to look at it. Can you open a feature request at bugzilla.proxmox.com so it doesn't get lost?
 
