Does PVE SDN support multiple identical subnets in multiple zones?

imoniker

Aug 28, 2023
Hello,

Does PVE support multiple identical subnets (and VMs with the same IP) in multiple zones?
ex.
Zone 1: EVPN 65001, vNet1: 65101, subnet 10.10.10.1/24 VM100: 10.10.10.10
Zone 2: EVPN 65002, vNet2: 65102, subnet 10.10.10.1/24 VM101: 10.10.10.10
It seems that with the same subnet in both zones, the VMs cannot ping the gateway (10.10.10.1), even when the two VMs have different IPs (10.10.10.10 and 10.10.10.11).
 
What I wanted to test is NAT for the same subnet (and the same IP), for example:

External IP 192.168.10.10 → Internal: Zone 1, EVPN 65001, vNet1: 65101, subnet 10.10.10.1/24, VM100: 10.10.10.10
External IP 192.168.10.11 → Internal: Zone 2, EVPN 65002, vNet2: 65102, subnet 10.10.10.1/24, VM101: 10.10.10.10

I want the external IPs to be pingable from the LAN 192.168.10.1/24, and the internal IPs (10.10.10.10 in zone 1 and zone 2) to be able to ping 192.168.10.1/24.

I searched on Google, and it seems connmark is needed:

mangle table:
iptables -t mangle -A PREROUTING -i {vrfA_incoming_intf} -s {private_ip} -d 0.0.0.0/0 -j CONNMARK --set-mark 10
iptables -t mangle -A PREROUTING -s 0.0.0.0/0 -d {vrfA_server_public_ip} -j CONNMARK --set-mark 11
# same for VRF B, but using two different connmark values
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark

NAT table:
iptables -t nat -A PREROUTING -m connmark --mark 11 -j DNAT --to-destination {private_ip}
iptables -t nat -A POSTROUTING -m connmark --mark 10 -j SNAT --to-source {server_public_ip}

ip rules:
ip rule add fwmark 10 lookup {vrfA_table_id}
ip rule add fwmark 11 lookup {vrfA_table_id}

Does PVE support this kind of configuration? If not, what should I do? Should I write my own "exit node" conf files?
 
For EVPN, a different VRF with a different routing table is used for each zone, so I think it could work (but I never tested it).

Have you tested enabling one exit node (where that node has physical access to 192.168.10.1) plus enabling SNAT on the subnet?

I'm not sure how conntrack behaves with multiple VRFs and the same IPs in different VRFs.
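For reference, the per-zone VRFs and their routing tables can be inspected directly on the host; a minimal sketch, assuming zone IDs z01 and z02 so the VRF devices are named vrf_z01 and vrf_z02:

Code:
# list the VRF devices and the kernel routing tables they are bound to
ip vrf show

# routes inside one zone's VRF, as seen by the kernel and by FRR
ip route show vrf vrf_z01
vtysh -c 'show ip route vrf vrf_z01'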

Edit:

I found an interesting article:
https://blog.oddbit.com/post/2023-02-19-vrf-and-nat/

It seems that different conntrack zones need to be implemented. I don't think that's done currently.
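For what it's worth, here is a minimal sketch of that idea (nothing PVE SDN generates today; the VRF device names vrf_z01/vrf_z02 are assumptions): the CT target in the raw table can pin each VRF's traffic to its own conntrack zone, so identical subnets in different zones get separate conntrack entries.

Code:
# assumption: one VRF device per EVPN zone (vrf_z01, vrf_z02)
# give each VRF its own conntrack zone so identical subnets don't collide
iptables -t raw -A PREROUTING -i vrf_z01 -j CT --zone 1
iptables -t raw -A PREROUTING -i vrf_z02 -j CT --zone 2
# same for traffic leaving through the VRF devices
iptables -t raw -A OUTPUT -o vrf_z01 -j CT --zone 1
iptables -t raw -A OUTPUT -o vrf_z02 -j CT --zone 2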

(Can you open a report on bugzilla.proxmox.com? I'll try to work on it in the coming weeks.)
 
I'll try it and open a bugzilla later, after I finish some tests.
By the way, I found that when SNAT is enabled, each time I do a "systemctl reload networking" on the PVE host, it adds a new POSTROUTING entry in iptables. Is that a bug?

Also, sometimes after some configuration changes I can't ping the vNet gateway from the guest VM, and I have to edit the VM network device to bring it back (e.g. un-select firewall, click OK, then select firewall again and click OK). I can't reproduce it at the moment, but it happens.
 
By the way, I found that when SNAT is enabled, each time I do a "systemctl reload networking" on the PVE host, it adds a new POSTROUTING entry in iptables. Is that a bug?
Yes, currently it's done with a simple "post-up iptables ...", so the same command is re-executed each time.
It needs to be polished (maybe managed by a service similar to pve-firewall).
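As an illustration of why the rule stacks up, and one common way to avoid it: guard the append with a check so re-running post-up is a no-op. The subnet, bridge and source IP below are placeholders, not the exact line the SDN config generates.

Code:
# illustrative only: append the SNAT rule only if it is not already present
post-up iptables -t nat -C POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 192.168.10.10 || iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j SNAT --to-source 192.168.10.10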


Also, sometimes after some configuration changes I can't ping the vNet gateway from the guest VM, and I have to edit the VM network device to bring it back (e.g. un-select firewall, click OK, then select firewall again and click OK). I can't reproduce it at the moment, but it happens.
It would be great to know exactly which change stops working without editing the VM device. (Maybe it's a bug, maybe not.)


Basically, the SDN reload doesn't touch where the current VM interface is plugged in.

When deleting a vnet, for example, the vnet should not be removed while a VM is still plugged into it.

If you change/add a vnet gateway IP, it should work out of the box.
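If it happens again, one thing worth checking before editing the VM device is whether the VM's tap interface is still plugged into the vnet bridge; a sketch, assuming VM 100 with its first NIC on vnet1 and no firewall bridge in between:

Code:
# which bridge is the VM's tap device plugged into? (PVE names it tap<vmid>i<nic-index>)
ip -d link show tap100i0 | grep -o 'master [^ ]*'

# if it lost its master after a networking restart, replug it by hand
# (with the firewall option enabled there is an extra fwbr<vmid>i<n> bridge in between)
ip link set tap100i0 master vnet1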
 
OK, I could reproduce it: I ran a "systemctl restart networking".
 
I succeeded in solving this problem using conntrack. Here are my steps:

Test environment: one PVE host (10.30.2.50), 2 zones:
- Zone 1: 65001, vnet1 65101, subnet1: 10.10.22.1/24, VM1: 10.10.22.10
- Zone 2: 65002, vnet2 65102, subnet1: 10.10.22.1/24, VM1: 10.10.22.10; subnet2: 10.10.43.1/24, VM2: 10.10.43.10

Step 0: create zones, vnets, subnets, and VMs, and enable the exit node.

Step 1: create the external IPs 10.30.2.161 and 10.30.2.162
Code:
iface vmbr0:0 inet static
        address 10.30.2.161/24
iface vmbr0:1 inet static
        address 10.30.2.162/24

Step 2: add the following iptables rules and ip rules:
Code:
# zone 1 (vrf_z01, table 1001): NAT 10.10.22.10 <-> 10.30.2.161 using connmarks 12/13
iptables -t mangle -A PREROUTING -i vrf_z01 -s 10.10.22.10 -d 0.0.0.0/0 -j CONNMARK --set-mark 12
iptables -t mangle -A PREROUTING -s 0.0.0.0/0 -d 10.30.2.161 -j CONNMARK --set-mark 13
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t nat -A PREROUTING -m connmark --mark 13 -j DNAT --to-destination 10.10.22.10
iptables -t nat -A POSTROUTING -m connmark --mark 12 -j SNAT --to-source 10.30.2.161
ip rule add fwmark 12 lookup 1001 prio 102
ip rule add fwmark 13 lookup 1001 prio 103

# zone 2 (vrf_z02, table 1002): NAT 10.10.22.10 <-> 10.30.2.162 using connmarks 14/15
iptables -t mangle -A PREROUTING -i vrf_z02 -s 10.10.22.10 -d 0.0.0.0/0 -j CONNMARK --set-mark 14
iptables -t mangle -A PREROUTING -s 0.0.0.0/0 -d 10.30.2.162 -j CONNMARK --set-mark 15
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t nat -A PREROUTING -m connmark --mark 15 -j DNAT --to-destination 10.10.22.10
iptables -t nat -A POSTROUTING -m connmark --mark 14 -j SNAT --to-source 10.30.2.162
ip rule add fwmark 14 lookup 1002 prio 104
ip rule add fwmark 15 lookup 1002 prio 105
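(Optional) The marks and policy routes can be double-checked on the host with standard tools; 1001 and 1002 are the table IDs used by the ip rules above:

Code:
# check that the mangle/connmark and NAT rules are installed
iptables -t mangle -L PREROUTING -n -v
iptables -t nat -L -n -v

# check the policy-routing rules and the per-zone routing tables they point to
ip rule show
ip route show table 1001
ip route show table 1002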

Step 3: comment out the lines marked below (generated by PVE SDN) in frr.conf and run "systemctl restart frr".
Code:
frr version 8.5.1
frr defaults datacenter
hostname pvev
log syslog informational
service integrated-vtysh-config
!
!
vrf vrf_z01
 vni 65001
# ip route 10.10.22.1/24 null0
# ip route 10.10.43.1/24 null0
exit-vrf
!
vrf vrf_z02
 vni 65002
# ip route 10.10.22.1/24 null0
# ip route 10.10.43.1/24 null0
exit-vrf
!
vrf vrf_z03
 vni 65003
# ip route 10.10.22.1/24 null0
# ip route 10.10.43.1/24 null0
exit-vrf
!
router bgp 65000
 bgp router-id 10.30.2.50
...
 !
 address-family ipv4 unicast
#  import vrf vrf_z01
#  import vrf vrf_z02
#  import vrf vrf_z03
 exit-address-family
 !
 address-family ipv6 unicast
#  import vrf vrf_z01
#  import vrf vrf_z02
#  import vrf vrf_z03
 exit-address-family
 !
 address-family l2vpn evpn
 ...

Results:
1. An external terminal can ping zone 1's 10.10.22.10 or zone 2's 10.10.22.10 through 10.30.2.161 or 10.30.2.162 respectively.
2. 10.10.22.10 in both zones has correct internet access.
3. VMs in zone 2 can ping each other without problems.
4. The VM in zone 1 cannot ping VMs in zone 2.
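For anyone reproducing this, the translations can also be inspected from the host with conntrack-tools (not required for the setup itself; the addresses are the ones from this test):

Code:
# connections DNAT'ed through the zone 1 external address
conntrack -L -d 10.30.2.161

# connections SNAT'ed out as the zone 2 external address
conntrack -L -q 10.30.2.162

# the connmark set by the mangle rules shows up as mark=12 ... mark=15
conntrack -L | grep -E 'mark=1[2-5] '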

I think this article is very informative: https://blog.oddbit.com/post/2023-02-19-vrf-and-nat/
However, I think it suggests other solutions, such as implementing a router VM inside a zone to route traffic, or something like namespaces in OpenStack.

Also, I think this solution might be better than the current exit-node implementation, because:
1. There is no need to add many "ip route xxx null0" rules in frr.conf, especially when there are lots of subnets and zones.
2. VRF route information doesn't need to "leak" to the host; the host's routing table stays clean.
3. The host doesn't have routes into the VRF subnets, which might be more secure.
Is that right?

BTW, is it normal that I have to reset the VM's network device (e.g. un-select firewall, click OK, then select firewall again and click OK) so that the VM can ping its gateway after a "systemctl restart networking" on the host?

Bugzilla ticket: https://bugzilla.proxmox.com/show_bug.cgi?id=4980
 
Thanks for all the info, I'll try to see how I can improve this.
The first step is really to move the management of the NAT to a specific daemon, I think in the pve-firewall code, to have atomic iptables updates.
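As a generic illustration of what atomic updates mean here (a sketch only, reusing the values from the test above, and independent of how pve-firewall actually applies its rules): iptables-restore swaps in a whole table in one commit instead of appending rules one by one, so reloads cannot pile up duplicates.

Code:
# illustrative only -- this replaces the ENTIRE nat table in one atomic commit,
# so anything else writing to the nat table would be wiped as well
cat <<'EOF' | iptables-restore
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -m connmark --mark 13 -j DNAT --to-destination 10.10.22.10
-A POSTROUTING -m connmark --mark 12 -j SNAT --to-source 10.30.2.161
COMMIT
EOF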


About the routing VM: yes, there are plans to implement this in the future.
https://bugzilla.proxmox.com/show_bug.cgi?id=3382


I'm currently busy with the DHCP && IPAM implementation, but I'll try to work on this in the coming months.



BTW, is it normal that I have to reset the VM's network device (e.g. un-select firewall, click OK, then select firewall again and click OK) so that the VM can ping its gateway after a "systemctl restart networking" on the host?
Yes, never use "systemctl restart networking", only use a reload (systemctl reload networking or ifreload -a). VM interfaces are not managed in /etc/network/interfaces, so restarting networking doesn't replug them into the bridge.
 
Hi spirit, is there any update on this topic?
How can I have the same subnet in different zones and route/NAT it outside?
 
