proxmox 7.0 sdn beta test

spirit · Oct 26, 2020

fhloston said:
actually, to my understanding you don't. you can run all vxlan-ids in the same multicast group. it is counterproductive to spawn multicast groups per vxlan.

Default for linux is max. 20 igmp memberships...
net.ipv4.igmp_max_memberships = 20

OK, you are right. I was looking at the OpenStack and CloudStack implementations, and they use one multicast address per vxlan (they encode the vxlan id in the multicast address), but it's possible to share it. I think I could simply add a multicast address option in the zone. I just need to find a way to define the physical interface (it could be different on each host). One way could be to define the unicast subnet used by the interface. (I don't know if users could have interfaces without any unicast IP address for the multicast/vxlan network?)
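To make the two approaches concrete, here is a minimal Python sketch. The mapping of the 24-bit vxlan id into the low three octets of 239.0.0.0/8 is an assumption for illustration only, not necessarily the exact scheme OpenStack or CloudStack use, and the shared group 239.1.1.1 is a hypothetical value.

```python
import ipaddress

SHARED_GROUP = "239.1.1.1"  # hypothetical shared group for every vxlan id in a zone


def group_per_vni(vni: int) -> str:
    """Encode a 24-bit vxlan id into the low three octets of 239.0.0.0/8 (assumed scheme)."""
    if not 0 <= vni < 2**24:
        raise ValueError("vxlan id must fit in 24 bits")
    return str(ipaddress.IPv4Address((239 << 24) | vni))


def groups_joined(vnis, shared=False):
    """Return the set of multicast groups a host has to join for the given vxlan ids."""
    vnis = list(vnis)
    if shared:
        # One membership covers all vxlan ids.
        return {SHARED_GROUP} if vnis else set()
    # One membership per vxlan id.
    return {group_per_vni(v) for v in vnis}
```

With 30 vxlan ids, the per-id scheme needs 30 group memberships, which is more than the default net.ipv4.igmp_max_memberships = 20 quoted above, while the shared scheme needs one.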

MrPowerGamerBR · Nov 9, 2020

Hey, I was testing SDN (with VXLAN) and while it works fine in one single machine, everything *breaks* after I add another machine.

After I added the other machine, connections weren't working... okay, so I allowed connections to "4789" on both machines (via "ufw allow 4789") and, after I did that, connections between the machines started working. Great! ...but not really.

I keep getting "Next Hop" redirects when pinging a VXLAN IP on a different machine (I was inside a container on Machine2, pinging a container on Machine1):

Code:

64 bytes from 10.0.0.2: icmp_seq=101 ttl=63 time=0.392 ms
64 bytes from 10.0.0.2: icmp_seq=102 ttl=63 time=0.383 ms
64 bytes from 10.0.0.2: icmp_seq=103 ttl=63 time=0.734 ms
64 bytes from 10.0.0.2: icmp_seq=104 ttl=63 time=0.352 ms
From 10.0.0.1: icmp_seq=105 Redirect Host(New nexthop: 10.0.0.2)
64 bytes from 10.0.0.2: icmp_seq=105 ttl=63 time=0.364 ms
From 10.0.0.1: icmp_seq=106 Redirect Host(New nexthop: 10.0.0.2)
64 bytes from 10.0.0.2: icmp_seq=106 ttl=63 time=0.369 ms
64 bytes from 10.0.0.2: icmp_seq=107 ttl=63 time=0.366 ms

And every service within the containers that is port forwarded to be accessible via the public IP (example: nginx, etc.) breaks by constantly disconnecting everyone (probably every time there is a "nexthop"). Example: if I have a Minecraft server running, all players are disconnected randomly.

What could be the issue? This issue is fixed if I remove the ufw rule, but then I can't communicate between the two machines (as it should be, because I removed the allowed connections to the port)

SDN in Machine1

Code:

#version:42

auto vxlan1
iface vxlan1
        address 10.0.0.1/8
        bridge_ports vxlan_vxlan1
        bridge_stp off
        bridge_fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9950

auto vxlan_vxlan1
iface vxlan_vxlan1
        vxlan-id 100000
        vxlan_remoteip Machine2IP
        mtu 9950

SDN in Machine2

Code:

#version:42

auto vxlan1
iface vxlan1
        address 10.0.0.1/8
        bridge_ports vxlan_vxlan1
        bridge_stp off
        bridge_fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9950

auto vxlan_vxlan1
iface vxlan_vxlan1
        vxlan-id 100000
        vxlan_remoteip Machine1IP
        mtu 9950

(The reason I'm using 9950 as the MTU is because I was having issues with PostgreSQL clients hanging forever with a 1450 MTU, not sure why that was happening)

jlebherz · Nov 9, 2020

Hey,
You shouldn't add an IP on the bridge when using VXLAN; this option is only for EVPN...
Please don't use an MTU bigger than 9000, and make sure that the switch is capable of jumbo frames.

MrPowerGamerBR · Nov 9, 2020

jlebherz said:
Hey,
You shouldn't add an IP on the bridge when using VXLAN; this option is only for EVPN...
Please don't use an MTU bigger than 9000, and make sure that the switch is capable of jumbo frames.

When you say "IP on the bridge", do you mean the "address 10.0.0.1/8" part? If yes, then how can I connect to my VMs/LXC containers from the Proxmox host?

Thanks for helping, I will try decreasing the MTU. I only increased it because, when using 1450, any PostgreSQL client (example: a PostgreSQL db in a container and an app that connects to that db) seems to freeze. Increasing the MTU (or *changing* the MAC/IP of the container) fixed the issue.

jlebherz · Nov 9, 2020

You could use a router VM, or use VRRP (keepalived) to assign the IP to one of the hosts.
If you have this IP on both hosts on the bridge, you have two times the same ip in a Layer 2 network...

please check your switches for the max MTU size...
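The duplicate-IP point can be checked mechanically. A minimal sketch, assuming you have already collected the bridge address configured on each host into a dictionary (the function name and the input format are hypothetical, not part of Proxmox):

```python
from collections import defaultdict


def find_duplicate_addresses(bridge_addresses):
    """Return {ip: [hosts]} for every IP configured on more than one host.

    bridge_addresses maps a host name to the bridge address in CIDR form,
    or to None when the bridge has no address.
    """
    hosts_by_ip = defaultdict(list)
    for host, cidr in bridge_addresses.items():
        if cidr:
            # Compare on the address only; the prefix length does not matter here.
            hosts_by_ip[cidr.split("/")[0]].append(host)
    return {ip: hosts for ip, hosts in hosts_by_ip.items() if len(hosts) > 1}
```

Fed with the two configurations posted above (10.0.0.1/8 on both Machine1 and Machine2), it reports 10.0.0.1 as present on both hosts of the same Layer 2 network.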

MrPowerGamerBR · Nov 9, 2020

jlebherz said:
You could use a router VM, or use VRRP (keepalived) to assign the IP to one of the hosts.
If you have this IP on both hosts on the bridge, you have two times the same ip in a Layer 2 network...

please check your switches for the max MTU size...

I'm not sure if I'm going to be able to check the MTU size, because... well, I'm not the one that has the switch.

One of the machines is hosted at OVH, the other is hosted at SoYouStart.

Also, the issue that I've mentioned (PostgreSQL failing) is on a container *on the same machine* as the application. I will try reproducing this issue on my test Proxmox VMs, but what happened is:

Machine 1 has SDN (with VXLAN) configured.

PostgreSQL container is on Machine 1
Application container (connects to PostgreSQL) is also on Machine 1

Application tries to connect to PostgreSQL database
Application gets stuck until I:
1. Change the PostgreSQL container IP
or
2. Change the PostgreSQL container MAC
or
3. Increase the MTU size

While I can connect with the "psql" client on the Application container, trying to autocomplete (via TAB) freezes the client. This doesn't happen if I do any of the changes above. Also didn't have this issue when I was using a "normal" NAT (by adding the interface on the "/etc/network/interfaces", etc etc etc)

Of course, this may be my dedicated server having wonky configs (which would make it hard to reproduce on a test VM), because I'm already having issues with "ifupdown2" requiring me to switch ip_forward to 0 and then 1 every time I apply a new SDN network change (if I don't, I can't access any of the port forwarded applications via the external network).

jlebherz · Nov 9, 2020

Hey, how are you establishing the connection between these servers?
Do you have a VPN running or are you just using the external IPs? If so, please think about security, because VXLAN has NO encryption!!!
You cannot just make the MTU size bigger without looking at your path between the hosts!
If you use a 1450 MTU size, you also have to set the network card inside the VMs to that size, and then you shouldn't have trouble with your PostgreSQL...

MrPowerGamerBR · Nov 9, 2020

jlebherz said:
Hey, how are you establishing the connection between these servers?
Do you have a VPN running or are you just using the external IPs? If so, please think about security, because VXLAN has NO encryption!!!
You cannot just make the MTU size bigger without looking at your path between the hosts!
If you use a 1450 MTU size, you also have to set the network card inside the VMs to that size, and then you shouldn't have trouble with your PostgreSQL...

Yes, currently it is over an external IP. And yes, I know that you should encrypt your transport layer when using VXLAN, I was already doing that because I was communicating between my VM/containers via the public IP (by port forwarding)

About the network card: Currently I'm using LXC containers to test it out, so it shouldn't have any issues, right? After all, there isn't a way to change the MTU size in the LXC container network settings. :/

EDIT: Tried removing the "address 10.0.0.1/8" from the sdn configuration, that breaks networking and I'm not able to ping containers/VMs in another dedi machine

jlebherz · Nov 9, 2020

of course, you can change the mtu of a lxc container ;-) https://pve.proxmox.com/wiki/Linux_Container#pct_settings

create a vm with two interfaces, one on the vxlan bridge, one on the host bridge and then use this vm as a router between the host and the vms

MrPowerGamerBR · Nov 9, 2020

jlebherz said:
of course, you can change the mtu of a lxc container ;-) https://pve.proxmox.com/wiki/Linux_Container#pct_settings

create a vm with two interfaces, one on the vxlan bridge, one on the host bridge and then use this vm as a router between the host and the vms

Thank you! I found out that after I thought "well *maybe* there is a way to change it" and decided to search on Google. I did that change and it worked fine :3

I also fixed the "Next Hop": The issue is that I was using IPs with "/24" for containers (example: "10.0.11.1/24" instead of "10.0.11.1/8"), after I changed to "/8" the issue was fixed! Now I'm checking if the "users disconnected randomly" issue will go away.
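One plausible reading of why the prefix length mattered: with a /24, the target 10.0.0.2 is outside the container's own subnet, so packets go via the gateway 10.0.0.1, which then sends an ICMP redirect because the target sits on the same segment. A small sketch with Python's standard ipaddress module (the helper name is made up for illustration):

```python
import ipaddress


def is_on_link(source_cidr: str, destination: str) -> bool:
    """True when the destination lies inside the subnet configured on the source interface."""
    interface = ipaddress.ip_interface(source_cidr)
    return ipaddress.ip_address(destination) in interface.network


# With /24 the peer is off-link, so traffic is handed to the gateway.
print(is_on_link("10.0.11.1/24", "10.0.0.2"))
# With /8 the peer is on-link and is reached directly.
print(is_on_link("10.0.11.1/8", "10.0.0.2"))
```

The first call returns False and the second True, which matches the redirects disappearing after the switch to "/8".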

EDIT: Nah, sadly it didn't fix the issue, after a while all connections are disconnected.

spirit · Nov 9, 2020

hi,

if you want a routable vxlan (with an IP on the bridge, used as gateway for the VMs), you need to use an evpn zone. (bgp-evpn is able to manage an anycast gateway, i.e. the same IP on different nodes.)

with a simple vxlan zone, you'll have a duplicate IP for the gateway, and so random packet loss.

The next version of the sdn feature will have a different gui, where you will not be able to set up an IP on a vxlan vnet.

about mtu, you shouldn't increase it on vxlan interfaces. you can increase the mtu on the physical interfaces, to something like 1556 (if you can), or decrease the mtu on the vxlan interface to 1444 (and in the vms/ct).
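The MTU arithmetic can be sketched as follows. The 50-byte figure is the usual VXLAN overhead over an IPv4 underlay (outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14); the 1444/1556 values mentioned in this post correspond to budgeting 56 bytes instead, and treating that as extra headroom is my assumption.

```python
# Outer IPv4 header + UDP header + VXLAN header + inner Ethernet header.
VXLAN_IPV4_OVERHEAD = 20 + 8 + 8 + 14  # 50 bytes


def overlay_mtu(underlay_mtu: int, overhead: int = VXLAN_IPV4_OVERHEAD) -> int:
    """Largest MTU usable on the vxlan interface (and inside vms/ct) for a given physical MTU."""
    return underlay_mtu - overhead


def required_underlay_mtu(wanted_overlay_mtu: int, overhead: int = VXLAN_IPV4_OVERHEAD) -> int:
    """Physical MTU needed so that guests can keep the wanted MTU."""
    return wanted_overlay_mtu + overhead
```

For a standard 1500-byte physical MTU this gives 1450 on the vxlan side, or 1444 with a 56-byte budget. A 9950 vxlan MTU, as in the configuration earlier in the thread, would need a physical MTU of at least 10000.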

about encryption, it should come once ifupdown2 adds support for the macsec feature.

MrPowerGamerBR · Nov 10, 2020

spirit said:
hi,

if you want a routable vxlan (with an IP on the bridge, used as gateway for the VMs), you need to use an evpn zone. (bgp-evpn is able to manage an anycast gateway, i.e. the same IP on different nodes.)

with a simple vxlan zone, you'll have a duplicate IP for the gateway, and so random packet loss.

The next version of the sdn feature will have a different gui, where you will not be able to set up an IP on a vxlan vnet.

about mtu, you shouldn't increase it on vxlan interfaces. you can increase the mtu on the physical interfaces, to something like 1556 (if you can), or decrease the mtu on the vxlan interface to 1444 (and in the vms/ct).

about encryption, it should come once ifupdown2 adds support for the macsec feature.

Okay, so I decided to try it again today:

I have a NAT (10.0.0.1/8) on my dedicated machines, this is used for internet connectivity. (and what I was using for connections between containers/VMs in the same dedicated server)

(Before I was using the 10.0.0.1 net for VXLAN *and* internet connectivity, which is what I think was causing issues, because I wasn't able to route packets if there wasn't a IPv4/CIDR set)

I created a VXLAN zone with MTU 1450
I created a Virtual Net with tag 10000 and it is VLAN aware (and I DIDN'T put the IPv4/CIDR address, just like you said)

On my LXC containers, I appended the mtu setting, so it looks like this:

Code:

net1: name=eth1,bridge=vxnet1,firewall=1,hwaddr=D2:6B:FA:59:C1:49,ip=172.16.0.2/12,type=veth,mtu=1450
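If you have many containers to adjust, the same change can be scripted. This is only a sketch of string handling for a line in the format shown above; the helper is hypothetical and not a Proxmox tool, and it does not write anything back to the container config.

```python
def set_net_option(net_line: str, key: str, value) -> str:
    """Return the net line with key=value added, or replaced if the key already exists."""
    # Split "net1: name=eth1,..." into the device name and its option list.
    name, _, options = net_line.partition(": ")
    pairs = dict(option.split("=", 1) for option in options.split(","))
    pairs[key] = str(value)  # existing keys keep their position, new keys are appended
    return name + ": " + ",".join(f"{k}={v}" for k, v in pairs.items())
```

Calling it with key "mtu" and value 1450 on a line without an mtu option appends ",mtu=1450", which reproduces the line above.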

...and it looks like it is working fine now. But I'm still testing and I haven't tested with something that requires a persistent connection (currently testing with nginx -> internal webapp)

Will report if the issue persists after I try it with something that uses a persistent connection

Your message about "random packet loss" actually makes a ton of sense, because the players on the server were complaining that the "world was taking too long to load compared to before", which maybe was because there was packet loss in the connection.

spirit · Nov 16, 2020

@Ben B @David Hooton

In the coming patch I have added the possibility to define a controller for each node. (So you'll be able to define different asn, peers, ... for each node.)
I have also added support for bfd && a loopback source for the zone. (if you do a bgp-multipath-ecmp underlay)

Also, for custom/complex bgp setups, it will be possible to define the zone without an explicit controller, so you'll be able to define your own frr.conf.

It takes a little time to push the new version, because there are a lot of new features for subnet, ipam and dns management. I hope it'll come soon.

spirit · Nov 18, 2020

@David Hooton
I missed your question:
"In an ideal world we would be able to do a type 5 EVPN interface for the management interface. Being able to define prefix filter lists on type 5 routing tables is a definite must have as we don't want hypervisor chassis announcing themselves as default gateways."

Currently, the hypervisors don't announce the type5 route until you define them as exit gateway.
But each vnet is an anycast gateway for the vm. (without any type5)
it's something like:
vm--->vnet anycast ip-----type5 route default->exit gateway(can be an external evpn router or a proxmoxnode defined as exitgateway)

jlebherz · Nov 22, 2020

Hey there,

why isn't it possible to use a tag twice within different Zones?

I wanted to use a tag in 2 different VXLAN zones, but there's an error

Code:

create sdn vnet object failed: error during cfs-locked 'file-sdn__version' operation: tag 99 already exist in vnet x99 at /usr/share/perl5/PVE/Network/SDN/VnetPlugin.pm line 115. (500)

It would be great if that were possible, because otherwise it is very difficult to make sure we don't have tags twice.
We have several proxmox clusters and want to manage them via the api. There should be different VXLAN zones, one for each cluster and also one for inter-communication between the clusters.

spirit · Nov 22, 2020

jlebherz said:
Hey there,

why isn't it possible to use a tag twice within different Zones?

I wanted to use a tag in 2 different VXLAN zones, but there's an error

Code:

create sdn vnet object failed: error during cfs-locked 'file-sdn__version' operation: tag 99 already exist in vnet x99 at /usr/share/perl5/PVE/Network/SDN/VnetPlugin.pm line 115. (500)

It would be great if that were possible, because otherwise it is very difficult to make sure we don't have tags twice.

for vxlan it's impossible. (it's a technical limitation of current linux kernel implementation).

For vlan it's possible. (but I don't remember if I have already allowed it in this version. I'm sure it's allowed in coming version).

We have several proxmox clusters and want to manage them via the api. There should be different VXLAN zones, one for each cluster and also one for inter-communication between the clusters.

what is the problem exactly?
if you want to isolate the traffic between the two clusters, you can even use the same vxlan id on each cluster, but with different peers.

jlebherz · Nov 22, 2020

the problem is that I have to check all clusters for existing tags when I want to create a new inter-communication vnet..
well yes, it's possible to do this, I just wanted to ask...
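The check described here could look roughly like this. It is a sketch only: how the used tags are collected from each cluster (for example through the API) is left out, and the cluster names and tag sets are hypothetical inputs.

```python
def first_free_tag(used_tags_per_cluster, start=1, end=2**24 - 1):
    """Return the lowest tag in [start, end] that no cluster uses yet.

    used_tags_per_cluster maps a cluster name to the set of tags already
    assigned to vnets in that cluster.
    """
    # Merge the per-cluster sets into one set of taken tags.
    used = set().union(*used_tags_per_cluster.values())
    for tag in range(start, end + 1):
        if tag not in used:
            return tag
    raise RuntimeError("no free tag left in the requested range")
```

For example, with tags {1, 2, 99} used on one cluster and {3, 99} on another, the first tag that is free everywhere is 4.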

thanks spirit

ps: the sdn feature is very great

Matthieu Le Corre · Nov 30, 2020

Hello,
We've updated our test cluster to 6.3 and it seems that ipam should be available for testing.
I've not been able to find any piece of code related to ipam, am I missing something?

We also have another strange bug: after clicking a zone in the tree view, nothing happens, and moving to another item produces a scrambled display.

Capture d’écran de 2020-11-30 09-50-56.png

The browser console also show 2 errors

Capture d’écran de 2020-11-30 09-51-12.png

By the way, SDN is definitely great, I can't wait to see it in production!

spirit · Nov 30, 2020

Hi, ipam && the new features are not yet published. I think they should come soon. (I have already sent all patches to the dev mailing list.)

I'm aware about the zone bug on the tree, it's already fixed in the coming version. (thanks for the report).

About Ipam, it's not yet integrated in lxc/qemu nic form, this is the last part I need to finish.

Matthieu Le Corre · Nov 30, 2020

ok, I'll wait for the next version.

We are using EfficientIP as IPAM, so as soon as it is available,
I'll surely take some time to make this IPAM available for proxmox.
