Poor performance applying SDN settings when the number of zones is large

imoniker

Hi, I'm doing some stress testing of SDN.
I created 100 zones, each with a vnet and a subnet, in /etc/pve/sdn/*.cfg.
When I issued "pvesh set /cluster/sdn", it took 9 minutes to apply the SDN settings.
Even when I ran the command a second time without any change to the *.cfg files, it still took 9 minutes.
"systemctl reload networking" or "systemctl reload frr" is fast, taking only a few seconds.

What I expect is that adding a new zone takes much less time, for example a few seconds.
What should I do to reduce the time needed to add a new zone/vnet/subnet?
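
For reference, a setup like this can be generated with a loop over the SDN API; a rough sketch (the zone type, controller ID, VNI numbering, names, and subnet ranges here are placeholders, not my exact config):

Bash:
# hypothetical loop building a 100-zone EVPN test setup via the SDN API
for i in $(seq 1 100); do
    pvesh create /cluster/sdn/zones --zone "z$i" --type evpn \
        --controller evpnctl --vrf-vxlan "$((20000 + i))"
    pvesh create /cluster/sdn/vnets --vnet "vnet$i" --zone "z$i" --tag "$((10000 + i))"
    pvesh create "/cluster/sdn/vnets/vnet$i/subnets" --type subnet \
        --subnet "10.$i.0.0/24" --gateway "10.$i.0.1"
done
pvesh set /cluster/sdn   # apply everything; this is the slow step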
 
I found a way to bypass this:
1. Generate /etc/network/interfaces.d/sdn and /etc/frr/frr.conf manually.
2. Reload networking and frr.
3. Use qm create to create a VM.
4. Start the VM.
So far this works. Is there any problem with this procedure?
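
Roughly, those steps correspond to something like this (a sketch; the VM ID, memory, and bridge name are placeholders, and the frr-reload invocation matches the command quoted below):

Bash:
# 1. /etc/network/interfaces.d/sdn and /etc/frr/frr.conf written by hand (elided)
# 2. reload the network stack and frr
ifreload -a
/usr/lib/frr/frr-reload.py /etc/frr/frr.conf --reload --stdout
# 3+4. create and start a VM attached to one of the hand-written vnets
qm create 9001 --name sdn-test --memory 512 --net0 virtio,bridge=vnet3
qm start 9001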
 
Mmm, that's strange...

When you reload the SDN through the GUI or "pvesh set /cluster/sdn", it creates a main task, "networkreloadall". Then, in this task, it makes an SSH connection to each node, one by one, and calls

pvesh set /nodes/<nodename>/network (it's like the local network reload button).

This creates a new task, "reloading network", for each node.

This task generates /etc/network/interfaces.d/sdn and /etc/frr/frr.conf, then reloads both with:

# ifreload -a for network
# /usr/lib/frr/frr-reload.py /etc/frr/frr.conf --reload --stdout
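
To see where the time goes, each stage can be timed separately; a sketch (the node name is a placeholder):

Code:
time pvesh set /nodes/pve1/network   # the full per-node reload task
time ifreload -a                     # the network part alone
time /usr/lib/frr/frr-reload.py /etc/frr/frr.conf --reload --stdout   # the frr part alone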



Do you have any errors in the task logs?
Do the local tasks launch quickly right after the networkreloadall task? (If not, it could be an SSH problem.)


is "ifreload -a" running fast ? (you can have debug log if ifreload -a -d)


Generally, the only thing that can be slow is the ifreload (but it seems fast with "systemctl reload networking", which also calls ifreload, so...),
because of slow scripts in /etc/network/if-up.d/ or /etc/network/if-pre-up.d/. (You can remove them, or try editing /etc/network/ifupdown2/ifupdown2.conf and setting addon_scripts_support=0.)
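
A minimal sketch of that change (back up the file first; the sed pattern assumes the stock "addon_scripts_support=1" line is present):

Code:
cp /etc/network/ifupdown2/ifupdown2.conf /etc/network/ifupdown2/ifupdown2.conf.bak
sed -i 's/^addon_scripts_support=1/addon_scripts_support=0/' /etc/network/ifupdown2/ifupdown2.conf
ifreload -a   # reload to verify the hook scripts are now skipped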
 
I made a mistake: "systemctl reload networking" did take almost the same time as "pvesh set /cluster/sdn".
addon_scripts_support=0 helped a lot: it now takes about 25 seconds instead of the 9 minutes before.
I'll test 1000 zones tomorrow. ^_^
By the way, what is the purpose of the scripts in /etc/network/if-up.d/ and /etc/network/if-pre-up.d/? It seems we can just bypass them.
 
Some scripts are mostly for ifupdown1 (in case you want to roll back):

# ls /etc/network/if-up.d/
bridgevlan bridgevlanport chrony ethtool mtu postfix
# ls /etc/network/if-pre-up.d/
bridge ethtool openvswitch vlan


But those are already skipped by ifupdown2.

Others can be deployed by other software packages (and those are not skipped; they are executed for each vnet).
What is the content of your scripts directories?

With "ifreload -a -d" you should see the execution of the scripts.

(It could be interesting to see whether we could add a way in Proxmox to skip a slow script.)
 
ifreload -a -d with addon_scripts_support=1 for one VRF/vNet/subnet

Bash:
debug: vrf_z03: found dependents ['vnet3', 'vrfbr_z03']
debug: vnet3: found dependents ['vxlan_vnet3']
info: vxlan_vnet3: running ops ...
debug: vxlan_vnet3: pre-up : running module xfrm
debug: vxlan_vnet3: pre-up : running module link
debug: vxlan_vnet3: pre-up : running module bond
debug: vxlan_vnet3: pre-up : running module vlan
debug: vxlan_vnet3: pre-up : running module vxlan
info: vxlan_vnet3: vxlan already exists - no change detected
info: executing /sbin/bridge fdb del 00:00:00:00:00:00 dev vxlan_vnet3  self dst 10.30.2.50
debug: vxlan_vnet3: pre-up : running module usercmds
debug: vxlan_vnet3: pre-up : running module bridge
info: vnet3: applying bridge port configuration: ['vxlan_vnet3']
debug: vxlan_vnet3: pre-up : running module bridgevlan
debug: vxlan_vnet3: pre-up : running module tunnel
debug: vxlan_vnet3: pre-up : running module vrf
debug: vxlan_vnet3: pre-up : running module ethtool
debug: vxlan_vnet3: pre-up : running module auto
debug: vxlan_vnet3: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vxlan_vnet3.input=0
debug: vxlan_vnet3: up : running module dhcp
debug: vxlan_vnet3: up : running module address
debug: vxlan_vnet3: up : running module addressvirtual
debug: vxlan_vnet3: up : running module usercmds
debug: vxlan_vnet3: up : running script /etc/network/if-up.d/chrony
info: executing /etc/network/if-up.d/chrony
debug: vxlan_vnet3: up : running script /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/postfix
debug: vxlan_vnet3: post-up : running module usercmds
debug: vxlan_vnet3: statemanager sync state pre-up
info: vnet3: running ops ...
debug: vnet3: pre-up : running module xfrm
debug: vnet3: pre-up : running module link
debug: vnet3: pre-up : running module bond
debug: vnet3: pre-up : running module vlan
debug: vnet3: pre-up : running module vxlan
debug: vnet3: pre-up : running module usercmds
debug: vnet3: pre-up : running module bridge
info: vnet3: bridge already exists
info: vnet3: applying bridge settings
info: vnet3: reset bridge-hashel to default: 4
info: reading '/sys/class/net/vnet3/bridge/stp_state'
info: vnet3: netlink: ip link set dev vnet3 type bridge (with attributes)
debug: attributes: {26: 4}
debug: vnet3: evaluating port expr '['vxlan_vnet3']'
info: vnet3: port vxlan_vnet3: already processed
info: vnet3: applying bridge configuration specific to ports
info: vnet3: processing bridge config for port vxlan_vnet3
debug: vnet3: pre-up : running module bridgevlan
debug: vnet3: pre-up : running module tunnel
debug: vnet3: pre-up : running module vrf
debug: vnet3: pre-up : running module ethtool
debug: vnet3: pre-up : running module auto
debug: vnet3: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vnet3.input=0
info: vnet3: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge
info: writing '1' to file /proc/sys/net/ipv4/conf/vnet3/arp_accept
debug: vnet3: up : running module dhcp
debug: vnet3: up : running module address
debug: vnet3: up : running module addressvirtual
debug: vnet3: up : running module usercmds
debug: vnet3: up : running script /etc/network/if-up.d/chrony
info: executing /etc/network/if-up.d/chrony
debug: vnet3: up : running script /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/postfix
debug: vnet3: post-up : running module usercmds
debug: vnet3: statemanager sync state pre-up
debug: vrfbr_z03: found dependents ['vrfvx_z03']
info: vrfvx_z03: running ops ...
debug: vrfvx_z03: pre-up : running module xfrm
debug: vrfvx_z03: pre-up : running module link
debug: vrfvx_z03: pre-up : running module bond
debug: vrfvx_z03: pre-up : running module vlan
debug: vrfvx_z03: pre-up : running module vxlan
info: vrfvx_z03: vxlan already exists - no change detected
debug: vrfvx_z03: pre-up : running module usercmds
debug: vrfvx_z03: pre-up : running module bridge
info: vrfbr_z03: applying bridge port configuration: ['vrfvx_z03']
debug: vrfvx_z03: pre-up : running module bridgevlan
debug: vrfvx_z03: pre-up : running module tunnel
debug: vrfvx_z03: pre-up : running module vrf
debug: vrfvx_z03: pre-up : running module ethtool
debug: vrfvx_z03: pre-up : running module auto
debug: vrfvx_z03: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vrfvx_z03.input=0
debug: vrfvx_z03: up : running module dhcp
debug: vrfvx_z03: up : running module address
debug: vrfvx_z03: up : running module addressvirtual
debug: vrfvx_z03: up : running module usercmds
debug: vrfvx_z03: up : running script /etc/network/if-up.d/chrony
info: executing /etc/network/if-up.d/chrony
debug: vrfvx_z03: up : running script /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/postfix
debug: vrfvx_z03: post-up : running module usercmds
debug: vrfvx_z03: statemanager sync state pre-up
info: vrfbr_z03: running ops ...
debug: vrfbr_z03: pre-up : running module xfrm
debug: vrfbr_z03: pre-up : running module link
debug: vrfbr_z03: pre-up : running module bond
debug: vrfbr_z03: pre-up : running module vlan
debug: vrfbr_z03: pre-up : running module vxlan
debug: vrfbr_z03: pre-up : running module usercmds
debug: vrfbr_z03: pre-up : running module bridge
info: vrfbr_z03: bridge already exists
info: vrfbr_z03: applying bridge settings
info: vrfbr_z03: reset bridge-hashel to default: 4
info: reading '/sys/class/net/vrfbr_z03/bridge/stp_state'
info: vrfbr_z03: netlink: ip link set dev vrfbr_z03 type bridge (with attributes)
debug: attributes: {26: 4}
debug: vrfbr_z03: evaluating port expr '['vrfvx_z03']'
info: vrfbr_z03: port vrfvx_z03: already processed
info: vrfbr_z03: applying bridge configuration specific to ports
info: vrfbr_z03: processing bridge config for port vrfvx_z03
debug: vrfbr_z03: evaluating port expr '['vrfvx_z03']'
debug: vrfbr_z03: _get_bridge_mac returned (None, None)
debug: vrfbr_z03: pre-up : running module bridgevlan
debug: vrfbr_z03: pre-up : running module tunnel
debug: vrfbr_z03: pre-up : running module vrf
debug: vrfbr_z03: pre-up : running module ethtool
debug: vrfbr_z03: pre-up : running module auto
debug: vrfbr_z03: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vrfbr_z03.input=0
info: vrfbr_z03: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge
debug: vrfbr_z03: up : running module dhcp
debug: vrfbr_z03: up : running module address
debug: vrfbr_z03: up : running module addressvirtual
debug: vrfbr_z03: up : running module usercmds
debug: vrfbr_z03: up : running script /etc/network/if-up.d/chrony
info: executing /etc/network/if-up.d/chrony
debug: vrfbr_z03: up : running script /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/postfix
debug: vrfbr_z03: post-up : running module usercmds
debug: vrfbr_z03: statemanager sync state pre-up
info: vrf_z03: running ops ...
debug: vrf_z03: pre-up : running module xfrm
debug: vrf_z03: pre-up : running module link
debug: vrf_z03: pre-up : running module bond
debug: vrf_z03: pre-up : running module vlan
debug: vrf_z03: pre-up : running module vxlan
debug: vrf_z03: pre-up : running module usercmds
debug: vrf_z03: pre-up : running module bridge
debug: vrf_z03: pre-up : running module bridgevlan
debug: vrf_z03: pre-up : running module tunnel
debug: vrf_z03: pre-up : running module vrf
debug: vrf_z03: pre-up : running module ethtool
debug: vrf_z03: pre-up : running module auto
debug: vrf_z03: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vrf_z03.input=0
debug: vrf_z03: up : running module dhcp
debug: vrf_z03: up : running module address
debug: vrf_z03: up : running module addressvirtual
debug: vrf_z03: up : running module usercmds
debug: vrf_z03: up : running script /etc/network/if-up.d/chrony
info: executing /etc/network/if-up.d/chrony
debug: vrf_z03: up : running script /etc/network/if-up.d/postfix
info: executing /etc/network/if-up.d/postfix
debug: vrf_z03: post-up : running module usercmds
info: executing ip route add vrf vrf_z03 unreachable default metric 4278198272
debug: vrf_z03: statemanager sync state pre-up

ifreload -a -d with addon_scripts_support=0 for one VRF/vNet/subnet

Code:
debug: vrf_z03: found dependents ['vnet3', 'vrfbr_z03']
debug: vnet3: found dependents ['vxlan_vnet3']
info: vxlan_vnet3: running ops ...
debug: vxlan_vnet3: pre-up : running module xfrm
debug: vxlan_vnet3: pre-up : running module link
debug: vxlan_vnet3: pre-up : running module bond
debug: vxlan_vnet3: pre-up : running module vlan
debug: vxlan_vnet3: pre-up : running module vxlan
info: vxlan_vnet3: vxlan already exists - no change detected
info: executing /sbin/bridge fdb del 00:00:00:00:00:00 dev vxlan_vnet3  self dst 10.30.2.50
debug: vxlan_vnet3: pre-up : running module usercmds
debug: vxlan_vnet3: pre-up : running module bridge
info: vnet3: applying bridge port configuration: ['vxlan_vnet3']
debug: vxlan_vnet3: pre-up : running module bridgevlan
debug: vxlan_vnet3: pre-up : running module tunnel
debug: vxlan_vnet3: pre-up : running module vrf
debug: vxlan_vnet3: pre-up : running module ethtool
debug: vxlan_vnet3: pre-up : running module auto
debug: vxlan_vnet3: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vxlan_vnet3.input=0
debug: vxlan_vnet3: up : running module dhcp
debug: vxlan_vnet3: up : running module address
debug: vxlan_vnet3: up : running module addressvirtual
debug: vxlan_vnet3: up : running module usercmds
debug: vxlan_vnet3: post-up : running module usercmds
debug: vxlan_vnet3: statemanager sync state pre-up
info: vnet3: running ops ...
debug: vnet3: pre-up : running module xfrm
debug: vnet3: pre-up : running module link
debug: vnet3: pre-up : running module bond
debug: vnet3: pre-up : running module vlan
debug: vnet3: pre-up : running module vxlan
debug: vnet3: pre-up : running module usercmds
debug: vnet3: pre-up : running module bridge
info: vnet3: bridge already exists
info: vnet3: applying bridge settings
info: vnet3: reset bridge-hashel to default: 4
info: reading '/sys/class/net/vnet3/bridge/stp_state'
info: vnet3: netlink: ip link set dev vnet3 type bridge (with attributes)
debug: attributes: {26: 4}
debug: vnet3: evaluating port expr '['vxlan_vnet3']'
info: vnet3: port vxlan_vnet3: already processed
info: vnet3: applying bridge configuration specific to ports
info: vnet3: processing bridge config for port vxlan_vnet3
debug: vnet3: pre-up : running module bridgevlan
debug: vnet3: pre-up : running module tunnel
debug: vnet3: pre-up : running module vrf
debug: vnet3: pre-up : running module ethtool
debug: vnet3: pre-up : running module auto
debug: vnet3: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vnet3.input=0
info: vnet3: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge
info: writing '1' to file /proc/sys/net/ipv4/conf/vnet3/arp_accept
debug: vnet3: up : running module dhcp
debug: vnet3: up : running module address
debug: vnet3: up : running module addressvirtual
debug: vnet3: up : running module usercmds
debug: vnet3: post-up : running module usercmds
debug: vnet3: statemanager sync state pre-up
debug: vrfbr_z03: found dependents ['vrfvx_z03']
info: vrfvx_z03: running ops ...
debug: vrfvx_z03: pre-up : running module xfrm
debug: vrfvx_z03: pre-up : running module link
debug: vrfvx_z03: pre-up : running module bond
debug: vrfvx_z03: pre-up : running module vlan
debug: vrfvx_z03: pre-up : running module vxlan
info: vrfvx_z03: vxlan already exists - no change detected
debug: vrfvx_z03: pre-up : running module usercmds
debug: vrfvx_z03: pre-up : running module bridge
info: vrfbr_z03: applying bridge port configuration: ['vrfvx_z03']
debug: vrfvx_z03: pre-up : running module bridgevlan
debug: vrfvx_z03: pre-up : running module tunnel
debug: vrfvx_z03: pre-up : running module vrf
debug: vrfvx_z03: pre-up : running module ethtool
debug: vrfvx_z03: pre-up : running module auto
debug: vrfvx_z03: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vrfvx_z03.input=0
debug: vrfvx_z03: up : running module dhcp
debug: vrfvx_z03: up : running module address
debug: vrfvx_z03: up : running module addressvirtual
debug: vrfvx_z03: up : running module usercmds
debug: vrfvx_z03: post-up : running module usercmds
debug: vrfvx_z03: statemanager sync state pre-up
info: vrfbr_z03: running ops ...
debug: vrfbr_z03: pre-up : running module xfrm
debug: vrfbr_z03: pre-up : running module link
debug: vrfbr_z03: pre-up : running module bond
debug: vrfbr_z03: pre-up : running module vlan
debug: vrfbr_z03: pre-up : running module vxlan
debug: vrfbr_z03: pre-up : running module usercmds
debug: vrfbr_z03: pre-up : running module bridge
info: vrfbr_z03: bridge already exists
info: vrfbr_z03: applying bridge settings
info: vrfbr_z03: reset bridge-hashel to default: 4
info: reading '/sys/class/net/vrfbr_z03/bridge/stp_state'
info: vrfbr_z03: netlink: ip link set dev vrfbr_z03 type bridge (with attributes)
debug: attributes: {26: 4}
debug: vrfbr_z03: evaluating port expr '['vrfvx_z03']'
info: vrfbr_z03: port vrfvx_z03: already processed
info: vrfbr_z03: applying bridge configuration specific to ports
info: vrfbr_z03: processing bridge config for port vrfvx_z03
debug: vrfbr_z03: evaluating port expr '['vrfvx_z03']'
debug: vrfbr_z03: _get_bridge_mac returned (None, None)
debug: vrfbr_z03: pre-up : running module bridgevlan
debug: vrfbr_z03: pre-up : running module tunnel
debug: vrfbr_z03: pre-up : running module vrf
debug: vrfbr_z03: pre-up : running module ethtool
debug: vrfbr_z03: pre-up : running module auto
debug: vrfbr_z03: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vrfbr_z03.input=0
info: vrfbr_z03: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge
debug: vrfbr_z03: up : running module dhcp
debug: vrfbr_z03: up : running module address
debug: vrfbr_z03: up : running module addressvirtual
debug: vrfbr_z03: up : running module usercmds
debug: vrfbr_z03: post-up : running module usercmds
debug: vrfbr_z03: statemanager sync state pre-up
info: vrf_z03: running ops ...
debug: vrf_z03: pre-up : running module xfrm
debug: vrf_z03: pre-up : running module link
debug: vrf_z03: pre-up : running module bond
debug: vrf_z03: pre-up : running module vlan
debug: vrf_z03: pre-up : running module vxlan
debug: vrf_z03: pre-up : running module usercmds
debug: vrf_z03: pre-up : running module bridge
debug: vrf_z03: pre-up : running module bridgevlan
debug: vrf_z03: pre-up : running module tunnel
debug: vrf_z03: pre-up : running module vrf
debug: vrf_z03: pre-up : running module ethtool
debug: vrf_z03: pre-up : running module auto
debug: vrf_z03: pre-up : running module address
info: executing /sbin/sysctl net.mpls.conf.vrf_z03.input=0
debug: vrf_z03: up : running module dhcp
debug: vrf_z03: up : running module address
debug: vrf_z03: up : running module addressvirtual
debug: vrf_z03: up : running module usercmds
debug: vrf_z03: post-up : running module usercmds
info: executing ip route add vrf vrf_z03 unreachable default metric 4278198272
debug: vrf_z03: statemanager sync state pre-up

I think the problem comes from the "chrony" and "postfix" scripts.
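
One way to confirm is to time a single hook invocation by hand; a sketch (the IFACE/MODE/PHASE environment variables follow the usual ifupdown hook convention, which I assume ifupdown2 mirrors):

Bash:
# time one hook script the way ifupdown-style hooks are invoked;
# per the logs above, both scripts run for each of the ~5 SDN interfaces per zone
time env IFACE=vnet3 LOGICAL=vnet3 ADDRFAM=inet METHOD=manual MODE=start PHASE=post-up \
    /etc/network/if-up.d/chrony
time env IFACE=vnet3 LOGICAL=vnet3 ADDRFAM=inet METHOD=manual MODE=start PHASE=post-up \
    /etc/network/if-up.d/postfix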

By the way, I did some stress testing with 500 zones and 1000 zones; here are my observations:
1. With addon_scripts_support=0, "pvesh set /cluster/sdn" takes 3-4 minutes to finish for 500 zones.
2. With addon_scripts_support=0, "pvesh set /cluster/sdn" returns the following error for 1000 zones:
- reloadnetworkall: TASK ERROR: got unexpected control message:
- SRV networking - Reload: stopped: unable to read tail (got 0 bytes)
3. With 500 zones, ping latency from a VM to its subnet gateway is on average 1.5-1.8 ms, while it is 0.4 ms with only 2 zones.
 
Hi,
Thanks. I'll look at the postfix and chrony scripts to see if we can bypass their execution.

By the way, I did some stress testing with 500 zones and 1000 zones; here are my observations:
1. With addon_scripts_support=0, "pvesh set /cluster/sdn" takes 3-4 minutes to finish for 500 zones.
2. With addon_scripts_support=0, "pvesh set /cluster/sdn" returns the following error for 1000 zones:
- reloadnetworkall: TASK ERROR: got unexpected control message:
- SRV networking - Reload: stopped: unable to read tail (got 0 bytes)

About the number of zones: with EVPN, each zone is a different VRF, and I'm really not sure about the scalability of the kernel there.
I have done tests with a lot of vnets, but not with that many zones.
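
To gauge how many VRFs the kernel is actually carrying and what state they hold, a quick check with standard iproute2 commands (the zone name is taken from the logs above):

Code:
ip -br link show type vrf | wc -l   # number of VRF devices
ip vrf show                         # VRF names and their routing tables
ip route show vrf vrf_z03           # routing table of one zone's VRF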

I'll try to do tests next week.

Do you really need 500 zones in production?


3. With 500 zones, ping latency from a VM to its subnet gateway is on average 1.5-1.8 ms, while it is 0.4 ms with only 2 zones.

Are you sure about the 0.4 ms result? That seems quite high.
In production (around 300 vnets with 2 EVPN zones), I'm at around 0.005 ms on average from the VM to the gateway (which is local).

Code:
# ping -f 10.0.0.1
PING 10.0.0.1 (10.0.0.2) 56(84) bytes of data.
^C
--- 10.0.0.1 ping statistics ---
77125 packets transmitted, 77125 received, 0% packet loss, time 2317ms
rtt min/avg/max/mdev = 0.004/0.005/0.285/0.003 ms, ipg/ewma 0.030/0.005 ms

I mean, 0.4 ms is the latency needed to cover 50 km...
 
I'm testing a multi-tenant configuration: multiple PVE clusters provide VM resources to multiple tenants.
I assume each tenant has a separate zone, so if there are 500 tenants, there will be 500 zones.
If some tenants have only 1-2 VMs, it's possible to have lots of zones inside one cluster.
From our testing, the Linux kernel might not be efficient enough to handle 500 VRFs, so I'm thinking about sharing one zone among small tenants (fewer than 50 VMs, with the firewall used to isolate networks) and using a dedicated zone for each large tenant (more than 50 VMs, with a VRF used to isolate networks).

As for the ping latency, we are testing in a nested VM environment. All PVE hosts are running on an ESXi host, so it might be slow. We'll switch to bare metal afterwards.
 
I'm testing a multi-tenant configuration: multiple PVE clusters provide VM resources to multiple tenants.
I assume each tenant has a separate zone, so if there are 500 tenants, there will be 500 zones.
If some tenants have only 1-2 VMs, it's possible to have lots of zones inside one cluster.
From our testing, the Linux kernel might not be efficient enough to handle 500 VRFs, so I'm thinking about sharing one zone among small tenants (fewer than 50 VMs, with the firewall used to isolate networks) and using a dedicated zone for each large tenant (more than 50 VMs, with a VRF used to isolate networks).
Mmm, I could look at adding an option to prevent routing between different vnets/VXLANs in the same VRF.
That should be more performant than one VRF for just 1 or 2 VMs.


As for the ping latency, we are testing in a nested VM environment. All PVE hosts are running on an ESXi host, so it might be slow. We'll switch to bare metal afterwards.
Ah, OK!
 
