Problem - SDN

pvpaulo

Member
Jun 15, 2022
Problem description:


Hi everyone,


I have a 3-node cluster (PVE01, PVE02, PVE03) running Proxmox VE 8.x and I’m using SDN with VLANs. I noticed inconsistent behavior across the nodes:


  • On PVE02 and PVE03, the files in /etc/network/interfaces.d/sdn are generated correctly, including all VLANs (135, 35, 45, 7) with their respective ln_* and pr_* interfaces.
  • On PVE01, only VLAN135 appears correctly. The other VLANs (7, 35, 45) are incomplete — the ln_* and pr_* interfaces are missing.






Example on PVE02/PVE03 (correct):
#version:32

auto VLAN135
iface VLAN135
bridge_ports ln_VLAN135
bridge_stp off
bridge_fd 0
alias FE135

auto VLAN35
iface VLAN35
bridge_ports ln_VLAN35
bridge_stp off
bridge_fd 0
alias BE

auto VLAN45
iface VLAN45
bridge_ports ln_VLAN45
bridge_stp off
bridge_fd 0
alias BE-WT

auto VLAN7
iface VLAN7
bridge_ports ln_VLAN7
bridge_stp off
bridge_fd 0
alias GER

auto ln_VLAN135
iface ln_VLAN135
link-type veth
veth-peer-name pr_VLAN135

auto ln_VLAN35
iface ln_VLAN35
link-type veth
veth-peer-name pr_VLAN35

auto ln_VLAN45
iface ln_VLAN45
link-type veth
veth-peer-name pr_VLAN45

auto ln_VLAN7
iface ln_VLAN7
link-type veth
veth-peer-name pr_VLAN7

auto pr_VLAN135
iface pr_VLAN135
link-type veth
veth-peer-name ln_VLAN135

auto pr_VLAN35
iface pr_VLAN35
link-type veth
veth-peer-name ln_VLAN35

auto pr_VLAN45
iface pr_VLAN45
link-type veth
veth-peer-name ln_VLAN45

auto pr_VLAN7
iface pr_VLAN7
link-type veth
veth-peer-name ln_VLAN7

auto vmbr0v35
iface vmbr0v35
bridge_ports bond0.35 pr_VLAN35
bridge_stp off
bridge_fd 0

auto vmbr0v45
iface vmbr0v45
bridge_ports bond0.45 pr_VLAN45
bridge_stp off
bridge_fd 0

auto vmbr0v7
iface vmbr0v7
bridge_ports bond0.7 pr_VLAN7
bridge_stp off
bridge_fd 0

auto vmbr1v135
iface vmbr1v135
bridge_ports eno8403.135 pr_VLAN135
bridge_stp off


On PVE01, only VLAN135 is generated properly, while VLAN7/35/45 are missing the ln_* and pr_* sections.
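
To see exactly which stanzas differ, it may help to list the interface names declared in the generated file on each node and diff the lists. This is only a minimal sketch: `sdn_ifaces` is a hypothetical helper name, and the default path is the one mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): print the names of all
# "iface <name>" stanzas found in an ifupdown2 file, one per line.
sdn_ifaces() {
    # $1: file to inspect; defaults to the SDN-generated file from this thread
    file="${1:-/etc/network/interfaces.d/sdn}"
    # Keep only lines whose first word is "iface" and print the interface name.
    # LC_ALL=C makes the sort order identical on every node.
    awk '$1 == "iface" { print $2 }' "$file" | LC_ALL=C sort
}
```

Running `sdn_ifaces > /tmp/$(hostname).ifaces` on each node and comparing the resulting files with `diff` should show at a glance which ln_* and pr_* entries PVE01 lacks.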






What I have checked so far:


  • Files in /etc/pve/sdn/ are identical across all 3 nodes.
  • Packages and versions (pveversion -v, dpkg -l | grep pve) are identical.
  • The libpve-network-perl package (required for SDN) is installed and the same version on all nodes.
  • Cluster is healthy and in quorum (pvecm status).
  • Restarting the service (systemctl restart pve-sdn) or forcing (pvesdn update vnet) does not help.
  • Logs (journalctl -u pve-sdn -f) show no relevant errors.
  • Manual test: if I manually create the VLAN interface (e.g. ip link add link vmbr0 name vmbr0.7 type vlan id 7), it works perfectly and VMs can communicate. So physical networking and VLAN tagging are fine — the issue is only that SDN does not generate the proper interface definitions on PVE01.



Question:
Has anyone experienced this situation where only one node in the cluster does not apply the SDN configuration completely?
Is there a way to force SDN to rebuild the entire /etc/network/interfaces.d/sdn tree based on /etc/pve/sdn/?
Could this be an SDN bug or some local cache/config corruption?




Environment:


  • Proxmox VE 8.x (all nodes on the same version)
  • SDN enabled with VLANs (135, 35, 45, 7)
  • Physical network: bond0 and bridges (vmbr0, vmbr1)
  • Package libpve-network-perl installed and identical on all 3 nodes



Any help or hints on how to fix PVE01 not replicating VLANs correctly in SDN would be greatly appreciated.


Thanks!
 
The generated configuration depends on whether the underlying bridge is VLAN-aware, so on the hosts where you have pr_ / ln_ interfaces the bridge is not VLAN-aware. It should work in both cases though, so if you have any issues they are most likely caused by something else.
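
One way to check what the kernel currently thinks about a bridge is to read its `vlan_filtering` attribute from sysfs. This is a rough sketch, not an official tool: `bridge_vlan_filtering` is a made-up helper name, and the second argument exists only so the function can be pointed at a different directory for testing.

```shell
#!/bin/sh
# Hypothetical helper: print 1 if the given Linux bridge has VLAN filtering
# enabled (i.e. it is VLAN-aware), 0 if not.
bridge_vlan_filtering() {
    bridge="$1"
    # $2: sysfs network root; assumption for testability, normally left unset
    root="${2:-/sys/class/net}"
    cat "$root/$bridge/bridge/vlan_filtering"
}
```

For example, `bridge_vlan_filtering vmbr0` on PVE01 and on PVE02 should print different values if the bridges really are configured differently.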

Oftentimes this is caused by additional configuration in the /etc/network/interfaces file, so if you could post that as well as the SDN configuration from the node where you are having issues I can take a look at it.

Regarding "Is there a way to force SDN to rebuild the entire /etc/network/interfaces.d/sdn tree based on /etc/pve/sdn/?": just re-apply the SDN configuration.
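
Re-applying is normally done with the Apply button under Datacenter → SDN. From a shell, the same API call can be sketched as below, assuming the apply action maps to a PUT on `/cluster/sdn` (which is how it is usually invoked through `pvesh`). The wrapper name `reapply_sdn` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: ask the cluster to apply the pending SDN
# configuration, which regenerates the SDN interface files on the nodes
# and reloads their network configuration.
reapply_sdn() {
    # Assumption: "pvesh set" issues the PUT request on /cluster/sdn
    pvesh set /cluster/sdn
}
```

Call `reapply_sdn` as root on any cluster node, then check /etc/network/interfaces.d/sdn on PVE01 again.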
 
Hello @pvpaulo,

@shanreich is on the right track. The different configurations are almost certainly caused by the vlan-aware flag on your vmbr0 on node PVE01.

If that bridge is set as VLAN-aware in /etc/network/interfaces, SDN correctly omits the ln_* and pr_* veth pairs because they are not needed in that mode. Please compare this file between PVE01 and one of the other nodes.
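
To make that comparison quicker, the relevant setting can be pulled out of the file per bridge. This is only a sketch under a few assumptions: bridges are named vmbr*, the option is written as `bridge-vlan-aware` (the spelling the Proxmox GUI uses), and `bridge_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: for every vmbr* stanza in an interfaces file, print
# "<bridge> vlan-aware=<yes|no>" so the output can be diffed between nodes.
bridge_summary() {
    # $1: file to inspect; defaults to the main ifupdown2 configuration
    file="${1:-/etc/network/interfaces}"
    awk '
        # Print the result for the stanza collected so far, if any
        function emit() { if (name != "") print name, "vlan-aware=" aware }
        # A new "iface" line ends the previous stanza and may start a bridge
        $1 == "iface" {
            emit(); name = ""
            if ($2 ~ /^vmbr/) { name = $2; aware = "no" }
        }
        # Record the flag only while inside a bridge stanza
        name != "" && $1 == "bridge-vlan-aware" { aware = $2 }
        END { emit() }
    ' "$file"
}
```

If PVE01 reports `vmbr0 vlan-aware=yes` while PVE02 and PVE03 report `vmbr0 vlan-aware=no`, that would explain why the veth pairs are only generated on the latter two.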