[SOLVED] SDN broken after underlying network change

lifeboy

Renowned Member
We ran into a very nasty issue a few days ago.

Background:
Systemd generates ridiculously long interface names (see https://manpages.debian.org/bookworm/udev/systemd.link.5.en.html and https://wiki.debian.org/NetworkInterfaceNames#CUSTOM_SCHEMES_USING_.LINK_FILES) like enp25s0f1np1, which combined with a VLAN tag becomes enp25s0f1np1.100. That exceeds the kernel's 15-character limit on interface names, which generates complaints.
So we opted to rename the interfaces to what they were called before (we found quite a few references for doing this) and in our eagerness missed that they should not be called eth0, eth1, etc.
When I initially ran into this, the issue I raised was not clear enough on that point, so it was never resolved. We simply checked the renamed interfaces' MAC addresses and used the correct one for each service (Corosync, Ceph, applications), regardless of the "wrong" name.
We also don't restart nodes willy-nilly, so everything was stable.
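For reference, the renaming itself was done with systemd .link files, roughly like this (a sketch per the Debian wiki page above; the MAC address is a placeholder):

Code:
# /etc/systemd/network/10-lan1.link
[Match]
MACAddress=aa:bb:cc:dd:ee:01

[Link]
Name=lan1

# if the NIC driver is loaded from the initramfs, regenerate it so the
# rename also applies at early boot:
#   update-initramfs -u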

The problem:
Recently we had to restart a node, so we migrated all VM's and containers away from that node and restarted it.
It didn't come back up.
While investigating, one of the other nodes crashed (not sure exactly why, but it seems there was too little RAM left, probably because Ceph needed a lot more RAM to deal with the node that was down).
(Note to self: in future, tell the cluster not to rebalance before taking down a node; see the commands below.)
This put the remaining two nodes under even more stress, and eventually only one node was still functional. All VMs and LXCs were down because Ceph was not in quorum.
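(For that note to self: if I understand the Ceph docs correctly, the commands would be along these lines.)

Code:
# before taking the node down, stop Ceph from marking OSDs out and rebalancing
ceph osd set noout
# ... do the maintenance, reboot the node ...
# once the node and its OSDs are back up
ceph osd unset noout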

So after inspecting the nodes via the BMC consoles, it was clear that they were actually booting up, but the networks were not functioning: Ceph was not getting quorum, the nodes couldn't even ping each other, etc.
Needless to say, after many hours we finally found the problem with the renamed NICs and changed the names to lan1, lan2, etc.
This allowed the cluster to start again.

However, because the network device names changed, we updated /etc/network/interfaces with the correct names. For example, here is nodeA:

Code:
auto lo
iface lo inet loopback

auto lan0
iface lan0 inet manual
#internet - 1Gb/s max 10Gb/s

auto lan1
iface lan1 inet static
    address 172.16.10.1/24
#corosync - 1Gb/s max 10Gb/s

auto lan1:1
iface lan1:1 inet static
    address 172.16.5.201/24
#ILOM path

auto lan3
iface lan3 inet static
    address 10.10.10.1/24
#ceph - 25Gb/s

auto lan2
iface lan2 inet manual
#LAN - 25Gb/s

auto lan2.25
iface lan2.25 inet manual
#client 1

auto lan2.35
iface lan2.35 inet manual
#client 2

auto vmbr0
iface vmbr0 inet static
    address 192.168.131.1/24
    gateway 192.168.131.254
    bridge-ports lan2
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
    bridge-ports lan0
    bridge-stp off
    bridge-fd 0

auto vmbr2
iface vmbr2 inet manual
    bridge-ports lan2.25
    bridge-stp off
    bridge-fd 0
# client 1 VLAN

auto vmbr4
iface vmbr4 inet static
    address 192.168.151.1/24
    bridge-ports lan2.35
    bridge-stp off
    bridge-fd 0
# client 2 VLAN

source /etc/network/interfaces.d/*
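(As an aside, with ifupdown2, which Proxmox VE uses, the edited file can be applied and checked without a full reboot; a sketch:)

Code:
# apply all pending changes from /etc/network/interfaces
ifreload -a
# verify interface names, state and addresses
ip -br link
ip -br addr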

We then needed to touch the network config of each VM and LXC before traffic would flow to/from it again, which I assume is because the config is applied when the virtual NIC is created and is not picked up dynamically.
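(What "touching" amounted to, roughly; the VM ID, MAC address and bridge here are placeholders:)

Code:
# re-apply a guest NIC definition so the tap device is re-plugged
qm set 116 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0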

Everything is working fine now, except the SDNs. The SDN VLANs cannot communicate over the actual network ports/cables; they only work on the same node, between the machines there and the virtual pfSense firewall we're using.

So, for example, I have VLAN12, which is one I use for testing. The IP range for it is 192.168.161.0/24, with .253 the gateway on pfSense1A, .252 the gateway on pfSense1B, and a virtual address .254 managed by CARP between the two.

The traffic is allowed by the switch (which was actually untouched in this whole debacle), and all traffic flows between the two firewalls, but the VLANs defined by SDN are not communicating.

For clarity: I have created an interface on each firewall for each VLAN.

[Screenshot: the firewall VM's network devices, net0 through net8, one per VLAN]

So in pfSense I named each of these appropriately, assigned addresses, and created rules appropriate for each VLAN. None of this has changed, and it was working perfectly before the restart and the resulting renaming of the underlying NICs.

In the above list, net0, net1, net2 and net4 are all working as expected. Note: net2 and net4 are bridges with manual VLANs defined in /etc/network/interfaces, a config we used before SDN became available.

net3, net5, net6, net7 and net8 are not relaying any traffic.

As a test, I deleted one of the VLANs from the SDN and from pfSense and recreated it, with no difference in behaviour. I also touched each of these interfaces on the pfSense VMs, like I did with the VMs and LXCs that run applications, but it didn't fix anything.

Question:
What do I need to do to make the SDNs work again? Do I have to remove it all and recreate it with different names? What could be the underlying cause of this behaviour?
 
Have you tried to regenerate the SDN config? (Use the Apply button on the main SDN panel.)

If it's a VLAN-type zone, you should have specified a vmbrX bridge as source, so the SDN code should see the lanX interface and regenerate the config correctly.

(You can see the generated config in /etc/network/interfaces.d/sdn.)
 
Have you tried to regenerate the SDN config? (Use the Apply button on the main SDN panel.)

Yes. Before I did that, the SDNs showed an error; after applying, the status returned to normal.

If it's a VLAN-type zone, you should have specified a vmbrX bridge as source, so the SDN code should see the lanX interface and regenerate the config correctly.

(You can see the generated config in /etc/network/interfaces.d/sdn.)

Yes, I have actually deleted a complete definition and recreated it, but it had no noticeable effect. I think I'll just do one again.
 
I think I'll just do one again.

I completely removed VLAN12. I first removed the interfaces from the firewall machines, then removed the SDN config. I then restarted the firewall, recreated the SDN for VLAN12, re-added the interface (which is a bridge on vmbr0), and recreated the rules on pfSense after adding the IP addresses for the re-added interface. The results are the same: as long as the VM is on the same node as the active firewall, the network works from the VM, but when I migrate it to another node, the network stops responding.

Code:
# cat /etc/network/interfaces.d/sdn
#version:60


auto VLAN10
iface VLAN10
    bridge_ports ln_VLAN10
    bridge_stp off
    bridge_fd 0
    alias VLAN 10 for NSFAS


auto VLAN11
iface VLAN11
    bridge_ports ln_VLAN11
    bridge_stp off
    bridge_fd 0
    alias VLAN 11 for Productive Eng


auto VLAN12
iface VLAN12
    bridge_ports ln_VLAN12
    bridge_stp off
    bridge_fd 0
    alias GTS_Abellard


auto VLAN13
iface VLAN13
    bridge_ports ln_VLAN13
    bridge_stp off
    bridge_fd 0
    alias VLAN for VO


auto VLAN14
iface VLAN14
    bridge_ports ln_VLAN14
    bridge_stp off
    bridge_fd 0
    alias VLAN for Zenware


auto ln_VLAN10
iface ln_VLAN10
    link-type veth
    veth-peer-name pr_VLAN10


auto ln_VLAN11
iface ln_VLAN11
    link-type veth
    veth-peer-name pr_VLAN11


auto ln_VLAN12
iface ln_VLAN12
    link-type veth
    veth-peer-name pr_VLAN12


auto ln_VLAN13
iface ln_VLAN13
    link-type veth
    veth-peer-name pr_VLAN13


auto ln_VLAN14
iface ln_VLAN14
    link-type veth
    veth-peer-name pr_VLAN14


auto pr_VLAN10
iface pr_VLAN10
    link-type veth
    veth-peer-name ln_VLAN10


auto pr_VLAN11
iface pr_VLAN11
    link-type veth
    veth-peer-name ln_VLAN11


auto pr_VLAN12
iface pr_VLAN12
    link-type veth
    veth-peer-name ln_VLAN12


auto pr_VLAN13
iface pr_VLAN13
    link-type veth
    veth-peer-name ln_VLAN13


auto pr_VLAN14
iface pr_VLAN14
    link-type veth
    veth-peer-name ln_VLAN14


auto vmbr0v10
iface vmbr0v10
    bridge_ports  pr_VLAN10
    bridge_stp off
    bridge_fd 0


auto vmbr0v11
iface vmbr0v11
    bridge_ports  pr_VLAN11
    bridge_stp off
    bridge_fd 0


auto vmbr0v12
iface vmbr0v12
    bridge_ports  pr_VLAN12
    bridge_stp off
    bridge_fd 0


auto vmbr0v13
iface vmbr0v13
    bridge_ports  pr_VLAN13
    bridge_stp off
    bridge_fd 0


auto vmbr0v14
iface vmbr0v14
    bridge_ports  pr_VLAN14
    bridge_stp off
    bridge_fd 0
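For anyone following along, two checks that help narrow this down (a sketch; VLAN12 and lan2 as named on our nodes):

Code:
# what is actually plugged into the SDN vnet bridge and its companion bridge?
ls /sys/class/net/VLAN12/brif/
ls /sys/class/net/vmbr0v12/brif/

# watch the physical uplink for tagged frames while pinging across nodes
tcpdump -ni lan2 -e vlan 12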
 
Also, maybe you can try to use ethX instead of lanX; maybe some parsers are not working correctly with custom interface names.
I wrote in my OP:
Background:
Systemd generates ridiculously long interface names (see https://manpages.debian.org/bookworm/udev/systemd.link.5.en.html and https://wiki.debian.org/NetworkInterfaceNames#CUSTOM_SCHEMES_USING_.LINK_FILES) like enp25s0f1np1, which combined with a VLAN tag becomes enp25s0f1np1.100. That exceeds the kernel's 15-character limit on interface names, which generates complaints.
So we opted to rename the interfaces to what they were called before (we found quite a few references for doing this) and in our eagerness missed that they should not be called eth0, eth1, etc.

Therefore I can't use ethX: the system can assign those names automatically itself, and it messed the NICs up by swapping them randomly.
 
I wrote in my OP:


Therefore I can't use ethX: the system can assign those names automatically itself, and it messed the NICs up by swapping them randomly.
Arf, damned systemd.

Maybe simply:

/etc/default/grub
GRUB_CMDLINE_LINUX="net.ifnames=0"

to revert to the old native kernel ethX names without needing to use .link files.
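Spelled out, that would be something like this (a sketch; on hosts booting with systemd-boot instead of GRUB, e.g. root on ZFS, the kernel command line lives elsewhere):

Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX="net.ifnames=0"

# then regenerate the bootloader config and reboot
update-grub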
 
I see these in the syslog of the nodes.

On nodeA, when the standby firewall is starting:

Code:
Sep 19 17:14:00 FT1-NodeA systemd-udevd[3462078]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 19 17:14:01 FT1-NodeA kernel: [564541.374585] device tap101i6 entered promiscuous mode
Sep 19 17:14:01 FT1-NodeA kernel: [564541.400562] VLAN12: port 2(tap101i6) entered blocking state
Sep 19 17:14:01 FT1-NodeA kernel: [564541.401035] VLAN12: port 2(tap101i6) entered disabled state
Sep 19 17:14:01 FT1-NodeA kernel: [564541.402252] VLAN12: port 2(tap101i6) entered blocking state
Sep 19 17:14:01 FT1-NodeA kernel: [564541.402676] VLAN12: port 2(tap101i6) entered forwarding state
Sep 19 17:14:01 FT1-NodeA pve-ha-lrm[3461930]: Task 'UPID:FT1-NodeA:0034D32D:035D6AEE:6509BAB4:qmstart:101:root@pam:' still active, waiting
Sep 19 17:14:01 FT1-NodeA systemd-udevd[3462078]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

On nodeB, where guest VM 116 has been migrated to and becomes unreachable:

Code:
Sep 19 17:22:54 FT1-NodeB qm[1427774]: start VM 116: UPID:FT1-NodeB:0015C93E:0359D841:6509BCCE:qmstart:116:root@pam:
Sep 19 17:22:54 FT1-NodeB qm[1427744]: <root@pam> starting task UPID:FT1-NodeB:0015C93E:0359D841:6509BCCE:qmstart:116:root@pam:
Sep 19 17:22:55 FT1-NodeB kernel: [562194.082584]  rbd6: p1 p2
Sep 19 17:22:55 FT1-NodeB kernel: [562194.082836] rbd: rbd6: capacity 21474836480 features 0x3d
Sep 19 17:22:55 FT1-NodeB systemd[1]: Started 116.scope.
Sep 19 17:22:55 FT1-NodeB systemd-udevd[1427924]: Using default interface naming scheme 'v247'.
Sep 19 17:22:55 FT1-NodeB systemd-udevd[1427924]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 19 17:22:56 FT1-NodeB kernel: [562194.935829] device tap116i0 entered promiscuous mode
Sep 19 17:22:56 FT1-NodeB kernel: [562194.963485] VLAN12: port 2(tap116i0) entered blocking state
Sep 19 17:22:56 FT1-NodeB kernel: [562194.963491] VLAN12: port 2(tap116i0) entered disabled state
Sep 19 17:22:56 FT1-NodeB kernel: [562194.963615] VLAN12: port 2(tap116i0) entered blocking state
Sep 19 17:22:56 FT1-NodeB kernel: [562194.963617] VLAN12: port 2(tap116i0) entered forwarding state
Sep 19 17:22:56 FT1-NodeB qm[1427744]: <root@pam> end task UPID:FT1-NodeB:0015C93E:0359D841:6509BCCE:qmstart:116:root@pam: WARNINGS: 1
 
Some more information:

VLAN12 is an SDN vnet and vmbr2 is a manually configured bridge on VLAN 25.

Code:
root@FT1-NodeA:~# ethtool VLAN12
Settings for VLAN12:
    Supported ports: [  ]
    Supported link modes:   Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 10000Mb/s
    Duplex: Unknown! (255)
    Auto-negotiation: off
    Port: Other
    PHYAD: 0
    Transceiver: internal
    Link detected: yes
root@FT1-NodeA:~# ethtool vmbr2
Settings for vmbr2:
    Supported ports: [  ]
    Supported link modes:   Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 25000Mb/s
    Duplex: Unknown! (255)
    Auto-negotiation: off
    Port: Other
    PHYAD: 0
    Transceiver: internal
    Link detected: yes

From the above, the only difference is that VLAN12 is reported as 10Gb/s, whereas vmbr2 is 25Gb/s. That suggests we may have attached VLAN12 to an incorrect ethernet NIC:

lan2 is the master for vmbr0, and there is no 10Gb/s interface at all, only 1Gb/s and 25Gb/s ones.
 
Arf, damned systemd.

Maybe simply:

/etc/default/grub
GRUB_CMDLINE_LINUX="net.ifnames=0"

to revert to the old native kernel ethX names without needing to use .link files.

I'll have to schedule a careful test of this, since I cannot risk taking down a node (like I did before) and having it not come back up because the ports have different names than I expected.

I'd like to report this as a bug, but I'm not sure what the cause is. Is it the systemd network code? The renamed ports work fine for plain network config file setups.

Where in the SDN config are the ports linked to the SDN? Is it just /etc/pve/sdn/zones.cfg? There I've got:

Code:
vlan: GTS
    bridge vmbr0
    ipam pve

But I don't see where vmbr0 is linked to a VLAN. Is that in the Proxmox SDN code or in systemd?
 
I'll have to schedule a careful test of this, since I cannot risk taking down a node (like I did before) and having it not come back up because the ports have different names than I expected.

I'd like to report this as a bug, but I'm not sure what the cause is. Is it the systemd network code? The renamed ports work fine for plain network config file setups.

Where in the SDN config are the ports linked to the SDN? Is it just /etc/pve/sdn/zones.cfg? There I've got:

Code:
vlan: GTS
    bridge vmbr0
    ipam pve

But I don't see where vmbr0 is linked to a VLAN. Is that in the Proxmox SDN code or in systemd?
It's in the SDN code; we look up the ifaces in the bridge with a specific regex (eth*, en*, bond*), because we need to exclude other virtual interfaces, etc. We really can't get it to work with arbitrary custom names.


Code:
/usr/share/perl5/PVE/Network/SDN/Zones/Plugin.pm

sub get_bridge_ifaces {
    my ($bridge) = @_;

    my @bridge_ifaces = ();
    my $dir = "/sys/class/net/$bridge/brif";
    PVE::Tools::dir_glob_foreach($dir, '(((eth|bond)\d+|en[^.]+)(\.\d+)?)', sub {
        push @bridge_ifaces, $_[0];
    });

    return @bridge_ifaces;
}
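To illustrate which names that pattern accepts, a quick sketch (translating the Perl regex to a POSIX ERE, and assuming dir_glob_foreach anchors it to the whole entry name):

Code:
for name in eth0 bond0 enp25s0f1np1 lan2 lan2.25; do
    if echo "$name" | grep -Eq '^((eth|bond)[0-9]+|en[^.]+)(\.[0-9]+)?$'; then
        echo "$name: matches"
    else
        echo "$name: no match"
    fi
done
# eth0, bond0 and enp25s0f1np1 match; lan2 and lan2.25 do not,
# so lanX uplinks are silently left out of the generated bridges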
 
It's in the SDN code; we look up the ifaces in the bridge with a specific regex (eth*, en*, bond*), because we need to exclude other virtual interfaces, etc. We really can't get it to work with arbitrary custom names.


Code:
/usr/share/perl5/PVE/Network/SDN/Zones/Plugin.pm

sub get_bridge_ifaces {
    my ($bridge) = @_;

    my @bridge_ifaces = ();
    my $dir = "/sys/class/net/$bridge/brif";
    PVE::Tools::dir_glob_foreach($dir, '(((eth|bond)\d+|en[^.]+)(\.\d+)?)', sub {
        push @bridge_ifaces, $_[0];
    });

    return @bridge_ifaces;
}

Thanks for checking!

Could we do one of two things then please?
1. Create a note in the SDN documentation explaining the limits on NIC names used for SDN bridges.
2. Allow more name patterns, like lan*, nic*, net* and possibly others.
Or both?

For now, I'm going to name the NICs enlan0, enlan1, enlan2, etc. That should allow the SDN code to pick them up, right?
I'm hesitant to revert to pre-systemd ethX naming, since I've had ports swap before: the nodes are identical in specification, but one of them swaps the ports, so that what is eth1 on the others is eth0 on that one.
 
For now, I'm going to name the NICs enlan0, enlan1, enlan2, etc. That should allow the SDN code to pick them up, right?

I changed two nodes with this setting last night and named the ports enlan0, enlan1, enlan2 and enlan3. However, after this change a guest on one of the nodes still cannot ping a guest on the other node via an SDN VLAN. I did change the /etc/network/interfaces file and rebooted each node.

Code:
# ls -la /sys/class/net/vmbr0/brif/
total 0
drwxr-xr-x 2 root root 0 Sep 26 19:09 .
drwxr-xr-x 7 root root 0 Sep 26 19:08 ..
lrwxrwxrwx 1 root root 0 Sep 27 15:53 enlan2 -> ../../../../pci0000:17/0000:17:02.0/0000:19:00.0/net/enlan2/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr103p0 -> ../../fwpr103p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr124p0 -> ../../fwpr124p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr128p0 -> ../../fwpr128p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr139p0 -> ../../fwpr139p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr152p0 -> ../../fwpr152p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr167p0 -> ../../fwpr167p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr191p0 -> ../../fwpr191p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 fwpr197p0 -> ../../fwpr197p0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 tap101i0 -> ../../tap101i0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 tap163i0 -> ../../tap163i0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 veth104i0 -> ../../veth104i0/brport
lrwxrwxrwx 1 root root 0 Sep 27 15:53 veth132i0 -> ../../veth132i0/brport

So enlan2 is in the list, and I tested the regex to make sure it will select enlan2, which it does.

Did I miss something? Is there possibly another issue somewhere?
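One more thing I intend to verify after pressing Apply again (a sketch; if the regex now matches enlan2, the regenerated per-VLAN bridge should list the tagged physical port next to the veth):

Code:
grep -A4 'iface vmbr0v12' /etc/network/interfaces.d/sdn
# expected to show something like:
#     bridge_ports enlan2.12 pr_VLAN12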
 