SDN/vxlan oddness

bhomrich

Jun 29, 2024
Hi everyone,

I've been struggling to get a 3-node cluster (now on Proxmox 8.4.1) to use an SDN VXLAN configuration, so that VMs on all 3 hypervisor hosts share a single subnet that routes to a second subnet (my lab) and then to a third for edge devices (home/kids/etc). I've gotten it working twice in earlier attempts, but was never 100% certain which bridge configuration was the key.

While trying again yesterday, I ended up with something that works at the moment, but I didn't realize it until *AFTER* I tore down the vxlan zone and the vnet through the UI.

Here's the weirdness: the SDN apply button returns errors on 2 of the 3 hypervisors, reporting that ifreload exited with status 1. So my config is working, but it seems to be relying on something hidden. I'm guessing that the teardown done by SDN apply isn't completely clean, so what I have is stale but working.... AUGH!

I did some digging in other threads, where people recommended sharing the debug output of ifreload (-a -d). It's attached, taken from 1 of the 2 failing hosts.

I say that my config is relying on something hidden because the debug output references bridge and vlan configuration from at least one prior attempt, possibly 2.

From the tail of the file:
debug: vmbr0: evaluating port expr '['enp3s0f0']'
debug: vxlan100: evaluating port expr '['vxlan_vxlan100']'
debug: vmbr4: evaluating port expr '['vxlan_vmbr4']'
error: main exception: [Errno 2] No such file or directory: '/sys/class/net/vxlan100/brif/'
Both vxlan_vxlan100 and vxlan_vmbr4 come from old attempts to set up the SDN (alias) for the VMs to use in their network config (when selecting the bridge for a network interface).
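In case it helps anyone compare, here is a minimal Python sketch of the cross-check I have in mind: pull the bridge names and port expressions out of the debug lines above, then list the ones that have no matching iface stanza in the config. The function names and the SAMPLE_CONFIG text are made up for illustration; on a real host you would read /etc/network/interfaces and /etc/network/interfaces.d/sdn instead.

```python
import re

# Lines copied from the ifreload -a -d output quoted above.
LOG = """\
debug: vmbr0: evaluating port expr '['enp3s0f0']'
debug: vxlan100: evaluating port expr '['vxlan_vxlan100']'
debug: vmbr4: evaluating port expr '['vxlan_vmbr4']'
error: main exception: [Errno 2] No such file or directory: '/sys/class/net/vxlan100/brif/'
"""

# Hypothetical stand-in for the contents of the interfaces files
# (assumption: only vmbr0 is still defined there).
SAMPLE_CONFIG = """\
auto vmbr0
iface vmbr0 inet static
    bridge-ports enp3s0f0
"""

PORT_RE = re.compile(r"^debug: (\S+): evaluating port expr '\[(.*)\]'$")
ERR_RE = re.compile(r"No such file or directory: '/sys/class/net/([^/]+)/brif/'")


def parse_log(text):
    """Return ({bridge: [ports]}, [bridges whose brif/ directory was missing])."""
    bridges = {}
    missing = []
    for line in text.splitlines():
        line = line.strip()
        m = PORT_RE.match(line)
        if m:
            bridges[m.group(1)] = re.findall(r"'([^']+)'", m.group(2))
            continue
        m = ERR_RE.search(line)
        if m:
            missing.append(m.group(1))
    return bridges, missing


def defined_ifaces(config_text):
    """Names that appear in 'iface <name> ...' stanzas of the config text."""
    return set(re.findall(r"^\s*iface\s+(\S+)", config_text, flags=re.MULTILINE))


def stale_bridges(bridges, config_text):
    """Bridges seen in the log that have no iface stanza in the config."""
    known = defined_ifaces(config_text)
    return [name for name in bridges if name not in known]


if __name__ == "__main__":
    bridges, missing = parse_log(LOG)
    print("bridges in log:", bridges)
    print("missing brif dirs:", missing)
    print("not in config:", stale_bridges(bridges, SAMPLE_CONFIG))
```

Anything this reports as "not in config" is a name that ifreload still evaluates even though the files no longer define it, which is exactly the mismatch I'm trying to track down.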

And the really weird part: I looked at /etc/network/interfaces and /etc/network/interfaces.d/sdn, and there are NO references to the vxlan entities.

So my questions:
  1. Where are these references to the stale information hiding? I am comfortable with the brctl/bridge/link command-line tools, and I don't see any references to these stale entities there.
  2. If you wanted to SAVE the current hidden config, what might you do? I've got the UI, plus pvesh interface and a postman collection retrieving REST calls with an API key, so I can go and recover whatever I need with whatever tool would be best suited to recover the most detail.
  3. Is my setup working for reasons that have nothing to do with the vxlan?
    I have restarted at least 1 of the 3 hypervisors (including the one that didn't complain), and my configuration has stayed intact as far as I can tell
    • VMs on each hypervisor can ping each other, and
    • I can ping them from my edge laptop, including using the dns names from the powerdns running in the VM subnet.
    • Is my configuration working through some other mechanism, because each VM tap is bridged to my "management" layer interface (vmbr0, which is actually on the lab subnet)?

      I see this or something like it everywhere I have running VMs:
    • 19: tap119i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    • VM 119 interface zero is set up with a static IP on the VM subnet, but bridged to the lab subnet (vmbr0), and is working as intended. I will admit that while I'm in IT, networking has never been my strong suit...
    • I thought the VXLAN was there so that each hypervisor would transparently forward traffic to the other hypervisors for VMs that aren't local to it..... Am I getting that some other way?
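To make the tap check repeatable across hosts, here is a small Python sketch that parses one line of link output like the one above and reports which bridge the tap is attached to. The parse_link helper is hypothetical, and the VM ID / NIC index split assumes the tap<vmid>i<n> naming seen in my output. If the master comes back as vmbr0 rather than a vnet bridge, that would suggest the VM traffic is being bridged straight onto the lab network and not carried over the VXLAN at all.

```python
import re

# The line quoted above, as printed on one of my hosts.
LINE = (
    "19: tap119i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 "
    "qdisc pfifo_fast master vmbr0 state UNKNOWN mode DEFAULT group default qlen 1000"
)

LINK_RE = re.compile(r"^(\d+):\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>\s+(.*)$")
TAP_RE = re.compile(r"^tap(\d+)i(\d+)$")


def value_after(tokens, key):
    """Return the token following key, or None if key is absent or last."""
    if key in tokens[:-1]:
        return tokens[tokens.index(key) + 1]
    return None


def parse_link(line):
    """Parse one link line into a dict, or return None if it does not match."""
    m = LINK_RE.match(line.strip())
    if not m:
        return None
    index, name, flags, rest = m.groups()
    tokens = rest.split()
    info = {
        "index": int(index),
        "name": name,
        "flags": flags.split(","),
        "mtu": value_after(tokens, "mtu"),
        "master": value_after(tokens, "master"),  # bridge the port belongs to
        "state": value_after(tokens, "state"),
        "vmid": None,
        "nic": None,
    }
    tap = TAP_RE.match(name)
    if tap:
        # Assumed naming: tap<vmid>i<nic index>
        info["vmid"] = int(tap.group(1))
        info["nic"] = int(tap.group(2))
    return info


if __name__ == "__main__":
    info = parse_link(LINE)
    print(f"VM {info['vmid']} nic {info['nic']} -> bridge {info['master']}")
```

Feeding every tap line from each hypervisor through this gives a quick table of VM interface to bridge, which should show whether anything is attached to a vnet at all.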
The machines are 2 DL360's and a hand-built quad-core box mainly for migration/storage. Not sure their hardware is playing a role here, but happy to provide more info if it matters.

Thanks for any recommendations or suggestions....

BrianH
 
