sdn vxlan zone not propagating arp/any traffic

landryb

Jul 1, 2023
Hi,

on an up-to-date setup with two nodes, the only difference being the kernel version (because of https://forum.proxmox.com/threads/3...ly-slow-after-kernel-5-13.129909/#post-570343, which i havent found time to bisect)

Code:
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-3-pve)
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 5.13.19-6-pve)
i doubt that can be relevant to the problem i'm seeing, since the vxlan driver shouldnt have changed much between those versions.

i'm trying to build a vxlan zone between them to 'join' linux containers on the same subnet on both sides. i dont use openvswitch at all, the rest of the setup is plain linux bridges for VMs/CTs.

direct link between nodes is on the 10.0.254.0/30 subnet (ie node openbsd-amd64 has 10.0.254.1, node pve-openbsd has 10.0.254.2) - that's also the link used for cluster traffic, and not the public IP interface.

Code:
root@pve-openbsd:~# ip a sh dev eth1 |grep brd
    link/ether 00:30:48:cd:be:11 brd ff:ff:ff:ff:ff:ff
    inet 10.0.254.2/30 brd 10.0.254.3 scope global eth1
root@openbsd-amd64:~# ip a sh dev eth1 |grep brd
    link/ether 00:30:48:cd:c2:85 brd ff:ff:ff:ff:ff:ff
    inet 10.0.254.1/30 brd 10.0.254.3 scope global eth1
building the vxlan zone from the web iface generates this config:
Code:
root@pve-openbsd:~# cat /etc/network/interfaces.d/sdn 
#version:16

auto vxlan_vxnet5
iface vxlan_vxnet5
        vxlan-id 5
        vxlan_remoteip 10.0.254.1
        mtu 1450

auto vxnet5
iface vxnet5
        bridge_ports vxlan_vxnet5
        bridge_stp off
        bridge_fd 0
        mtu 1450
root@openbsd-amd64:~# cat /etc/network/interfaces.d/sdn 
#version:16

auto vxlan_vxnet5
iface vxlan_vxnet5
        vxlan-id 5
        vxlan_remoteip 10.0.254.2
        mtu 1450

auto vxnet5
iface vxnet5
        bridge_ports vxlan_vxnet5
        bridge_stp off
        bridge_fd 0
        mtu 1450

containers 106 & 107 are on one node, containers 105 & 108 are on the other node, and all are bridged on vxnet5
Code:
root@pve-openbsd:~# grep net0 /etc/pve/nodes/*/lxc/*
/etc/pve/nodes/openbsd-amd64/lxc/106.conf:net0: name=eth0,bridge=vxnet5,hwaddr=DE:48:1D:B3:25:DA,ip=10.1.1.2/24,type=veth
/etc/pve/nodes/openbsd-amd64/lxc/107.conf:net0: name=eth0,bridge=vxnet5,hwaddr=E6:1F:3C:53:38:90,ip=10.1.1.3/24,type=veth
/etc/pve/nodes/pve-openbsd/lxc/105.conf:net0: name=eth0,bridge=vxnet5,hwaddr=26:69:E4:88:6F:3D,ip=10.1.1.1/24,type=veth
/etc/pve/nodes/pve-openbsd/lxc/108.conf:net0: name=eth0,bridge=vxnet5,hwaddr=D2:97:E7:2A:C4:B3,ip=10.1.1.4/24,type=veth

afaict, the vxlan interface on both sides seems configured, although the generated config uses vxlan_remoteip (coming from https://github.com/proxmox/pve-network/blame/master/src/PVE/Network/SDN/Zones/VxlanPlugin.pm#L80) instead of vxlan-remoteip, which is what's documented on https://manpages.debian.org/stretch/ifupdown2/ifupdown-addons-interfaces.5.en.html. looking at the ifupdown2 logs, it doesnt seem to bother/complain about that, and the code on the pve-network side has been this way since forever.

the bridge fdb table seems correctly configured with 00:00:00:00:00:00 entries with the remote ip as dst, which seems to be for BUM traffic as i've understood from reading https://vincent.bernat.ch/en/blog/2017-vxlan-linux#unicast-with-static-flooding

Code:
root@pve-openbsd:~# bridge fdb show dev vxlan_vxnet5
0a:0b:8c:a4:10:77 vlan 1 master vxnet5 permanent
0a:0b:8c:a4:10:77 master vxnet5 permanent
00:00:00:00:00:00 dst 10.0.254.1 self permanent
root@openbsd-amd64:~# bridge fdb show dev vxlan_vxnet5
4e:2f:d3:20:9d:42 vlan 1 master vxnet5 permanent
4e:2f:d3:20:9d:42 master vxnet5 permanent
00:00:00:00:00:00 dst 10.0.254.2 self permanent
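
A quick way to double-check those flooding entries on each node is to filter the fdb output for the all-zero MAC. This is only a sketch: `vxlan_flood_peers` is a hypothetical helper name (not an iproute2 or Proxmox tool), and it assumes the output format shown above.

```shell
#!/bin/sh
# vxlan_flood_peers: hypothetical helper, not part of iproute2 or Proxmox.
# Reads `bridge fdb show dev <vxlan-if>` output on stdin and prints the
# destination IP of every all-zero (BUM flooding) entry, one per line.
vxlan_flood_peers() {
    awk '$1 == "00:00:00:00:00:00" {
        for (i = 2; i < NF; i++)
            if ($i == "dst") print $(i + 1)
    }'
}

# Usage on a node:
#   bridge fdb show dev vxlan_vxnet5 | vxlan_flood_peers
```

With the tables above, this should list the other node's 10.0.254.x address on each side.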

all CTs have IPs in the 10.1.1.0/24 subnet - i havent configured a subnet in proxmox sdn because i was unsure if it was required/useful outside of IPAM modules.

if i ping from CT 106 to 107, or between CT 105 and 108 (ie CTs on the same node), then ping works fine.

if i try pinging a CT on the other side of the vxlan tunnel, then nothing goes through. tcpdumping on the various interfaces, i see ARP requests being sent:
- from the ping emitter host on the vxnet5, vxlan_vxnet5 and eth1 interfaces
- only on the eth1 interface on the receiving side (ie the remote node hosting the ping target CT) - the ARP request never makes it to the vxnet5/vxlan_vxnet5 interfaces there
- and there's never an ARP reply sent - so there's no ping going through.
Code:
root@openbsd-amd64:~# tcpdump -i eth1 port 4789
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:12:46.900622 IP 10.0.254.2.50445 > 10.0.254.1.4789: VXLAN, flags [I] (0x08), vni 5
ARP, Request who-has 10.1.1.2 tell 10.1.1.4, length 28
13:12:47.903180 IP 10.0.254.2.50445 > 10.0.254.1.4789: VXLAN, flags [I] (0x08), vni 5
ARP, Request who-has 10.1.1.2 tell 10.1.1.4, length 28
13:12:48.927082 IP 10.0.254.2.50445 > 10.0.254.1.4789: VXLAN, flags [I] (0x08), vni 5
ARP, Request who-has 10.1.1.2 tell 10.1.1.4, length 28
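
To compare what each interface actually sees, the captures can be summarised instead of read by eye. The sketch below counts VXLAN-encapsulated frames per VNI from tcpdump's text output. `count_vxlan_by_vni` is a hypothetical helper name, and it assumes the one-line header format shown in the capture above (a `VXLAN` marker followed by `vni <id>`).

```shell
#!/bin/sh
# count_vxlan_by_vni: hypothetical helper, not a tcpdump feature.
# Reads tcpdump text output on stdin and prints how many VXLAN header
# lines were seen for each VNI, as "vni <id>: <count>".
count_vxlan_by_vni() {
    awk '/VXLAN/ {
        for (i = 1; i < NF; i++)
            if ($i == "vni") n[$(i + 1)]++
    }
    END { for (v in n) printf "vni %s: %d\n", v, n[v] }'
}

# Usage (capture for 5 seconds, then summarise):
#   timeout 5 tcpdump -l -ni eth1 udp port 4789 | count_vxlan_by_vni
```

A non-zero count on eth1 of the receiving node, combined with no ARP showing up on vxlan_vxnet5 there, narrows the loss down to somewhere between the physical interface and the vxlan device on that node.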

- i have the default proxmox firewall setup on the cluster, but i dont think it should matter much for the vxlan traffic since i see it on both sides of the eth1 link.

i've looked at the details of the vxlan iface with ip -d, and i've tried various things after looking at ifupdown2 documentation:
- enforcing remoteip via vxlan-remoteip instead of vxlan_remoteip
- enforcing local ip via vxlan-local-tunnelip, eg adding to interfaces.d/sdn
Code:
        vxlan-remoteip 10.0.254.1
        vxlan-local-tunnelip 10.0.254.2
which results in (after ifreload -a of course)
Code:
root@pve-openbsd:~# ip -d a sh dev vxlan_vxnet5
57: vxlan_vxnet5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master vxnet5 state UNKNOWN group default qlen 1000
    link/ether 0a:0b:8c:a4:10:77 brd ff:ff:ff:ff:ff:ff promiscuity 1  allmulti 1 minmtu 68 maxmtu 65535 
    vxlan id 5 local 10.0.254.2 srcport 0 0 dstport 4789 ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx 
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.a6:f6:b2:59:c8:e9 designated_root 8000.a6:f6:b2:59:c8:e9 hold_timer    0.00 message_age_timer    0.00 forward_delay_timer    0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 2 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

but nothing seems to change, ie the 'unicast flooding' of the ARP requests doesnt seem to make it where it should. since i dont have a working vxlan setup i cant compare what works/doesnt work.. should i be able to see the CT mac addresses somewhere in the ip neighbour table on the hosts ?

help and hints welcome, it feels like im missing something, i originally just followed the example from https://blog.raspot.in/fr/blog/mise-en-place-du-sdn-sur-promox-7 which seems to say it should just work... ofc, i can provide more details on the setup.
 
ok, as soon as i hit send, i tried adding rules to the proxmox firewall.. and now it works, pings go through both sides of the vxlan tunnel.

so to have vxlan properly working, i needed this rule in /etc/pve/firewall/cluster.fw

Code:
IN ACCEPT -i eth1 -p udp -dport 4789 -log info # vxlan
**edited** only the incoming rule seems needed
could this be done by default when a vxlan SDN zone is configured ? or did i do something wrong, and it's supposed to be the case but got forgotten on the way ?
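
For anyone wanting to check whether a rule file already contains such a rule, a minimal sketch follows. `has_vxlan_rule` is a hypothetical helper name; it only greps the file for the rule syntax shown above, so it says nothing about what the running firewall actually enforces.

```shell
#!/bin/sh
# has_vxlan_rule: hypothetical helper, not a Proxmox command.
# Returns 0 if the given firewall rule file contains an enabled
# "IN ACCEPT ... -p udp ... -dport 4789" rule, non-zero otherwise.
has_vxlan_rule() {
    # $1: path to a rule file; defaults to the cluster-wide file
    fw_file="${1:-/etc/pve/firewall/cluster.fw}"
    grep -Eq '^[[:space:]]*IN[[:space:]]+ACCEPT[[:space:]].*-p[[:space:]]+udp[[:space:]].*-dport[[:space:]]+4789([[:space:]]|$)' "$fw_file"
}

# Usage:
#   has_vxlan_rule && echo "vxlan port allowed" || echo "no vxlan rule found"
```

The pattern is deliberately loose about what sits between the fields, so it matches the rule both with and without an interface restriction such as `-i eth1`.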
 
Hi,
no, it's not done automatically currently. (Maybe later it'll be possible, as I'm thinking of managing nat rules through pve-firewall, so opening the vxlan port could be done too).

I'll add a note in the documentation about the vxlan port for the firewall.
 
Hello!
Have you done this step correctly?

After this, you need to add the following line to the end of the /etc/network/interfaces configuration file, so that the SDN configuration gets included and activated.
source /etc/network/interfaces.d/*
 
yes of course that part is done, otherwise the vxlan & bridge interfaces wouldnt have been created.
 
Can you change your rule in the cluster firewall like this:

IN ACCEPT -p udp -dport 4789 -log nolog
well i can sure, but i dont see what that would change, in my case i know the vxlan traffic is on eth1, and as far as logging goes i've already demoted it to nolog.. at the beginning i had two rules for OUT/IN but figured out the OUT rule wasnt needed in my case, since i only filter incoming traffic (as is the default with proxmox fw iirc)
 
