Hi,
On an up-to-date setup with two nodes, the only difference being the kernel version (because of https://forum.proxmox.com/threads/3...ly-slow-after-kernel-5-13.129909/#post-570343, which I haven't found time to bisect). I doubt that can be relevant to the problem I'm seeing, since the vxlan driver shouldn't have changed much between those versions.
Code:
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-3-pve)
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 5.13.19-6-pve)
I'm trying to build a VXLAN zone between them to 'join' Linux containers on the same subnet on both sides. I don't use Open vSwitch at all; the rest of the setup is plain Linux bridges for VMs/CTs.
The direct link between the nodes is on the 10.0.254.0/30 subnet (i.e. node openbsd-amd64 has 10.0.254.1, node pve-openbsd has 10.0.254.2). That's also the link used for cluster traffic, not the public IP interface.
Code:
root@pve-openbsd:~# ip a sh dev eth1 |grep brd
link/ether 00:30:48:cd:be:11 brd ff:ff:ff:ff:ff:ff
inet 10.0.254.2/30 brd 10.0.254.3 scope global eth1
root@openbsd-amd64:~# ip a sh dev eth1 |grep brd
link/ether 00:30:48:cd:c2:85 brd ff:ff:ff:ff:ff:ff
inet 10.0.254.1/30 brd 10.0.254.3 scope global eth1
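The underlay itself looks fine; if useful, a quick sanity check like the one below should confirm reachability over eth1 and that a full-size 1500-byte underlay packet (1450 inner MTU + 50 bytes of VXLAN/UDP/IP/Ethernet overhead) passes unfragmented. Just a sketch with my /30 addresses, run from pve-openbsd:
Code:
# basic reachability over the direct link
ping -c 3 -I eth1 10.0.254.1
# 1472-byte ICMP payload + 28 bytes of headers = 1500-byte packet, don't fragment
ping -c 3 -M do -s 1472 10.0.254.1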
Building the VXLAN zone from the web interface generates this config:
Code:
root@pve-openbsd:~# cat /etc/network/interfaces.d/sdn
#version:16
auto vxlan_vxnet5
iface vxlan_vxnet5
vxlan-id 5
vxlan_remoteip 10.0.254.1
mtu 1450
auto vxnet5
iface vxnet5
bridge_ports vxlan_vxnet5
bridge_stp off
bridge_fd 0
mtu 1450
root@openbsd-amd64:~# cat /etc/network/interfaces.d/sdn
#version:16
auto vxlan_vxnet5
iface vxlan_vxnet5
vxlan-id 5
vxlan_remoteip 10.0.254.2
mtu 1450
auto vxnet5
iface vxnet5
bridge_ports vxlan_vxnet5
bridge_stp off
bridge_fd 0
mtu 1450
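For completeness, the zone/vnet definitions behind this live in /etc/pve/sdn/zones.cfg and /etc/pve/sdn/vnets.cfg; I haven't pasted mine here, but they should look roughly like the sketch below (the zone name is just illustrative):
Code:
# /etc/pve/sdn/zones.cfg (sketch, zone name illustrative)
vxlan: vxzone
        peers 10.0.254.1,10.0.254.2
        mtu 1450

# /etc/pve/sdn/vnets.cfg (sketch)
vnet: vxnet5
        zone vxzone
        tag 5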
Containers 106 & 107 are on one node, containers 105 & 108 are on the other node, and all are bridged on vxnet5:
Code:
root@pve-openbsd:~# grep net0 /etc/pve/nodes/*/lxc/*
/etc/pve/nodes/openbsd-amd64/lxc/106.conf:net0: name=eth0,bridge=vxnet5,hwaddr=DE:48:1D:B3:25:DA,ip=10.1.1.2/24,type=veth
/etc/pve/nodes/openbsd-amd64/lxc/107.conf:net0: name=eth0,bridge=vxnet5,hwaddr=E6:1F:3C:53:38:90,ip=10.1.1.3/24,type=veth
/etc/pve/nodes/pve-openbsd/lxc/105.conf:net0: name=eth0,bridge=vxnet5,hwaddr=26:69:E4:88:6F:3D,ip=10.1.1.1/24,type=veth
/etc/pve/nodes/pve-openbsd/lxc/108.conf:net0: name=eth0,bridge=vxnet5,hwaddr=D2:97:E7:2A:C4:B3,ip=10.1.1.4/24,type=veth
AFAICT, the VXLAN interface on both sides seems configured, although the generated config uses vxlan_remoteip (coming from https://github.com/proxmox/pve-network/blame/master/src/PVE/Network/SDN/Zones/VxlanPlugin.pm#L80) instead of the vxlan-remoteip documented at https://manpages.debian.org/stretch/ifupdown2/ifupdown-addons-interfaces.5.en.html. Looking at the ifupdown2 logs, it doesn't seem to complain about that, and the code on the pve-network side has been this way forever.

The bridge fdb table seems correctly configured, with 00:00:00:00:00:00 entries using the remote IP as dst, which as I understand from https://vincent.bernat.ch/en/blog/2017-vxlan-linux#unicast-with-static-flooding are there for BUM traffic:
Code:
root@pve-openbsd:~# bridge fdb show dev vxlan_vxnet5
0a:0b:8c:a4:10:77 vlan 1 master vxnet5 permanent
0a:0b:8c:a4:10:77 master vxnet5 permanent
00:00:00:00:00:00 dst 10.0.254.1 self permanent
root@openbsd-amd64:~# bridge fdb show dev vxlan_vxnet5
4e:2f:d3:20:9d:42 vlan 1 master vxnet5 permanent
4e:2f:d3:20:9d:42 master vxnet5 permanent
00:00:00:00:00:00 dst 10.0.254.2 self permanent
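If I understand that blog post correctly, those all-zeros entries are what ifupdown2 installs from vxlan_remoteip, equivalent to adding them by hand with something like this (a sketch only, I haven't needed to run it since they are already there):
Code:
# on pve-openbsd: flood BUM traffic for this VNI towards the other VTEP
bridge fdb append 00:00:00:00:00:00 dev vxlan_vxnet5 dst 10.0.254.1
# on openbsd-amd64: the same, pointing back
bridge fdb append 00:00:00:00:00:00 dev vxlan_vxnet5 dst 10.0.254.2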
All the containers have IPs in the 10.1.1.0/24 subnet. I haven't configured a subnet in the Proxmox SDN because I was unsure whether it is required/useful outside of the IPAM modules...
If I ping from CT 106 to 107, or to/from CT 105 to 108 (i.e. CTs on the same node), the ping works fine.
If I try pinging a CT on the other side of the VXLAN tunnel, nothing goes through. Tcpdumping on the various interfaces, I see ARP requests being sent:
- from the ping emitter host, on the vxnet5, vxlan_vxnet5 and eth1 interfaces
- only on the eth1 interface on the receiving side (i.e. the remote node hosting the ping target CT); the ARP request never makes it to the vxnet5/vxlan_vxnet5 interfaces there
- and there's never an ARP reply sent, so no ping gets through.
Code:
root@openbsd-amd64:~# tcpdump -i eth1 port 4789
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:12:46.900622 IP 10.0.254.2.50445 > 10.0.254.1.4789: VXLAN, flags [I] (0x08), vni 5
ARP, Request who-has 10.1.1.2 tell 10.1.1.4, length 28
13:12:47.903180 IP 10.0.254.2.50445 > 10.0.254.1.4789: VXLAN, flags [I] (0x08), vni 5
ARP, Request who-has 10.1.1.2 tell 10.1.1.4, length 28
13:12:48.927082 IP 10.0.254.2.50445 > 10.0.254.1.4789: VXLAN, flags [I] (0x08), vni 5
ARP, Request who-has 10.1.1.2 tell 10.1.1.4, length 28
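Since the encapsulated ARP clearly reaches eth1 on the receiving node but never shows up after decapsulation, I could also check on that node whether the kernel actually has a UDP socket bound to the VXLAN port and whether the interface counters show drops; something along these lines (just the commands, I haven't pasted the output here):
Code:
# is there a kernel UDP socket bound to the VXLAN port?
ss -uln | grep 4789
# any RX errors/drops on the VXLAN interface?
ip -s link show dev vxlan_vxnet5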
- I have the default Proxmox firewall setup on the cluster, but I don't think it should matter much for the VXLAN traffic, since I see it on both sides of the eth1 link.
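To really rule the firewall out, I guess I could briefly stop it on both nodes and retest, and check whether bridged frames are pushed through iptables at all; a sketch of what I have in mind:
Code:
pve-firewall status
# are bridged frames filtered by iptables on this host?
sysctl net.bridge.bridge-nf-call-iptables
# temporarily disable the firewall on both nodes, then retry the ping
pve-firewall stop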
I've looked at the details of the VXLAN interface with ip -d, and I've tried various things after reading the ifupdown2 documentation:
- enforcing the remote IP via vxlan-remoteip instead of vxlan_remoteip
- enforcing the local IP via vxlan-local-tunnelip, e.g. adding this to interfaces.d/sdn:
Code:
vxlan-remoteip 10.0.254.1
vxlan-local-tunnelip 10.0.254.2
which results in (after ifreload -a, of course):
Code:
root@pve-openbsd:~# ip -d a sh dev vxlan_vxnet5
57: vxlan_vxnet5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master vxnet5 state UNKNOWN group default qlen 1000
link/ether 0a:0b:8c:a4:10:77 brd ff:ff:ff:ff:ff:ff promiscuity 1 allmulti 1 minmtu 68 maxmtu 65535
vxlan id 5 local 10.0.254.2 srcport 0 0 dstport 4789 ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.a6:f6:b2:59:c8:e9 designated_root 8000.a6:f6:b2:59:c8:e9 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 2 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
But nothing seems to change, i.e. the 'unicast flooding' of the ARP requests still doesn't make it where it should. Since I don't have a working VXLAN setup to compare against, I can't tell what should or shouldn't be there... Should I be able to see the CT MAC addresses somewhere in the ip neighbour table on the hosts?
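(From what I understand, the learned remote MACs should rather show up in the bridge fdb than in ip neigh. On pve-openbsd, on a working setup, I'd expect something like the lines below, using CT 106's MAC from the configs above as an example; the output lines are an assumption on my part, not something I've captured.)
Code:
bridge fdb show dev vxlan_vxnet5
# expected once learning works (assumed, not actual output):
#   de:48:1d:b3:25:da master vxnet5
#   de:48:1d:b3:25:da dst 10.0.254.1 self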
Help and hints welcome; it feels like I'm missing something. I originally just followed the example from https://blog.raspot.in/fr/blog/mise-en-place-du-sdn-sur-promox-7, which seems to say it should just work... Of course, I can provide more details on the setup.