VXLAN SDN Network with MTU <1280 means Containers and VMs cannot start

Apr 24, 2020
I have deployed a VXLAN setup on my homelab cluster, and I get connectivity between containers on various hosts as long as the MTU on the VXLAN zone is greater than or equal to 1280 (the minimum MTU for IPv6). My intended final state is one where the VXLAN traffic is encapsulated over WireGuard tunnels between the hosts. I'm using Tailscale to manage the tunnels, which runs with an MTU of 1280 on the tailscale0 interface.

When I run the VXLAN zone with an MTU of 1280 and force the container's interface MTU down to 1230, connectivity works fine, but ideally I'd be able to specify a lower MTU on the VXLAN zone itself.
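
For reference, the 1230 figure comes from the VXLAN encapsulation overhead: inner Ethernet header (14 bytes) + VXLAN (8) + UDP (8) + outer IPv4 (20) = 50 bytes added to every overlay frame, which the 1280-byte tailscale0 MTU has to carry. A rough manual equivalent with plain iproute2 looks like this (just a sketch to show the arithmetic; the VNI and the 100.64.x.x addresses are placeholders, not my actual SDN config):
Code:
# 1280 (tailscale0 MTU) - 50 bytes of VXLAN encapsulation = 1230 usable MTU
# inside the overlay. The VNI and addresses below are placeholders.
ip link add vxlan42 type vxlan id 42 dstport 4789 \
    local 100.64.0.1 remote 100.64.0.2 dev tailscale0
ip link set vxlan42 mtu 1230 up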

Some other approaches I have considered:
  1. Increasing the MTU on the tailscale interface to 1330 (50 more than 1280). This is feasible, but not recommended by Tailscale, because sometimes WireGuard requires additional headroom to renegotiate the connection. It would also be non-standard for the other tailnet clients.
  2. Do MSS clamping (recommended by Tailscale as a last resort, but it's a bit of a dirty hack; see the sketch after this list). I would prefer not to have to do this, especially when I'm so close to having a functional VXLAN setup with MTU=1230 for all my containers and virtual machines.
  3. Disable IPv6 on the nodes and see what happens. I have considered this, but it's a pretty drastic shot in the dark.
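
For completeness, the MSS clamp from option 2 would be something along these lines (just a sketch, I haven't applied it; it also only helps TCP, which is part of why it feels like a hack):
Code:
# Clamp the TCP MSS of forwarded connections to the path MTU, so TCP flows
# negotiate segments that fit through the 1280-byte tunnel. Non-TCP traffic
# is not helped by this.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
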
@spirit, your SDN feature is amazing; do you have any thoughts on this? It seems related to IPv6 somehow, but I haven't been able to determine how.
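
One probe that points me at IPv6 (a throwaway test on a dummy interface, nothing from the SDN config): as far as I can tell, the kernel refuses to run IPv6 on a link whose MTU is below the IPv6 minimum of 1280, which might also be why the VM log below complains about a missing /proc/sys/net/ipv6/conf/.../disable_ipv6 entry.
Code:
# Throwaway check, assuming the kernel drops IPv6 from a link once its MTU
# falls below the 1280-byte IPv6 minimum.
ip link add dummy-mtu type dummy
ip link set dummy-mtu up
ip -6 addr show dev dummy-mtu    # fe80:: link-local address is present
ip link set dummy-mtu mtu 1279
ip -6 addr show dev dummy-mtu    # addresses are gone, IPv6 is off this link
ip link del dummy-mtu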


Output from a VM attached to a VXLAN interface with mtu = 1279:
Code:
failed to open /proc/sys/net/ipv6/conf/tap200102i0/disable_ipv6 for writing: No such file or directory
kvm: -netdev type=tap,id=net0,ifname=tap200102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on: network script /var/lib/qemu-server/pve-bridge failed with status 512
TASK ERROR: start failed: QEMU exited with code 1

Output from an LXC container attached to a VXLAN interface with mtu=1279:
Code:
run_buffer: 321 Script exited with status 2
lxc_create_network_priv: 3413 Success - Failed to create network device
lxc_spawn: 1837 Failed to create the network

TASK ERROR: startup for container '100203' failed


syslog from system unable to start VM:
Code:
Apr 03 16:05:23 S1-C003-PM003 pvedaemon[2505703]: start VM 200102: UPID:S1-C003-PM003:00263BE7:00F67D24:6249FE03:qmstart:200102:root@pam:
Apr 03 16:05:23 S1-C003-PM003 pvedaemon[2493415]: <root@pam> starting task UPID:S1-C003-PM003:00263BE7:00F67D24:6249FE03:qmstart:200102:root@pam:
Apr 03 16:05:23 S1-C003-PM003 systemd[1]: Started 200102.scope.
Apr 03 16:05:23 S1-C003-PM003 systemd-udevd[2505718]: Using default interface naming scheme 'v247'.
Apr 03 16:05:23 S1-C003-PM003 systemd-udevd[2505718]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 03 16:05:24 S1-C003-PM003 kernel: device tap200102i0 entered promiscuous mode
Apr 03 16:05:24 S1-C003-PM003 ovs-vsctl[2505752]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap200102i0
Apr 03 16:05:24 S1-C003-PM003 ovs-vsctl[2505752]: ovs|00002|db_ctl_base|ERR|no port named tap200102i0
Apr 03 16:05:24 S1-C003-PM003 ovs-vsctl[2505753]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln200102i0
Apr 03 16:05:24 S1-C003-PM003 ovs-vsctl[2505753]: ovs|00002|db_ctl_base|ERR|no port named fwln200102i0
Apr 03 16:05:24 S1-C003-PM003 bgpd[1217]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF tap200102i0 in VRF 0
Apr 03 16:05:24 S1-C003-PM003 pvedaemon[2459354]: VM 200102 qmp command failed - VM 200102 not running
Apr 03 16:05:24 S1-C003-PM003 systemd[1]: 200102.scope: Succeeded.
Apr 03 16:05:24 S1-C003-PM003 pvedaemon[2505703]: start failed: QEMU exited with code 1
 
Hi, I'm currently on holiday for 2 weeks; I'll try to test it when I come back. About the MTU: you can use the same MTU in your CT as in the zone config (in the future it will be derived automatically). This MTU just needs to be 50 bytes lower than your outgoing tunnel interface. I haven't tested WireGuard yet, but with a simple IPsec tunnel it was working fine.
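
To verify from inside a guest that the 1230-byte path really works end to end, you can send non-fragmentable pings of the matching size (rough check, assuming an IPv4 ICMP echo with 28 bytes of headers; the target address is a placeholder):
Code:
# 1202 bytes of payload + 8 (ICMP) + 20 (IPv4) = 1230, the zone MTU.
ping -M do -s 1202 <guest-on-other-node>
# One byte more should fail with "message too long" if the MTU is really 1230.
ping -M do -s 1203 <guest-on-other-node>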
 
Just use ZeroTier; it runs a fully meshed, VXLAN-style overlay network with a 2800-byte MTU. It intelligently fragments and reassembles packets over any WAN link. You can self-host the controller and even self-host your own root servers. It supports WAN bonding as well as other advanced load-balancing schemes. ZeroTier really is the best open-source SD-WAN/SDN product on the market, legit on par with commercial solutions like Velocloud and Viptela.
 
