[SOLVED] SDN - weird issues

CodeBreaker

I'm stumped.

I'm migrating my setup to SDN, using a VLAN zone on vmbr4, whose bridge port is the bond0.2920 VLAN interface. I didn't touch the network config of the VMs and CTs except for changing the interface. At first glance everything seemed to work: ping worked correctly, internet access was fine, etc. After a while I noticed that I couldn't access some of my apps through the proxy; the page spins until it times out. Looking in more detail, the initial HTML gets loaded, but the subsequent requests for scripts and styles stay pending. After more in-depth testing, it seems to work on some nodes and not on others. But after some migrating around and restarting, it stopped working on some nodes where it previously worked and started working where it didn't. Which was weird.

You might think I have duplicate IPs on the network, but I've checked: I shut down everything except 1 CT and 2 VMs, and they have different IPs. Since ping always works regardless of which nodes the VMs and CT are on, I tried connecting with SSH. I've tried all the combinations of locations of the VMs and CT, and the results are:

ssh VM1 -> CT = sometimes works
ssh VM2 -> CT = sometimes works
ssh CT -> VM1 = sometimes works
ssh CT -> VM2 = sometimes works
ssh VM1 -> VM2 = sometimes works
ssh VM2 -> VM1 = sometimes works

Go figure. I've repeated the tests with migrations: most of the time everything works when the VMs/CT are on nodes 1,2,3, but most of the time it does not work when they are on nodes 4,5,6. When the VMs/CT are on the same node everything works fine, and since everything worked before the migration, I'm thinking it must be SDN.

But it is weird that ping works while SSH does not. Below is the output when trying to connect with SSH. The connection hangs partway through the handshake, right after the client starts expecting SSH2_MSG_KEX_ECDH_REPLY.

Code:
OpenSSH_8.4p1 Debian-5, OpenSSL 1.1.1n  15 Mar 2022
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug1: Connecting to 10.27.30.20 [10.27.30.20] port 22.
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa_sk type -1
debug1: identity file /root/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: identity file /root/.ssh/id_ed25519_sk type -1
debug1: identity file /root/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /root/.ssh/id_xmss type -1
debug1: identity file /root/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.4p1 Debian-5
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.9p1 Ubuntu-3ubuntu0.3
debug1: match: OpenSSH_8.9p1 Ubuntu-3ubuntu0.3 pat OpenSSH* compat 0x04000000
debug1: Authenticating to 10.27.30.20:22 as 'root'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

I've checked all the network configs and they look fine, and since I can ping everything I'm guessing they are fine. So what could be the problem? What can I do to better troubleshoot this?

Nodes 1,2,3 are Dell R320 and nodes 4,5,6 are R620. Below are the network configs for node 1. The configs are the same on all nodes; the only difference that I can see is the IPs on the VLANs and bridges. I can post configs for all nodes if required.

cat /etc/network/interfaces
Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto enp10s0f0np0
iface enp10s0f0np0 inet manual

auto enp10s0f1np1
iface enp10s0f1np1 inet manual

auto eno1.2911
iface eno1.2911 inet manual
#Proxmox Cluster

auto bond0
iface bond0 inet manual
        bond-slaves enp10s0f0np0 enp10s0f1np1
        bond-miimon 100
        bond-mode 802.3ad
#10G-BOND

auto bond0.2912
iface bond0.2912 inet manual
#CEPH Public

auto bond0.2913
iface bond0.2913 inet static
        address 172.29.13.101/24
#CEPH Cluster

auto bond0.2914
iface bond0.2914 inet static
        address 172.29.14.101/24
#PROXMOX_MIGRATION

auto bond0.2916
iface bond0.2916 inet manual
#Proxmox VM

auto bond0.2920
iface bond0.2920 inet manual
#sdn

auto bond0.100
iface bond0.100 inet manual
#isp

auto bond0.2917
iface bond0.2917 inet manual
#lan

auto vmbr0
iface vmbr0 inet static
        address 172.29.11.101/24
        gateway 172.29.11.1
        bridge-ports eno1.2911
        bridge-stp off
        bridge-fd 0
#Proxmox Cluster

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#LEGACY_VM

auto vmbr2
iface vmbr2 inet static
        address 172.29.12.101/24
        bridge-ports bond0.2912
        bridge-stp off
        bridge-fd 0
#CEPH Public

auto vmbr3
iface vmbr3 inet manual
        bridge-ports bond0.2916
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Proxmox VM

auto vmbr4
iface vmbr4 inet manual
        bridge-ports bond0.2920
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#sdn

auto vmbr5
iface vmbr5 inet manual
        bridge-ports bond0.100
        bridge-stp off
        bridge-fd 0
#isp

auto vmbr6
iface vmbr6 inet manual
        bridge-ports bond0.2917
        bridge-stp off
        bridge-fd 0
#lan

source /etc/network/interfaces.d/*

cat /etc/pve/sdn/zones.cfg
Code:
vlan: vlanx
        bridge vmbr4
        ipam pve
        nodes pve03,pve06,pve07,pve01,pve02,pve05

cat /etc/pve/sdn/vnets.cfg
Code:
vnet: home
        zone vlanx
        alias Home network
        tag 2730

cat /etc/network/interfaces.d/sdn
Code:
#version:30

auto home
iface home
        bridge_ports vmbr4.2730
        bridge_stp off
        bridge_fd 0
        alias Home network
 
It was the MTU... I had everything on auto and it didn't occur to me that running SDN over a VLAN could be the issue. I reduced the MTU on the VMs from the default 1500 to 1496 and it works like a charm.
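
For anyone who lands here with the same symptom (normal ping works, but SSH stalls during key exchange), one way to check for an MTU problem is to ping with the don't-fragment flag and a payload that fills the whole MTU. This is only a minimal sketch, assuming Linux iputils ping inside the guests; the helper names icmp_payload and probe_mtu are made up for illustration, and 10.27.30.20 is the CT address from the SSH log above.

```shell
# ICMP payload that exactly fills an IPv4 packet of the given MTU:
# MTU - 20 bytes (IPv4 header without options) - 8 bytes (ICMP header).
icmp_payload() {
  echo $(( $1 - 28 ))
}

# Send 3 echo requests with DF set (-M do) so nothing gets fragmented on the way.
# $1 = target host, $2 = MTU to test
probe_mtu() {
  ping -c 3 -M do -s "$(icmp_payload "$2")" "$1"
}

icmp_payload 1500   # prints 1472
icmp_payload 1496   # prints 1468
```

Running probe_mtu 10.27.30.20 1500 and then probe_mtu 10.27.30.20 1496 between two affected guests should show the difference: if the first one times out while the second gets replies, full-size packets are being dropped somewhere on the path. A default ping only sends a small payload, which would explain why it kept working.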

Now I need to figure out how to reduce the MTU for all VMs at once...
 
Additional information for those who might need it.

Because I'm running SDN traffic on its own additional Linux VLAN (bond0.2920, defined in Proxmox), the vnet tag is stacked inside that VLAN's tag, so an additional 4 bytes are added to each Ethernet frame. Full-size frames then exceed the maximum frame size allowed on the switch. Basically it's the same as running QinQ zones in Proxmox SDN (the documentation even states the need to reduce the MTU on the VMs to 1496).
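
The arithmetic behind the 4 bytes can be sketched as follows. This assumes standard Ethernet framing (14-byte header, 4-byte FCS, 4 bytes per 802.1Q tag) and a switch that accepts at most 1522 bytes, which is the usual limit for single-tagged frames; the actual limit on my switch is an assumption here, and frame_bytes is just an illustrative helper.

```shell
# Bytes on the wire for one Ethernet frame (without preamble and inter-frame gap):
# 14-byte header + 4 bytes per 802.1Q tag + IP packet (up to the MTU) + 4-byte FCS.
# $1 = MTU, $2 = number of stacked VLAN tags
frame_bytes() {
  echo $(( 14 + 4 * $2 + $1 + 4 ))
}

frame_bytes 1500 1   # prints 1522: one tag, e.g. a guest directly on a bond0 VLAN
frame_bytes 1500 2   # prints 1526: vnet tag 2730 inside bond0.2920
frame_bytes 1496 2   # prints 1522: same path with the reduced MTU
```

With two tags and the default MTU, a full-size frame is 4 bytes over the assumed limit, and dropping the MTU to 1496 brings it back to the same size as an ordinary single-tagged frame.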

I still have to experiment with increasing the MTU on the switch so that I can stop manually changing the MTU on all VMs (containers automatically get the MTU from the bridge, while on a VM I have to set the special MTU value of 1 on the NIC to make it use the bridge MTU).
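
Until then, setting mtu=1 on many VMs could be scripted. The following is an untested sketch, not something I have run on this cluster: with_bridge_mtu is a hypothetical helper that only rewrites the netN string, and the commented-out loop assumes VirtIO NICs, only touches net0 of the VMs on the local node, and assumes the vnet bridge itself already has the reduced MTU.

```shell
# Hypothetical helper: append mtu=1 to a netN value
# unless it already carries an mtu= option.
with_bridge_mtu() {
  case "$1" in
    *mtu=*) printf '%s\n' "$1" ;;
    *)      printf '%s,mtu=1\n' "$1" ;;
  esac
}

# Untested sketch of applying it to net0 of every VM on this node.
# Left commented out on purpose; review the output of "qm config" first.
# for vmid in $(qm list | awk 'NR > 1 { print $1 }'); do
#   net0=$(qm config "$vmid" | sed -n 's/^net0: //p')
#   [ -n "$net0" ] && qm set "$vmid" --net0 "$(with_bridge_mtu "$net0")"
# done
```

Passing the existing net0 value back in full keeps the MAC address and bridge unchanged; only the mtu option is added. Running VMs may need a restart before the new MTU shows up inside the guest.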
 