[SOLVED] Interface VLANs not created for containers and VMs after uninstalling ifupdown2

Thomas Hukkelberg

Dear Proxmoxers!

A strange problem hit one of our cluster nodes tonight while we were trying to increase the MTU on the bond+vmbr interfaces so we can use MTU 9000 on containers. The need for jumbo frames comes from running Ceph gateway containers with Samba as a frontend for video production storage. Windows clients get 1 GB/s write speed but only 200-300 MB/s read, which is a bit weird; we suspect packet fragmentation is part of the reason, since testing the same Samba config directly on a Ceph node with MTU 9000 gives 700-800 MB/s read.

Our network interfaces are currently configured as follows: all 24 nodes have Mellanox dual 40GbE interfaces carrying bond0 (for VMs/CTs) and bond0.4028 (for Ceph), which are assigned to vmbr0 and vmbr1 respectively. The integrated GbE eno1 is used for corosync/management. The switches are Arista 7050QX-32 with VXLAN/OSPF/MLAG and a maximum MTU of 9214. This setup has worked great for a long time, but with MTU 1500.

The problem started after we installed ifupdown2 to apply the new network config right away without rebooting. When applying the config changes (setting MTU 9000 on bond0, bond0.4028, vmbr0, vmbr1 and on the bond slaves) we got an error in the GUI saying it could not apply the changes. We tried rebooting and realized that bond0.4028 was gone (though still present in /etc/network/interfaces), and no matter how we formatted the interfaces file, the interface did not come back. Even after putting back the original interfaces file and rebooting, bond0.4028 did not reappear. We then suspected that the installation of ifupdown2 had changed how the interfaces file was interpreted, so we decided to uninstall and purge the ifupdown2 package and reboot the node. Well, that only made things worse, because after the reboot no interfaces came up at all, not even eno1! (We had to access the node via IPMI and run ifconfig manually to add back the management IP so we could continue configuring.)

After a lot of monkeying around we apparently got the network config back to its original state by reinstalling the original ifupdown package along with ifenslave. Something clearly broke when uninstalling ifupdown2! The network is now back and CephFS mounts just fine. However, when spinning up containers or VMs on this node we are not able to ping them. Looking at ifconfig on the node itself when a container starts, we notice that no VLAN interface is created on bond0, and we don't understand why. On other nodes we have a lot of entries like bond0.1022@bond0 ...


The question is why the VLAN is not created, and whether the removal of ifupdown2 also broke something else that PVE relies on when creating the VLANs at container/VM startup. Does anyone have any clue what's missing?
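
For reference, a quick way to compare this node with a working one is to look at the VLAN sub-interfaces and the bridge VLAN table directly (generic commands, nothing specific to our setup):

Code:
# list all 802.1Q sub-interfaces (on a healthy node this shows entries like bond0.1022@bond0)
ip -d link show type vlan

# show which VLANs are programmed on the bridge ports
bridge vlan show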


Below is our original interfaces config, with the new settings we hope to apply commented out.

Code:
root@hk-proxnode-17:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet static
    address 10.40.24.117/22
    gateway 10.40.24.1

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

iface enp132s0 inet manual

iface enp132s0d1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves enp132s0 enp132s0d1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2
#    mtu 9214

auto bond0.4028
iface bond0.4028 inet manual
#        mtu 9214
#        vlan-id 4028
#        vlan-raw-device bond0

auto vmbr0
iface vmbr0 inet static
    address 10.40.20.117/22
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#    mtu 9000

auto vmbr1
iface vmbr1 inet static
    address 10.40.28.117/22
    bridge-ports bond0.4028
    bridge-stp off
    bridge-fd 0
#    mtu 9000
 
Hi,

What PVE version are you using?
Please send the output of

Code:
pveversion -v
 
About removing ifupdown2: if you use

"apt remove ifupdown2"

Code:
The following packages will be REMOVED:
  ifupdown2
The following NEW packages will be installed:
  ifenslave ifupdown

then ifupdown and ifenslave are correctly re-added.

(Maybe because you used "apt purge", it removed them completely.)

They are the only two packages removed by ifupdown2, so your config should work with ifupdown1 (and should work with ifupdown2 too).
I can't tell why it doesn't work with ifupdown1, but with ifupdown2 you can run "ifup -a -d" to get a full debug log of the interface creation.
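
For example, with ifupdown2 installed again, the debug run could be captured like this (just a sketch; the log path is arbitrary):

Code:
# bring up all interfaces with full debug output and keep a copy of the log
ifup -a -d 2>&1 | tee /tmp/ifup-debug.log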
 
Hi, below is the output of pveversion.

I have tried installing ifupdown2 again, to no avail. When I spin up a container the interfaces fwbr163i0, fwln163i0, fwpr163p0 and veth163i0 are created, but the VLAN interface bond0.1000 is not (see the quick check after the version output below)...

Code:
root@hk-proxnode-17:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-5
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.8-pve1
ceph-fuse: 14.2.8-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-22
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
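
As a quick check after starting CT 163, this is roughly what I look at (the interface names are the ones mentioned above):

Code:
# the veth/firewall interfaces PVE created for CT 163
ip link show | grep 163

# VLAN sub-interfaces on the node -- this is where entries like bond0.1000 show up on our other nodes
ip -d link show type vlan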
 
I ran ifup -a -d and saw an error which I find a bit strange:

Exception: cmd '/sbin/bridge vlan add vid 125-4094 dev bond0' failed: returned 255 (RTNETLINK answers: No space left on device)

I'll attach the full debug log.
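
For reference, the failing step can be reproduced and inspected by hand (bond0 and the VID range are taken straight from the error above):

Code:
# show which VLANs did make it onto the bond port before the failure
bridge vlan show dev bond0

# re-run the failing command manually to see the raw kernel error
bridge vlan add vid 125-4094 dev bond0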
 

Attachments

  • debug-ifup.txt
    40.3 KB
Mmmm, some NICs don't support more than a certain number of VLANs (the Mellanox ConnectX-3, for example, is limited to 128 VLANs).
What is your NIC model?

You can also limit the range with something like "bridge-vids 125,126,2000-2001,3000-3010".

That's also why the bond0.X interfaces are not created: because of "no space left on device", no more VLANs are available.


You can also try without using a VLAN-aware bridge.
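
For example, a vmbr0 stanza restricted to only the VLANs actually in use might look like this (a sketch only; the VID list is just an illustration, keep your own address and ports, written space-separated like the existing "bridge-vids 2-4094" line):

Code:
auto vmbr0
iface vmbr0 inet static
    address 10.40.20.117/22
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 125 126 1000 4028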
 
Aha, that explains a lot! We do indeed use Mellanox ConnectX-3 cards! When I removed "bridge-vlan-aware yes" everything works as expected, and I can also set the MTU to 9000. Thanks for revealing that the X3 cards only support 128 VLANs!
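
For anyone finding this thread later, the relevant stanzas now look roughly like this (a sketch of the node's config after dropping the vlan-aware options and raising the MTU):

Code:
auto bond0
iface bond0 inet manual
    bond-slaves enp132s0 enp132s0d1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 10.40.20.117/22
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    mtu 9000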
 
You're my hero of the day.

Thanks!!!
 
