Dear Proxmoxers!
A strange problem happened to one of our cluster nodes tonight while we were trying to increase the MTU on the bond + vmbr interfaces so we can use MTU 9000 in containers. The need for jumbo frames comes from running Ceph gateway containers with Samba as the frontend for video production storage. Windows clients get 1 GB/s write speed but only 200-300 MB/s read, which is a bit weird; we suspect packet fragmentation is part of the reason, since testing the same Samba config directly on a Ceph node with MTU 9000 gives 700-800 MB/s read.
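(In case it matters: the way we check whether jumbo frames actually make it end to end is roughly the following -- 8972 bytes = 9000 minus 28 bytes of IP/ICMP headers. The target address here is just an example on the Ceph VLAN, not one of our real hosts.)

Code:
# send a full 9000-byte frame with the don't-fragment bit set;
# if the path is not jumbo-clean this fails instead of replying
ping -M do -s 8972 -c 3 10.40.28.118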
We currently have our network interfaces configured as follows: all 24 nodes have Mellanox dual 40GbE interfaces with bond0 (for VMs/CTs) and bond0.4028 (for Ceph), which are attached to vmbr0 and vmbr1 respectively. The integrated GbE eno1 is used for corosync/management. The switches are Arista 7050QX-32 with VXLAN/OSPF/MLAG and a max MTU of 9214. This setup has worked great for a long time, but with MTU 1500.
The problem started after we installed ifupdown2 so we could apply the new network config right away without rebooting. When applying the config changes (MTU 9000 on bond0, bond0.4028, vmbr0, vmbr1 and on the ports/slaves) we got an error in the GUI saying it could not apply the changes. We tried rebooting and realized that bond0.4028 was gone (though still present in /etc/network/interfaces), and no matter how we formatted the interfaces file, the interface did not come back. Even after putting back the original interfaces file and rebooting, bond0.4028 did not reappear. We then thought that maybe the installation of ifupdown2 had changed how the interfaces file was interpreted, so we decided to uninstall and purge the ifupdown2 package and reboot the node. Well, that only made things worse, because after the reboot no interfaces came up at all, not even eno1! (So we had to access the node via IPMI and run ifconfig manually to add back the management IP so we could continue configuring...)

After a lot of monkeying around, we were apparently able to get the network config back to its original state by reinstalling the original ifupdown package along with ifenslave. Something clearly broke when uninstalling ifupdown2!
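For reference, the manual recovery over IPMI was basically just putting the management address back on eno1, something along these lines (we actually used ifconfig; the ip equivalents below are written from memory, so treat them as a sketch):

Code:
# bring up the management NIC and re-add the address/route
# from /etc/network/interfaces so the node is reachable again
ip link set eno1 up
ip addr add 10.40.24.117/22 dev eno1
ip route add default via 10.40.24.1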
Now the network is back and CephFS mounts just fine. However, when spinning up containers or VMs on this node we are not able to ping them. Looking at ifconfig on the node itself when a container starts, we notice that no VLAN interface is created on bond0, and we don't understand why. On other nodes we see a lot of entries like bond0.1022@bond0...
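This is roughly how we are comparing a working node against this one (nothing fancy, just standard iproute2, so take it as a sketch):

Code:
# list vlan sub-interfaces - on the healthy nodes this shows
# entries like bond0.1022@bond0
ip -d link show type vlan

# show which VLAN tags are programmed on the bridge port itself;
# since vmbr0 is vlan-aware, tags may live here rather than on bond0.X
bridge vlan show dev bond0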
The question is why the VLAN is not created, and whether the removal of ifupdown2 also broke something else that PVE relies on when creating the VLANs at container/VM startup. Does anyone have a clue what's missing?

Below is our original interfaces config, with the new config we hope to get commented out.
Code:
root@hk-proxnode-17:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet static
        address 10.40.24.117/22
        gateway 10.40.24.1

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

iface enp132s0 inet manual

iface enp132s0d1 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp132s0 enp132s0d1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2
        # mtu 9214

auto bond0.4028
iface bond0.4028 inet manual
        # mtu 9214
        # vlan-id 4028
        # vlan-raw-device bond0

auto vmbr0
iface vmbr0 inet static
        address 10.40.20.117/22
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        # mtu 9000

auto vmbr1
iface vmbr1 inet static
        address 10.40.28.117/22
        bridge-ports bond0.4028
        bridge-stp off
        bridge-fd 0
        # mtu 9000
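And for clarity, this is the end state we are trying to reach, i.e. the same bond/bridge stanzas with the commented MTU/vlan lines activated (posting it in case someone spots something obviously wrong with it):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves enp132s0 enp132s0d1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2
        mtu 9214

auto bond0.4028
iface bond0.4028 inet manual
        mtu 9214
        vlan-id 4028
        vlan-raw-device bond0

auto vmbr0
iface vmbr0 inet static
        address 10.40.20.117/22
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9000

auto vmbr1
iface vmbr1 inet static
        address 10.40.28.117/22
        bridge-ports bond0.4028
        bridge-stp off
        bridge-fd 0
        mtu 9000

As far as we understand, the GUI's "Apply Configuration" button uses ifupdown2's ifreload -a under the hood, which is the whole reason we installed it in the first place.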