Loss of connectivity - OvS (apt-get -y dist-upgrade)

On PVE 7.3, upgrading OvS (Open vSwitch) results in loss of connectivity.

OvS setup: 2 x 10G interfaces are bonded, VLAN 1 is untagged for the node itself and VLAN 100 carries Ceph and cluster communication:
Code:
[root@kvm1a ~]# cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 vlan1 vlan100
        ovs_mtu 9216

auto ether0
allow-vmbr0 ether0
iface ether0 inet manual

auto ether1
allow-vmbr0 ether1
iface ether1 inet manual

auto bond0
allow-vmbr0 bond0
iface bond0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSBond
        ovs_bonds ether0 ether1
        ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast tag=1 vlan_mode=native-untagged
        ovs_mtu 9216

auto vlan1
allow-vmbr0 vlan1
iface vlan1 inet dhcp
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=1
        ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
        hwaddress 2e:63:54:9b:86:65
        ovs_mtu 1500

auto vlan100
allow-vmbr0 vlan100
iface vlan100 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=100
        ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
        hwaddress 86:b9:01:d7:1e:30
        address 10.248.254.10/24
        ovs_mtu 9216

apt-get -y dist-upgrade:
Code:
[admin@kvm1a ~]# apt-get update; apt-get -y dist-upgrade; apt-get autoremove; apt-get autoclean;
Hit:1 http://ftp.debian.org/debian bullseye InRelease
Get:2 http://ftp.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:3 http://security.debian.org bullseye-security InRelease [48.4 kB]
Hit:4 http://download.proxmox.com/debian/ceph-pacific bullseye InRelease
Hit:5 https://enterprise.proxmox.com/debian/pve bullseye InRelease
Fetched 92.4 kB in 2s (42.3 kB/s)
<snip>
The following NEW packages will be installed:
  pve-kernel-5.15.83-1-pve
The following packages will be upgraded:
  bind9-dnsutils bind9-host bind9-libs curl dnsutils libcurl3-gnutls libcurl4 libnvpair3linux libuutil3linux libzfs4linux libzpool5linux linux-libc-dev
  openvswitch-common openvswitch-switch pve-firmware pve-kernel-5.15 pve-kernel-helper qemu-server spl sudo zfs-initramfs zfs-zed zfsutils-linux
23 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 180 MB of archives.
<snip>
Preparing to unpack .../12-openvswitch-common_2.15.0+ds1-2+deb11u2_amd64.deb ...
Unpacking openvswitch-common (2.15.0+ds1-2+deb11u2) over (2.15.0+ds1-2+deb11u1) ...
Preparing to unpack .../13-openvswitch-switch_2.15.0+ds1-2+deb11u2_amd64.deb ...
Unpacking openvswitch-switch (2.15.0+ds1-2+deb11u2) over (2.15.0+ds1-2+deb11u1) ...
<snip>
Setting up openvswitch-common (2.15.0+ds1-2+deb11u2) ...
Setting up pve-kernel-5.15.83-1-pve (5.15.83-1) ...
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 5.15.83-1-pve /boot/vmlinuz-5.15.83-1-pve
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 5.15.83-1-pve /boot/vmlinuz-5.15.83-1-pve
update-initramfs: Generating /boot/initrd.img-5.15.83-1-pve
<snip>
Setting up libcurl4:amd64 (7.74.0-1.3+deb11u5) ...
Setting up curl (7.74.0-1.3+deb11u5) ...
Setting up libuutil3linux (2.1.7-pve3) ...
Setting up bind9-host (1:9.16.37-1~deb11u1) ...
Setting up openvswitch-switch (2.15.0+ds1-2+deb11u2) ...
ovs-vswitchd.service is a disabled or a static unit not running, not starting it.
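
The last line of the output above suggests that the upgrade left ovs-vswitchd stopped. A minimal sketch for checking this from the local console or IPMI (SSH will not work while the network is down). The unit name is taken from the log line; the helper name `ovs_state_verdict` is hypothetical, and the thread does not confirm that the service state alone explains the outage.

```shell
#!/bin/sh
# Hypothetical helper: map a `systemctl is-active` state string to a verdict.
# Anything other than "active" (inactive, failed, empty, ...) counts as not running.
ovs_state_verdict() {
    case "$1" in
        active) echo "ok" ;;
        *)      echo "not-running" ;;
    esac
}

# Usage on the affected node:
#   state=$(systemctl is-active ovs-vswitchd.service)
#   echo "ovs-vswitchd: $(ovs_state_verdict "$state")"
```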



pveversion -v:
Code:
[admin@kvm1b ~]# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.39-1-pve: 5.15.39-1
ceph: 16.2.9-pve1
ceph-fuse: 16.2.9-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.3.1-1
proxmox-backup-file-restore: 2.3.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1


Systems work perfectly the next time they boot up. Super annoying, as this bug has been with us for 2+ years now...
 
We managed to reproduce the issue (with version 2.15.0+ds1-2+deb11u2) and are currently considering uploading a version with a fix to our repository (the fix is scheduled to be included in the next point release of Debian Bullseye [0]).

[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030113
 
openvswitch-switch (and the other packages built from the openvswitch source), version 2.15.0+ds1-2+deb11u2.1, is now available in the pvetest repository and contains the fix described in the Debian bug report.
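
To see whether a node already runs a package version with the fix, something like the following might help. It is only a sketch: the fixed version string is the one named above, `ovs_has_fix` is a hypothetical helper name, and the comparison relies on `dpkg --compare-versions`.

```shell
#!/bin/sh
# Version that carries the fix, as stated in this post.
FIXED_VERSION='2.15.0+ds1-2+deb11u2.1'

# Hypothetical helper: returns 0 if the given Debian version string
# is greater than or equal to FIXED_VERSION, non-zero otherwise.
ovs_has_fix() {
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Usage:
#   installed=$(dpkg-query -W -f='${Version}' openvswitch-switch)
#   if ovs_has_fix "$installed"; then
#       echo "fix installed"
#   else
#       echo "fix not installed"
#   fi
```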

Feedback on whether the upgrade works smoothly for you would be much appreciated!

For the pvetest repository, see: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_test_repo
Repository management is also available in the GUI: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#_repositories_in_proxmox_ve
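
For those who prefer the command line, a rough sketch of enabling pvetest and pulling only the Open vSwitch packages could look like this. The file name `/etc/apt/sources.list.d/pvetest.list` and the helper `pvetest_source_line` are assumptions made for illustration; the repository line follows the format given in the linked documentation for bullseye. Consider removing the file again afterwards so that other test packages are not pulled in by later upgrades.

```shell
#!/bin/sh
# Hypothetical helper: print the APT source line for the pvetest
# repository of the given Debian codename (bullseye for PVE 7.x).
pvetest_source_line() {
    printf 'deb http://download.proxmox.com/debian/pve %s pvetest\n' "$1"
}

# Usage (as root); the target file name is an assumption:
#   pvetest_source_line bullseye > /etc/apt/sources.list.d/pvetest.list
#   apt-get update
#   apt-get install --only-upgrade openvswitch-switch openvswitch-common
```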